Split Miner in Python

An implementation of Split Miner v.1.0 and 2.0 in Python derived from "Split miner: automated discovery of accurate and simple business process models from event logs" and "Automated Discovery of Process Models with True Concurrency and Inclusive Choices"

The project covers the full pipeline end-to-end:

Directly-follows graph + self/short-loop detection
Concurrency discovery and pruned DFG
Source-to-sink filtering
Initial BPMN construction
Splits discovery
Joins discovery
OR-join minimization
SM 2.0 only: lifecycle-aware DFG, refined concurrency, improper completion fix, OR-split heuristic.

Requirements

Python 3.10+
The packages listed in requirements.txt:
Graphviz (only for PNG rendering through pm4py.save_vis_bpmn). Install it on Windows from https://graphviz.org/download/ and make sure dot.exe is on PATH.

Install packages in requirements.txt with:

pip install -r requirements.txt

Reproducing the paper's running example

python debug.py

Reads the 10-trace running example from the first version paper, runs the pipeline phase-by-phase, prints intermediate sizes, and saves `paper_example.bpmn` and `paper_example.png`.

Quick start

1. Discover a BPMN from an XES log (Split Miner 1.x)

python main.py path/to/log.xes.gz --out model.bpmn --png model.png

This produces model.bpmn (BPMN 2.0 XML, openable in any BPMN tool such as bpmn.io) and a PNG rendering at model.png.

2. Discover with Split Miner 2.0

If your log has lifecycle:transition attributes that distinguish start and complete/end events, SM 2.0 can detect true concurrency (truly overlapping activity lifecycles) and inclusive (OR) splits:

python main.py path/to/log.xes.gz --v2 --out model_v2.bpmn --png model_v2.png

On logs without lifecycle info the algorithm degrades gracefully to SM 1.x behaviour and prints a warning.

3. CLI options

Flag	Meaning	Default
`log`	Path to a `.xes` / `.xes.gz` file	required
`--eps FLOAT`	Concurrency threshold ε ∈ [0, 1]	`0.1`
`--eta FLOAT`	Filtering percentile η ∈ [0, 1]	`0.4`
`--no-or-min`	Skip the OR-join minimisation phase	off
`--v2`	Use Split Miner 2.0 algorithm	off
`--out PATH`	Output BPMN XML path	`model.bpmn`
`--png PATH`	Optional PNG rendering path	none

ε controls when two activities are considered concurrent: lower ε is more permissive (more || relations), higher ε is stricter. η controls the filter — lower η retains more edges (higher fitness), higher η keeps the model simple (higher precision).

Programmatic usage

The CLI is a thin wrapper around two functions in main.py. You can call them directly:

from log_loader import load_traces, load_traces_v2
from main import run, run_v2
from bpmn_export import to_bpmn, save_bpmn_xml, save_bpmn_png

# ---- Split Miner 1.x ----------------------------------------------------
traces = load_traces("path/to/log.xes.gz")  # list[list[str]]
wg = run(traces, eps=0.1, eta=0.4, or_minimise=True)

# ---- Split Miner 2.0 ----------------------------------------------------
refined = load_traces_v2("path/to/log.xes.gz")  # list[list[(label, lc, ts)]]
wg = run_v2(refined, eps=0.1, eta=0.4, or_minimise=True)

# ---- Export -------------------------------------------------------------
bpmn = to_bpmn(wg)  # pm4py BPMN object
save_bpmn_xml(bpmn, "model.bpmn")  # writes BPMN 2.0 XML
save_bpmn_png(bpmn, "model.png")  # writes PNG (needs Graphviz)

wg is a WorkingGraph (see types_.py) — a mutable adjacency representation of the BPMN. You can inspect / post-process it before exporting.

Project layout

.
├── main.py              CLI + run() / run_v2() entry points
├── debug.py             Phase-by-phase runner on the paper's example
├── log_loader.py        XES -> trace list (with optional lifecycle info)
├── types_.py            DFG, Node, WorkingGraph data classes
├── dfg_builder.py       Phase 1 — DFG, self/short-loops, refined DFG
├── concurrency.py       Phase 2 — pruned DFG, SM 2.0 Eq.5 oracle
├── filtering.py         Phase 3 — Algorithm 1 (max-min-frequency BFS)
├── bpmn_init.py         Phase 4 — initial BPMN from filtered PDFG
├── splits.py            Phase 5 — Algorithms 5, 6, 7 (XOR / AND splits)
├── rpst.py              Lightweight RPST decomposition for joins
├── joins.py             Phase 6 — split-aware join insertion (Alg. 8)
├── or_minimize.py       Phase 7 — Algorithm 9 (OR-min)
├── heuristics.py        SM 2.0 — improper-completion + OR-split heuristics
├── bpmn_export.py       WorkingGraph -> pm4py BPMN object / XML / PNG

Algorithmic notes & caveats

Concurrency threshold for SM 1.x. The original paper writes the imbalance condition as < ε, but its own running example only matches Figure 3c with ≤ ε (the pair e||g has imbalance exactly 0.2 at the paper's reported ε = 0.2). The Java reference implementation also uses ≤. The Python port follows the Java behaviour to keep the running example reproducible.
Joins discovery. The implementation follows Algorithm 8 of the paper but groups incoming edges of every join target by their nearest split-gateway ancestor, then recurses. This produces several "smaller" joins instead of one large multi-input join when the SESE structure has multiple nested fragments. Both shapes are semantically equivalent; the Java reference tends to emit larger n-ary joins, so its nodes/edges counts are slightly lower than this port's.
SM 2.0 with complete-only logs. Many real logs (Sepsis, BPIC12, …) record only lifecycle:transition = complete. In that case the loader synthesises a start event with the same timestamp so the SM 2.0 pipeline stays well-formed, but it cannot detect any overlaps and the output is identical to SM 1.x. The Java reference behaves the same way.

References

Augusto, A., Conforti, R., Dumas, M. et al. Split miner: automated discovery of accurate and simple business process models from event logs. Knowl Inf Syst 59, 251–284 (2019). https://doi.org/10.1007/s10115-018-1214-x
Augusto, A., Dumas, M., La Rosa, M. (2021). Automated Discovery of Process Models with True Concurrency and Inclusive Choices. In: Leemans, S., Leopold, H. (eds) Process Mining Workshops. ICPM 2020. Lecture Notes in Business Information Processing, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-030-72693-5_4

License

This work is licensed under GPL-3.0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Split Miner in Python

Requirements

Reproducing the paper's running example

Reads the 10-trace running example from the first version paper, runs the pipeline phase-by-phase, prints intermediate sizes, and saves `paper_example.bpmn` and `paper_example.png`.

Quick start

1. Discover a BPMN from an XES log (Split Miner 1.x)

2. Discover with Split Miner 2.0

3. CLI options

Programmatic usage

Project layout

Algorithmic notes & caveats

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bpmn_export.py		bpmn_export.py
bpmn_init.py		bpmn_init.py
concurrency.py		concurrency.py
debug.py		debug.py
dfg_builder.py		dfg_builder.py
filtering.py		filtering.py
heuristics.py		heuristics.py
joins.py		joins.py
log_loader.py		log_loader.py
main.py		main.py
or_minimize.py		or_minimize.py
paper_example_log.py		paper_example_log.py
requirements.txt		requirements.txt
rpst.py		rpst.py
splits.py		splits.py
types_.py		types_.py

Folders and files

Latest commit

History

Repository files navigation

Split Miner in Python

Requirements

Reproducing the paper's running example

Reads the 10-trace running example from the first version paper, runs the pipeline phase-by-phase, prints intermediate sizes, and saves paper_example.bpmn and paper_example.png.

Quick start

1. Discover a BPMN from an XES log (Split Miner 1.x)

2. Discover with Split Miner 2.0

3. CLI options

Programmatic usage

Project layout

Algorithmic notes & caveats

References

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Reads the 10-trace running example from the first version paper, runs the pipeline phase-by-phase, prints intermediate sizes, and saves `paper_example.bpmn` and `paper_example.png`.

Packages