This repository contains a process mining analysis of a loan application process, based on the BPI Challenge 2012 event log. The primary goal is to dissect the process flow and identify inefficiencies, bottlenecks, and rework loops using Python and the pm4py library.
The complete findings and business recommendations are detailed in the accompanying PDF report. The Jupyter Notebook provides the full technical implementation of the analysis.
The bpi_2012_loan_analysis.ipynb notebook follows a systematic approach to derive insights from the raw event data:
- Data Loading and Exploration: The financial_log.xes event log is loaded, and initial statistics such as the number of cases, events, and variants are computed.
- Data Cleaning and Preprocessing:
  - Incomplete cases (those without a definitive end state) are filtered out to ensure a clean end-to-end analysis.
  - Redundant activities (e.g., A_PARTLYSUBMITTED) that represent system noise are removed.
- Process Segmentation: The cleaned log is segmented into three distinct sub-logs based on the final case outcome: Approved, Declined, and Cancelled.
- Comparative KPI Analysis: Key Performance Indicators (KPIs) are calculated for each segment, including:
  - Average case duration
  - Average requested loan amount
  - Average number of manual work items (a proxy for effort and rework)
- Directly-Follows Graph (DFG) Analysis: The DFG is used to quantitatively identify the most frequent transitions and self-loops (rework) within each segment, revealing the core process inefficiencies.
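The segmentation, KPI, and DFG steps above can be illustrated on a toy log with the standard library alone. This is a minimal sketch, not the notebook's code: the event tuples, the `OUTCOMES` mapping (which assumes the outcome is marked by an A_APPROVED, A_DECLINED, or A_CANCELLED event), and every function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Toy event log: (case id, activity, ISO timestamp). Values are illustrative only.
EVENTS = [
    ("c1", "A_SUBMITTED", "2012-01-01T09:00:00"),
    ("c1", "W_Completeren aanvraag", "2012-01-01T10:00:00"),
    ("c1", "W_Completeren aanvraag", "2012-01-02T10:00:00"),
    ("c1", "A_APPROVED", "2012-01-03T09:00:00"),
    ("c2", "A_SUBMITTED", "2012-01-01T09:00:00"),
    ("c2", "A_DECLINED", "2012-01-01T09:30:00"),
    ("c3", "A_SUBMITTED", "2012-01-05T09:00:00"),  # no end state: incomplete
]

# Assumed mapping from outcome activity to segment label.
OUTCOMES = {
    "A_APPROVED": "Approved",
    "A_DECLINED": "Declined",
    "A_CANCELLED": "Cancelled",
}


def build_traces(events):
    """Group events by case and order each case by timestamp."""
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((datetime.fromisoformat(ts), activity))
    return {cid: sorted(evts, key=lambda e: e[0]) for cid, evts in traces.items()}


def segment_of(trace):
    """Return the label of the last outcome activity in the trace, or None."""
    for _, activity in reversed(trace):
        if activity in OUTCOMES:
            return OUTCOMES[activity]
    return None


def segment_kpis(traces):
    """Per segment: case count, mean duration in hours, directly-follows counts."""
    acc = defaultdict(lambda: {"cases": 0, "hours": 0.0, "dfg": Counter()})
    for trace in traces.values():
        label = segment_of(trace)
        if label is None:
            continue  # drop incomplete cases
        seg = acc[label]
        seg["cases"] += 1
        seg["hours"] += (trace[-1][0] - trace[0][0]).total_seconds() / 3600
        activities = [a for _, a in trace]
        # Each adjacent pair (a, b) is one directly-follows transition.
        seg["dfg"].update(zip(activities, activities[1:]))
    return {
        label: {
            "cases": seg["cases"],
            "avg_hours": seg["hours"] / seg["cases"],
            "dfg": seg["dfg"],
        }
        for label, seg in acc.items()
    }


def self_loops(dfg):
    """Transitions where an activity directly follows itself (a rework signal)."""
    return {a: n for (a, b), n in dfg.items() if a == b}


kpis = segment_kpis(build_traces(EVENTS))
```

On real data, pm4py provides `pm4py.read_xes` for loading the log and `pm4py.discover_dfg` for the directly-follows counts, so the hand-rolled counting here only serves to show what those counts mean.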
.
├── data/
│ └── financial_log.xes.gz
├── bpi_2012_loan_analysis.ipynb
├── LICENSE
├── requirements.txt
└── README.md
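The tree lists the event log as a gzip archive, while the rest of this README refers to financial_log.xes. If the notebook expects the uncompressed file (an assumption, since this README does not say), a minimal sketch for unpacking it with the standard library could look like this. The helper name `decompress_gz` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None):
    """Decompress a .gz file and return the path of the output file.

    When dst is omitted, the output sits next to src with the trailing
    ".gz" removed (e.g. financial_log.xes.gz -> financial_log.xes).
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams in chunks, no full read into memory
    return dst


# Example call, assuming the layout shown in the tree above:
# decompress_gz("data/financial_log.xes.gz")
```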
- Python 3.8+
- pm4py: The primary library for process mining analysis.
- pandas: For data manipulation and KPI aggregation.
- Matplotlib & Seaborn: For data visualization.
- Jupyter Notebook: As the interactive development environment.
Follow these steps to set up the environment and run the analysis.
- Clone the repository:
    git clone <your-repository-url>
    cd <repository-folder>
- Create and activate a virtual environment (recommended):
  - On macOS/Linux:
      python3 -m venv venv
      source venv/bin/activate
  - On Windows:
      python -m venv venv
      .\venv\Scripts\activate
- Install the required dependencies:
    pip install -r requirements.txt
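After installation, a quick check that the main packages resolve in the active environment can save a failed notebook run. The sketch below is one way to do that. The import names in `REQUIRED` are assumed from the technology list in this README (requirements.txt remains authoritative), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed from the technology list; adjust to match requirements.txt.
REQUIRED = ["pm4py", "pandas", "matplotlib", "seaborn", "notebook"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages were found.")
```

`find_spec` only locates each package without importing it, so the check stays fast even for heavy libraries.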
- Ensure your virtual environment is activated.
- Start the Jupyter Notebook server:
    jupyter notebook
- In the browser window that opens, navigate to and open bpi_2012_loan_analysis.ipynb.
- Run the cells sequentially to reproduce the entire analysis.
- bpi_2012_loan_analysis.ipynb: The core of the project, containing all Python code for the process mining analysis.
- data/financial_log.xes: The raw event log data from the BPI Challenge 2012.
- report_pdf.pdf: A formal report summarizing the key findings, data visualizations, and business recommendations derived from the analysis.
- Scenario13.06.2025.pdf: The original case study description and assignment requirements.
- requirements.txt: A list of all Python packages required to run the notebook.