Dynamic Log Profiling package
The DyLoPro Python Library is a visual analytics tool that allows Process Mining (PM)[^1] practitioners to efficiently and comprehensively explore the dynamics in event logs over time, prior to applying PM techniques. These exploration capabilities are provided by an extensive set of plotting functionalities that visualize the dynamics over time from different process perspectives.
[^1]: van der Aalst, W. (2016). Data Science in Action. In: Process Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49851-4_1
https://en.wikipedia.org/wiki/Process_mining
The DyLoPro library is a ready-to-use and efficient software implementation of the identically named DyLoPro framework, introduced in the paper DyLoPro: Profiling the Dynamics of Event Logs, which will be presented at the BPM 2023 conference in Utrecht and published in its main proceedings. A preprint of the (already peer-reviewed) BPM paper can be found here.
DyLoPro is a comprehensive visual analytics framework designed to explore event log dynamics over time. DyLoPro’s comprehensiveness is achieved through the incorporation of the main process perspectives - the control-flow, data (including resources) and performance, along two orthogonal dimensions of log concepts and representation types. It incorporates six log concepts to capture all essential information from event logs, including variants and directly-follows relations for the control-flow perspective, and categorical and numeric case and event features for the data perspective. These six log concepts can be represented using five representation types, including four performance-oriented ones (throughput time, number of events per case, outcome, and directly-follows-relations’ performance) and one generic type. With this two-dimensional approach, end users can gain a nuanced and holistic view of event log dynamics, efficiently identifying patterns, temporary or permanent changes, and trends of interest from multiple perspectives. Upon identification, they can further analyze these patterns and trends, ultimately leading to more appropriate application of downstream process mining techniques.
Documentation
You can consult the documentation of DyLoPro here.
The following terms are used interchangeably throughout the documentation:
- 'case' and 'trace'
- 'time period', 'time bucket' and 'time interval'
Installation
You can install DyLoPro using:
pip install DyLoPro
The DyLoPro PyPI page can be consulted here.
Requirements
DyLoPro depends on the following Python packages:
- numpy (version >=1.21.5)
- pandas (version >=2.0.2)
- matplotlib (version >=3.7.1)
- tqdm (version >=4.63.0)
If (some of) these requirements are not satisfied yet, then these packages will automatically be installed when installing DyLoPro.
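To see which of these requirements are already met in your environment before installing, a small check along the following lines can help. It is only a sketch: the helper names parse_version and check_requirements are made up for this example, and the simple numeric parsing ignores pre-release suffixes such as rc1.

```python
from importlib import metadata

# Minimum versions as listed in the requirements above.
REQUIREMENTS = {
    "numpy": "1.21.5",
    "pandas": "2.0.2",
    "matplotlib": "3.7.1",
    "tqdm": "4.63.0",
}


def parse_version(version):
    """Turn '2.0.2' into (2, 0, 2); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: status} with status 'ok', 'too old' or 'missing'."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[package] = "ok" if ok else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

Packages reported as missing or too old are the ones pip would install or upgrade alongside DyLoPro.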
Get Started
Assumptions & Terminology
For the moment, DyLoPro assumes flat event logs, and does not cater to object-centric event logs. DyLoPro also regards events to be the atomic unit of the event log. In other words, the execution of a single event is recorded as a single row in the resulting event log. For the terminology and definitions employed by the 'DyLoPro' Python package, please refer to Section 3 (Preliminaries) of the corresponding academic paper.
Step 1: Initializing a DynamicLogPlots instance
Assume that you have already loaded an event log into a pd.DataFrame called, e.g., event_log.
After importing the DyLoPro package, a DynamicLogPlots instance has to be initialized. The DynamicLogPlots class provides a single point of access to all of DyLoPro's visualization methods, and thereby serves as the interface between your Python environment and DyLoPro's underlying computational logic.
import DyLoPro as dlp

plot_object = dlp.DynamicLogPlots(event_log,
                                  case_id_key,
                                  activity_key,
                                  timestamp_key,
                                  categorical_casefeatures,
                                  numerical_casefeatures,
                                  categorical_eventfeatures,
                                  numerical_eventfeatures,
                                  start_date,
                                  end_date,
                                  outcome)
After running this block of code, DyLoPro will verify the validity of the event log and all arguments specified.
- If everything checks out, the event log is preprocessed into an internal format that allows DyLoPro to efficiently compute and visualize all aggregations on an on-demand basis. Afterwards, a DynamicLogPlots object is initialized, and all visualization methods can be accessed by invoking the corresponding methods on this instance.
- If an error is detected, DyLoPro will raise an error with a dedicated error message describing what went wrong and how it can be resolved.
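Before handing an event log to DynamicLogPlots, it can be useful to run a rough sanity check of your own. The sketch below is not DyLoPro's internal validation; it only illustrates the "one row per event" assumption on a plain list of dictionaries. The column names case_id, activity and timestamp, as well as the helper precheck_rows, are assumptions made for this example.

```python
from datetime import datetime

REQUIRED_COLUMNS = ("case_id", "activity", "timestamp")  # hypothetical names


def precheck_rows(rows, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in a flat event log."""
    problems = []
    for index, row in enumerate(rows):
        # Every event (row) needs a case id, an activity label and a timestamp.
        for column in required:
            if row.get(column) in (None, ""):
                problems.append(f"row {index}: missing value for '{column}'")
        # Timestamps that are present should already be parsed datetimes.
        timestamp = row.get("timestamp")
        if timestamp not in (None, "") and not isinstance(timestamp, datetime):
            problems.append(f"row {index}: 'timestamp' is not a datetime")
    return problems


rows = [
    {"case_id": "c1", "activity": "Register", "timestamp": datetime(2023, 1, 2, 9, 0)},
    {"case_id": "c1", "activity": "Approve", "timestamp": datetime(2023, 1, 3, 14, 30)},
    {"case_id": "c2", "activity": "Register", "timestamp": None},
]

for problem in precheck_rows(rows):
    print(problem)  # prints: row 2: missing value for 'timestamp'
```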
The formatting requirements of the event log, and of all of the arguments needed to initialize a DynamicLogPlots instance (see code block above), can be consulted here.
Step 2: Accessing all visualization methods
Assuming Step 1 is successfully completed, you can now access all visualization methods by invoking the appropriate methods on plot_object.
As already mentioned, DyLoPro is the implementation of the identically named DyLoPro framework. Below, you will find a concise summary of the framework, followed by a comprehensive explanation of its implementation as a Python package. This section will guide you through leveraging all the available visualization capabilities. For a more detailed explanation of the framework, please refer to Section 4 of the paper.
Additionally, the detailed notebooks containing the case studies conducted on a number of commonly used real-life event logs might also improve your understanding of how to use and access DyLoPro's variety of plotting methods.
Framework to package mapping
DyLoPro provides functionality to construct and visualize time series, i.e. the log dynamics, for a variety of log concepts. For each log concept, the dynamics can be represented using five different representation types.
- log concepts: the main dimensions along which we capture event log dynamics.
- representation types: how the event log dynamics should be represented and analyzed for each log concept.
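To make two of the control-flow log concepts concrete, the sketch below derives variants and directly-follows relations from a tiny hand-made event log, using only the standard library. It is a conceptual illustration, not DyLoPro's own code; the tuple layout of events and the function names traces, variant_counts and dfr_counts are assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# (case id, activity, timestamp): one tuple per event, as in a flat event log.
events = [
    ("c1", "Register", datetime(2023, 1, 2, 9, 0)),
    ("c1", "Check", datetime(2023, 1, 2, 11, 0)),
    ("c1", "Approve", datetime(2023, 1, 3, 10, 0)),
    ("c2", "Register", datetime(2023, 1, 4, 9, 0)),
    ("c2", "Check", datetime(2023, 1, 4, 15, 0)),
    ("c2", "Reject", datetime(2023, 1, 5, 8, 0)),
    ("c3", "Register", datetime(2023, 1, 9, 9, 0)),
    ("c3", "Check", datetime(2023, 1, 9, 10, 0)),
    ("c3", "Approve", datetime(2023, 1, 10, 16, 0)),
]


def traces(events):
    """Group events per case and order each case's activities by timestamp."""
    per_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        per_case[case_id].append((timestamp, activity))
    return {
        case_id: tuple(activity for _, activity in sorted(case_events))
        for case_id, case_events in per_case.items()
    }


def variant_counts(events):
    """A variant is a distinct activity sequence; count the cases per variant."""
    return Counter(traces(events).values())


def dfr_counts(events):
    """Count how often activity a is directly followed by b within a case."""
    counts = Counter()
    for trace in traces(events).values():
        counts.update(zip(trace, trace[1:]))
    return counts


print(variant_counts(events).most_common(1))
print(dfr_counts(events)[("Register", "Check")])
```

Here the variant Register, Check, Approve covers two of the three cases, and the relation Register followed by Check occurs in all three. Tracking how such counts change per time interval is what the generic representation type amounts to in spirit.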
Brief Summary Framework
The framework formalizes this procedure in three stages:
- Log Discretization: Subdividing the event log into a chronologically ordered set of sub-logs. This is done in two consecutive steps:
  - Given that the log encompasses data spanning a temporal interval denoted as T, split up T into a chronologically ordered set of equal-length time intervals.
  - Create the ordered set of sub-logs by assigning each case to exactly one of these time intervals.
- Domain Definition:
  - Defining along which log concept to capture log dynamics, and how to represent these dynamics. This boils down to defining the log concept and representation type respectively.
  - The resulting log concept - representation type combination translates into a unique domain-specific mapping function.
- Time Series Construction & Visualization:
  - Applying the resulting mapping function to each of the chronologically ordered sub-logs, thereby creating one or more time series.
  - Visualizing the constructed time series.
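The three stages above can be sketched in plain Python. The example below is a conceptual illustration under simplifying assumptions, not the package's implementation: it uses one-week intervals starting at the first event in the log, assigns each case to the interval of its first event, and uses the mean throughput time in hours as the mapping function. All names (case_spans, discretize, mean_throughput_hours, build_time_series) are hypothetical, and intervals without any case are simply skipped.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# One (case id, timestamp) tuple per event; activities are omitted for brevity.
events = [
    ("c1", datetime(2023, 1, 2, 9, 0)), ("c1", datetime(2023, 1, 3, 9, 0)),
    ("c2", datetime(2023, 1, 4, 9, 0)), ("c2", datetime(2023, 1, 10, 9, 0)),
    ("c3", datetime(2023, 1, 9, 9, 0)), ("c3", datetime(2023, 1, 9, 21, 0)),
]


def case_spans(events):
    """Return {case_id: (first_timestamp, last_timestamp)}."""
    spans = {}
    for case_id, timestamp in events:
        first, last = spans.get(case_id, (timestamp, timestamp))
        spans[case_id] = (min(first, timestamp), max(last, timestamp))
    return spans


def discretize(spans, interval=timedelta(weeks=1)):
    """Stage 1: assign each case to exactly one equal-length time interval,
    here the one containing its first event. Returns {interval_index: [case_id]}."""
    start = min(first for first, _ in spans.values())
    sublogs = defaultdict(list)
    for case_id, (first, _) in spans.items():
        sublogs[(first - start) // interval].append(case_id)
    return sublogs


def mean_throughput_hours(case_ids, spans):
    """Stage 2: one possible mapping function from a sub-log to a real value."""
    hours = [(spans[c][1] - spans[c][0]).total_seconds() / 3600 for c in case_ids]
    return sum(hours) / len(hours)


def build_time_series(events):
    """Stage 3: apply the mapping function to each sub-log in chronological order."""
    spans = case_spans(events)
    sublogs = discretize(spans)
    return [mean_throughput_hours(sublogs[i], spans) for i in sorted(sublogs)]


print(build_time_series(events))  # prints: [84.0, 12.0]
```

Cases c1 and c2 start in the first week (24 and 144 hours, mean 84), while c3 starts in the second week (12 hours). Plotting the resulting list against the interval index would complete the visualization step.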
Framework Implementation DyLoPro Package
As already mentioned, all visualization methods can be accessed by invoking the appropriate methods on your initialized DynamicLogPlots instance.
All of DyLoPro's plotting methods construct time series by deriving real-valued measures for a chronologically ordered set of sublogs.
The three-stage framework is implemented in this package as follows:
- Each of the plotting methods that can be invoked pertains to exactly one log concept.
- Given a certain log concept, and hence (one of) its associated plotting methods, the representation type can be passed on to that method by specifying the plt_type argument. The five representation types proposed in the paper correspond to the following argument values for the plt_type parameter:
  - Isolated: plt_type='univariate'
  - Throughput Time (TT): plt_type='type_tt'
  - Case Length (NEPC): plt_type='type_events_case'
  - Outcome: plt_type='type_outcome'
  - DFR Performance: plt_type='type_dfr_performance'
- Log Discretization: The log discretization can also be specified through the arguments of each plotting method:
  - frequency parameter: Determines the frequency by which the cases are grouped together.
  - case_assignment parameter: Determines the condition upon which each case is assigned to a certain time interval.

  E.g. if frequency='weekly' and case_assignment='first_event', each case is assigned to the one-week time interval in which its first event occurs, and hence each sublog will consist of all cases that were initialized in one particular week.

Each method is also equipped with an additional set of optional configuration parameters, providing even more customization options to the user. For more information about the parameters corresponding to these methods, please consult the documentation.
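As a quick reference, the mapping between the representation types and the plt_type values listed above can be kept in a small lookup. The sketch below is only a convenience for your own scripts; PLT_TYPES and plot_kwargs are hypothetical names, not part of DyLoPro, and the default values simply mirror the weekly, first-event example above rather than any documented package defaults.

```python
# Representation type (as named in the paper) -> plt_type value (from the list above).
PLT_TYPES = {
    "Isolated": "univariate",
    "Throughput Time": "type_tt",
    "Case Length": "type_events_case",
    "Outcome": "type_outcome",
    "DFR Performance": "type_dfr_performance",
}


def plot_kwargs(representation, frequency="weekly", case_assignment="first_event"):
    """Build the keyword arguments shared by the plotting methods.

    The defaults are this helper's own choice, not necessarily DyLoPro's.
    """
    if representation not in PLT_TYPES:
        raise ValueError(
            f"Unknown representation type {representation!r}; "
            f"choose one of {sorted(PLT_TYPES)}"
        )
    return {
        "plt_type": PLT_TYPES[representation],
        "frequency": frequency,
        "case_assignment": case_assignment,
    }


kwargs = plot_kwargs("Throughput Time")
print(kwargs)
# With an initialized instance (see Step 1), one might then call, for example:
# plot_object.topK_variants_evol(**kwargs)
```

The commented-out call assumes that the chosen plotting method accepts these three parameters as described above; check the documentation of the specific method for any further required arguments.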
The table below lists the plotting methods corresponding to each of the log concepts proposed in the DyLoPro framework. You can navigate directly to the detailed documentation of each method by clicking on it.
| | Log Concept | Method 1 | Method 2 |
|---|---|---|---|
| 1 | Variants | topK_variants_evol() | variants_evol() |
| 2 | Directly-Follows Relations | topK_dfr_evol() | dfr_evol() |
| 3 | Categorical Case Feature | topK_categorical_caseftr_evol() | / |
| 4 | Numerical Case Features | num_casefts_evol() | / |
| 5 | Categorical Event Feature | topK_categorical_eventftr_evol() | / |
| 6 | Numerical Event Features | num_eventfts_evol() | / |
Finally, it is also worth mentioning that the extensive capabilities proposed in the DyLoPro framework are not meant to be exhaustive. The visualization methods offered by the DyLoPro package are consequently meant to be continuously extended and improved. The visualization methods extending the framework are listed below:
- distinct_variants_evol(): NOTE: Deprecated. Will be removed in future versions. Use distinct_variants_AdvancedEvol() instead.
- distinct_variants_AdvancedEvol()
Citing DyLoPro
A Demo paper presenting the DyLoPro package is currently in the making.
The DyLoPro package is the software implementation of the identically named DyLoPro framework proposed in the paper DyLoPro: Profiling the Dynamics of Event Logs. This paper will be presented at the BPM 2023 conference in Utrecht, and accordingly published in its main proceedings.
In the meantime, if you are using DyLoPro in your scientific work, please cite DyLoPro as follows:
B. Wuyts, H. Weytjens, S. vanden Broucke, J. De Weerdt, DyLoPro: Profiling the dynamics of event logs, in: Business Process Management, Springer International Publishing, 2023
The full citation will be provided as soon as it is available. A preprint of the (already peer-reviewed) BPM paper can be found here.
Case Studies
The DyLoPro package has already been used to conduct an extensive analysis of the dynamics present in a number of commonly used real-life event logs.
These case studies, conducted for the BPM paper, can be found here.
Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways. For more information, please refer to the CONTRIBUTING.md file.
Release Notes
Please consult HISTORY.md for the release notes.
License
Free software: GNU General Public License v3
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
- Cookiecutter: https://github.com/audreyr/cookiecutter
- audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage
History
0.1.1 (2023-07-05)
Changed
- Improved the README.md file.
Added
- Correctly formatted docstrings of all modules, including the utility files.
- Documentation page on readthedocs.org.
Deprecated
- 0.1.1 will be the last version that contains the distinct_variants_evol() plotting method; it will be removed in the next minor update. Reason: the method solely duplicates a small part of the visualization capabilities offered by the distinct_variants_AdvancedEvol() method. Use the more extensive distinct_variants_AdvancedEvol() visualization method instead.
0.1.0 (2023-06-21)
- First release on PyPI.