
Dynamic Log Profiling package


The DyLoPro Python Library is a visual analytics tool that allows Process Mining (PM)[^1] practitioners to efficiently and comprehensively explore the dynamics in event logs over time, prior to applying PM techniques. These comprehensive exploration capabilities are provided by an extensive set of plotting functionalities, visualizing the dynamics over time from different process perspectives.

[^1]: van der Aalst, W. (2016). Data Science in Action. In: Process Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49851-4_1

  https://en.wikipedia.org/wiki/Process_mining

The DyLoPro library is a ready-to-use and efficient software implementation of the identically named DyLoPro framework, introduced in the paper DyLoPro: Profiling the Dynamics of Event Logs, which will be presented at the BPM 2023 conference in Utrecht and published in its main proceedings. A preprint of the (already peer-reviewed) BPM paper can be found here.

DyLoPro is a comprehensive visual analytics framework designed to explore event log dynamics over time. Its comprehensiveness is achieved by incorporating the main process perspectives, namely the control-flow, data (including resources) and performance perspectives, along two orthogonal dimensions: log concepts and representation types. It incorporates six log concepts to capture all essential information from event logs, including variants and directly-follows relations for the control-flow perspective, and categorical and numeric case and event features for the data perspective. These six log concepts can be represented using five representation types, including four performance-oriented ones (throughput time, number of events per case, outcome, and directly-follows-relations’ performance) and one generic type. With this two-dimensional approach, end users can gain a nuanced and holistic view of event log dynamics, efficiently identifying patterns, temporary or permanent changes, and trends of interest from multiple perspectives. Upon identification, they can further analyze these patterns and trends, ultimately leading to a more appropriate application of downstream process mining techniques.

Documentation

You can consult the documentation of DyLoPro here.

The following terms are used interchangeably throughout the documentation:

  • 'case' and 'trace'
  • 'time period', 'time bucket' and 'time interval'

Installation

You can install DyLoPro using:

pip install DyLoPro

The DyLoPro PyPi page can be consulted here.

Requirements

DyLoPro depends on the following Python packages:

  • numpy (version >=1.21.5)
  • pandas (version >=2.0.2)
  • matplotlib (version >=3.7.1)
  • tqdm (version >=4.63.0)

Any of these packages that are not yet installed are installed automatically when you install DyLoPro.

Get Started

Assumptions & Terminology

For the moment, DyLoPro assumes flat event logs, and does not cater to object-centric event logs. DyLoPro also regards events as the atomic unit of the event log: each executed event is recorded as a single row in the event log. For the terminology and definitions employed by the DyLoPro Python package, please refer to Section 3 (Preliminaries) of the corresponding academic paper.
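
As a minimal illustration of the flat, one-row-per-event assumption, a toy event log could look like the DataFrame below. The column names (case_id, activity, timestamp, amount, resource) and all values are made up for this sketch; DyLoPro's actual formatting requirements are described in its documentation.

```python
import pandas as pd

# Toy flat event log: one row per event.
# All column names and values are illustrative assumptions.
event_log = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c1", "c2", "c2"],
        "activity": ["register", "review", "approve", "register", "reject"],
        "timestamp": pd.to_datetime(
            [
                "2023-01-02 09:00",
                "2023-01-03 14:30",
                "2023-01-05 11:00",
                "2023-01-10 08:15",
                "2023-01-12 16:45",
            ]
        ),
        # Numeric case feature: constant over all events of a case.
        "amount": [100.0, 100.0, 100.0, 250.0, 250.0],
        # Categorical event feature: may differ from event to event.
        "resource": ["ann", "bob", "ann", "bob", "cat"],
    }
)

# A case (trace) is the time-ordered sequence of its events.
traces = (
    event_log.sort_values(["case_id", "timestamp"])
    .groupby("case_id")["activity"]
    .agg(list)
)
print(traces)
```

Here each case spans several rows, and the case-level value (amount) is simply repeated on every row of that case.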

Step 1: Initializing a DynamicLogPlots instance

This step assumes that you have already loaded an event log into a pd.DataFrame, called e.g. event_log. After importing the DyLoPro package, you have to initialize a DynamicLogPlots instance. The DynamicLogPlots class provides a single point of access to all of DyLoPro's visualization methods, and thereby serves as the interface between your Python environment and DyLoPro’s underlying computational logic.

import DyLoPro as dlp
plot_object = dlp.DynamicLogPlots(event_log, 
                                  case_id_key, 
                                  activity_key, 
                                  timestamp_key,
                                  categorical_casefeatures, 
                                  numerical_casefeatures, 
                                  categorical_eventfeatures, 
                                  numerical_eventfeatures, 
                                  start_date, 
                                  end_date, 
                                  outcome)

After running this block of code, DyLoPro will verify the validity of the event log and all arguments specified.

  • If everything checks out, the event log is preprocessed into an internal format that allows DyLoPro to efficiently compute and visualize all aggregations on an on-demand basis. Afterwards, a DynamicLogPlots object is initialized, and all visualization methods can be accessed by invoking the corresponding methods on this instance.

  • If an error is detected, DyLoPro will raise an error with a dedicated error message describing what went wrong and how it can be resolved.

The formatting requirements of the event log, and of all of the arguments needed to initialize a DynamicLogPlots instance (see code block above) can be consulted here.

Step 2: Accessing all visualization methods

Assuming Step 1 is successfully completed, you can now easily access all visualization methods by simply invoking the appropriate methods on plot_object.

As already mentioned, DyLoPro is the implementation of the identically named DyLoPro framework. Below, you will find a concise summary of the framework, followed by a comprehensive explanation of its implementation as a Python package. This section will guide you through leveraging all the available visualization capabilities. For a more detailed explanation of the framework, please refer to Section 4 of the paper.

Additionally, the detailed notebooks containing the case studies conducted on a number of commonly used real-life event logs might also improve your understanding of how to use and access DyLoPro's variety of plotting methods.

Framework to package mapping


DyLoPro provides functionality to construct and visualize time series, i.e. the log dynamics, for a variety of log concepts. For each log concept, the dynamics can be represented using five different representation types.

  • log concepts: the main dimensions along which we capture event log dynamics.
  • representation types: how the event log dynamics should be represented and analyzed for each log concept.

Brief Summary of the Framework

The framework formalizes this procedure in three stages:

  1. Log Discretization: Subdividing the event log into a chronologically ordered set of sub-logs. This is done in two consecutive steps:

    1. Given that the log encompasses data spanning a temporal interval denoted as T, split up T in a chronologically ordered set of equal-length time intervals.
    2. Create the ordered set of sub-logs by assigning each case to exactly one of these time intervals.
  2. Domain Definition:

    • Defining along which log concept to capture log dynamics, and how to represent these dynamics. This boils down to defining the log concept and representation type respectively.
    • The resulting log concept - representation type combination translates into a unique domain-specific mapping function.
  3. Time Series Construction & Visualization:

    • Applying the resulting mapping function to each of the chronologically ordered sub-logs, thereby creating one or more time series.
    • Visualizing the constructed time series.
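
The three stages can be sketched in plain pandas as follows. This is not DyLoPro's internal implementation, only an illustration under assumptions: the column names, the toy data, and the helper mean_throughput_days are hypothetical, the intervals are one week long, each case is assigned to the week of its first event, and the mapping function is the mean throughput time in days.

```python
import pandas as pd

# Toy event log with hypothetical column names; one row per event.
log = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c2", "c2", "c3", "c3"],
        "timestamp": pd.to_datetime(
            [
                "2023-01-02", "2023-01-04",  # c1: starts in week 1, takes 2 days
                "2023-01-03", "2023-01-07",  # c2: starts in week 1, takes 4 days
                "2023-01-10", "2023-01-11",  # c3: starts in week 2, takes 1 day
            ]
        ),
    }
)

# Stage 1 (log discretization): equal-length one-week intervals; each case
# is assigned to the interval that contains its first event.
per_case = log.groupby("case_id")["timestamp"].agg(start="min", end="max")
per_case["bucket"] = per_case["start"].dt.to_period("W")


# Stage 2 (domain definition): a mapping function from a sub-log to a real
# value, here the mean throughput time in days.
def mean_throughput_days(sublog: pd.DataFrame) -> float:
    durations = sublog["end"] - sublog["start"]
    return durations.dt.total_seconds().mean() / 86400


# Stage 3 (time series construction): apply the mapping function to every
# sub-log, in chronological order of the intervals.
series = pd.Series(
    {
        bucket: mean_throughput_days(sublog)
        for bucket, sublog in per_case.groupby("bucket")
    }
).sort_index()
print(series)
```

Calling series.plot() (which requires matplotlib) would then cover the visualization step.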

Framework Implementation in the DyLoPro Package

As already mentioned, all visualization methods can be accessed by invoking the appropriate methods on your initialized DynamicLogPlots instance. All of DyLoPro's plotting methods construct time series by deriving real-valued measures for a chronologically ordered set of sublogs.

The three-stage framework is implemented in this package as follows:

  • Each plotting method pertains to exactly one log concept.

  • Given a certain log concept, and hence one of its associated plotting methods, the representation type is passed to that method through the plt_type argument. The five representation types proposed in the paper correspond to the following values of the plt_type parameter:

    1. Isolated : plt_type='univariate'
    2. Throughput Time (TT) : plt_type='type_tt'
    3. Case Length (NEPC) : plt_type='type_events_case'
    4. Outcome : plt_type='type_outcome'
    5. DFR Performance : plt_type='type_dfr_performance'
  • Log Discretization: the log discretization can also be specified through the arguments of each plotting method.

    1. frequency parameter: Determines the frequency by which the cases are grouped together.
    2. case_assignment parameter: Determines the condition upon which each case is assigned to a certain time interval.

    E.g. if frequency='weekly' and case_assignment='first_event', each case is assigned to the one-week time interval in which its first event occurs and hence each sublog will consist of all cases that were initialized in one particular week.

    Each method is also equipped with an additional set of optional configuration parameters, providing even more customization options to the user. For more information about the parameters corresponding to these methods, please consult the documentation.
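
To see why the case assignment condition matters, the sketch below buckets two toy cases into weekly intervals twice: once by the week of their first event, and once by the week of their last event. It uses plain pandas rather than DyLoPro itself. The data and column names are made up, and the "last event" condition is only an illustrative alternative, not a confirmed argument value; the documentation lists the values that case_assignment actually accepts.

```python
import pandas as pd

# Hypothetical per-case summary: first and last event timestamps of two cases.
cases = pd.DataFrame(
    {
        "first_event": pd.to_datetime(["2023-01-05", "2023-01-06"]),
        "last_event": pd.to_datetime(["2023-01-06", "2023-01-11"]),
    },
    index=["c1", "c2"],
)

# Weekly intervals, with each case assigned by the week of its first event.
by_first = cases["first_event"].dt.to_period("W")

# Same intervals, but each case assigned by the week of its last event
# (an illustrative alternative condition).
by_last = cases["last_event"].dt.to_period("W")

# Number of cases per weekly sub-log under each condition.
print(by_first.value_counts().sort_index())
print(by_last.value_counts().sort_index())
```

Both cases start in the same week, so the first condition yields a single sub-log, whereas the second condition spreads the cases over two sub-logs because c2 ends a week later.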

The table below lists the plotting methods corresponding to each of the log concepts proposed in the DyLoPro framework. You can navigate directly to the detailed documentation of each method by clicking on it.

| # | Log Concept | Method 1 | Method 2 |
|---|---|---|---|
| 1 | Variants | topK_variants_evol() | variants_evol() |
| 2 | Directly-Follows Relations | topK_dfr_evol() | dfr_evol() |
| 3 | Categorical Case Feature | topK_categorical_caseftr_evol() | / |
| 4 | Numerical Case Features | num_casefts_evol() | / |
| 5 | Categorical Event Feature | topK_categorical_eventftr_evol() | / |
| 6 | Numerical Event Features | num_eventfts_evol() | / |

Finally, it is also worth mentioning that the extensive capabilities proposed in the DyLoPro framework are not meant to be exhaustive. The visualization methods offered by the DyLoPro package are consequently meant to be continuously extended and improved. The visualization methods that extend the framework are listed below:

  • distinct_variants_evol() : NOTE: Deprecated. Will be removed in future versions. Use distinct_variants_AdvancedEvol() instead.
  • distinct_variants_AdvancedEvol()

Citing DyLoPro

A Demo paper presenting the DyLoPro package is currently in the making.

The DyLoPro package is the software implementation of the identically named DyLoPro framework proposed in the paper DyLoPro: Profiling the Dynamics of Event Logs. This paper will be presented at the BPM 2023 conference in Utrecht, and accordingly published in its main proceedings.

In the meantime, if you are using DyLoPro in your scientific work, please cite DyLoPro as follows:

B. Wuyts, H. Weytjens, S. vanden Broucke, J. De Weerdt, DyLoPro: Profiling the dynamics of event logs, in: Business Process Management, Springer International Publishing, 2023

The full citation will be provided as soon as it is available. A preprint of the (already peer-reviewed) BPM paper can be found here.

Case Studies

The DyLoPro package has already been used to conduct an extensive analysis of the dynamics present in a number of commonly used real-life event logs.

These case studies, conducted for the BPM paper, can be found here.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways. For more information, please refer to the CONTRIBUTING.md file.

Release Notes

Please consult HISTORY.md for the release notes.

License

Free software: GNU General Public License v3

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.1 (2023-07-05)

Changed

  • Improved the README.md file.

Added

  • Correctly formatted docstrings of all modules, including the utility files.
  • Documentation page on readthedocs.org.

Deprecated

  • 0.1.1 will be the last version that contains the distinct_variants_evol() plotting method; the method will be removed in the next minor update. Reason: it merely duplicates a small part of the visualization capabilities offered by the distinct_variants_AdvancedEvol() method. Use the more extensive distinct_variants_AdvancedEvol() visualization method instead.

0.1.0 (2023-06-21)

  • First release on PyPI.
