Python algorithm to discover, from an event log, activity instances that are executed in a batch.

Batch Processing Discovery

This technique takes as input an event log (a pd.DataFrame) that records the execution of the activities of a process, with enabled, start, and end timestamps, as well as the resource that performed each activity instance. It discovers which activity instances were executed in a batch, and the characteristics of that batch processing.
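To make the input format concrete, the following is a minimal sketch of a toy event log with the kinds of columns described above. The column names are hypothetical and chosen only for illustration; the names the library actually expects are determined by the `log_ids` object passed to its functions.

```python
import pandas as pd

# Hypothetical column names, for illustration only. The names the library
# expects are defined by the `log_ids` object (e.g. DEFAULT_CSV_IDS).
event_log = pd.DataFrame(
    {
        "case_id": ["c1", "c2", "c3"],
        "activity": ["Review", "Review", "Review"],
        "resource": ["Ann", "Ann", "Ann"],
        # Each instance becomes available for processing at a different time...
        "enabled_time": pd.to_datetime(
            ["2023-01-02 09:00", "2023-01-02 09:05", "2023-01-02 09:10"], utc=True
        ),
        # ...but all of them are started together, as a batch would be.
        "start_time": pd.to_datetime(
            ["2023-01-02 09:30", "2023-01-02 09:30", "2023-01-02 09:30"], utc=True
        ),
        "end_time": pd.to_datetime(
            ["2023-01-02 09:40", "2023-01-02 09:50", "2023-01-02 10:00"], utc=True
        ),
    }
)

print(event_log.dtypes)
```

In this toy log, three instances of the same activity are enabled at different times, wait, and are then started together by the same resource, which is the kind of pattern the technique is meant to detect.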

The discovered characteristics are, for each batch processing:

  • The activity being executed.
  • The resources involved in this batch processing.
  • The type of batch processing (sequential, concurrent, or parallel). If more than one type is observed, the most common one is reported.
  • The frequency of that activity occurring as part of a batch.
  • The distribution of batch sizes, i.e., for each size, the number of activity instances executed as a batch with that size.
  • The distribution of durations, i.e., for each batch size, the scaling factor of the duration of the activity instances processed in that batch. For example, a scaling factor of 0.7 for batches of size 2 means that each activity instance takes 0.7 times as long as when executed individually.
  • The firing rules that best describe the start of the batch.
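As an illustration of the batch size distribution and the duration scaling factor described in the list above, here is a minimal sketch on toy data. It is not the library's implementation: the column names are hypothetical, and the scaling factor is assumed here to be the mean duration per batch size divided by the mean duration of instances executed individually (batch size 1).

```python
import pandas as pd

# Toy data: one row per activity instance, with its processing time in
# minutes and the size of the batch it was executed in (1 = not batched).
# Column names are hypothetical.
instances = pd.DataFrame(
    {
        "batch_size": [1, 1, 2, 2, 2, 2, 3, 3, 3],
        "duration_min": [10.0, 10.0, 7.0, 7.0, 6.0, 8.0, 5.0, 5.0, 5.0],
    }
)

# Batch size distribution: number of activity instances per batch size.
size_distribution = instances["batch_size"].value_counts().sort_index()

# Scaling factor per batch size (assumed definition): mean duration of the
# instances in batches of that size, relative to the mean duration of the
# instances executed individually.
mean_duration = instances.groupby("batch_size")["duration_min"].mean()
scaling_factors = mean_duration / mean_duration.loc[1]

print(size_distribution.to_dict())
print(scaling_factors.to_dict())
```

With this data, instances in batches of size 2 take on average 7 minutes against 10 minutes individually, which gives the 0.7 factor used in the example above.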

Requirements

  • Python v3.9.5+
  • PIP v21.1.2+
  • Python dependencies: The packages listed in requirements.txt.

Basic Usage

The following is a simple usage example with the default configuration (see the function documentation for more parameters):

import pandas as pd

from batch_processing_discovery.batch_characteristics import discover_batch_processing_and_characteristics
from batch_processing_discovery.config import DEFAULT_CSV_IDS

# Read event log
event_log = pd.read_csv("path/to/event/log.csv.gz")
# Discover batch processing activities and their characteristics
batch_characteristics = discover_batch_processing_and_characteristics(
    event_log=event_log,
    log_ids=DEFAULT_CSV_IDS
)

Discover only batch processing behavior

If you are only interested in discovering batch processing behavior, the following example applies (see the function documentation for more parameters):

import pandas as pd

from batch_processing_discovery.config import DEFAULT_CSV_IDS
from batch_processing_discovery.discovery import discover_batches

# Read event log
event_log = pd.read_csv("path/to/event/log.csv.gz")
# Discover which activity instances were executed in a batch
batched_event_log = discover_batches(
    event_log=event_log,
    log_ids=DEFAULT_CSV_IDS
)

Get batch characteristics with already set batch processing behavior

If you are only interested in the batch characteristics of an event log in which the batch behavior is already set, the following example applies (see the function documentation for more parameters):

import pandas as pd

from batch_processing_discovery.batch_characteristics import discover_batch_characteristics
from batch_processing_discovery.config import DEFAULT_CSV_IDS

# Read event log
event_log = pd.read_csv("path/to/event/log_with_batch_info.csv.gz")
# Compute the characteristics of the batches already identified in the log
batch_characteristics = discover_batch_characteristics(
    event_log=event_log,
    log_ids=DEFAULT_CSV_IDS
)

No enabled time available

If the event log does not contain enabled times, consider using this Python library to estimate them.
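For intuition only, the sketch below shows a naive way to approximate enabled times with pandas. It is not the method of the library mentioned above: it assumes that the activities of each case run strictly one after another, so an activity instance is enabled when the previous one in the same case ends. The column names are hypothetical.

```python
import pandas as pd

# Toy event log without enabled times (hypothetical column names).
event_log = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c1", "c2", "c2"],
        "activity": ["A", "B", "C", "A", "B"],
        "start_time": pd.to_datetime(
            [
                "2023-01-02 09:00", "2023-01-02 09:30", "2023-01-02 10:00",
                "2023-01-02 09:05", "2023-01-02 09:50",
            ]
        ),
        "end_time": pd.to_datetime(
            [
                "2023-01-02 09:10", "2023-01-02 09:40", "2023-01-02 10:05",
                "2023-01-02 09:20", "2023-01-02 10:00",
            ]
        ),
    }
)

# Naive assumption: within a case, activities are strictly sequential, so
# each instance is enabled when the previous instance of its case ends.
event_log = event_log.sort_values(["case_id", "end_time"])
previous_end = event_log.groupby("case_id")["end_time"].shift(1)

# The first instance of each case has no predecessor: fall back to its
# own start time (i.e. assume no waiting time).
event_log["enabled_time"] = previous_end.fillna(event_log["start_time"])

print(event_log[["case_id", "activity", "enabled_time", "start_time"]])
```

This approximation breaks down for processes with concurrent activities, where the previous event in a case is not necessarily the one that enables the next, so a dedicated estimator is preferable for real logs.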
