
Python algorithm to discover, from an event log, activity instances that are executed in a batch.


Batch Processing Discovery


This technique takes as input an event log (pd.DataFrame) that records the execution of the activities of a process. Each activity instance must have enabled, start and end timestamps, as well as the resource that performed it. From this log, the technique discovers which activity instances were executed as part of a batch, and the characteristics of each batch processing behavior.
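As an illustration of the kind of input described above, the following is a minimal sketch of an event log with one row per activity instance. The column names (`case_id`, `activity`, `resource`, `enabled_time`, `start_time`, `end_time`) and the data are assumptions made for this example; the names the library actually expects are the ones given through the `log_ids` argument and may differ.

```python
import pandas as pd

# Hypothetical column names and data, chosen for illustration only.
# The column names the library expects are defined by the `log_ids` argument.
columns = ["case_id", "activity", "resource", "enabled_time", "start_time", "end_time"]
rows = [
    ("c1", "Review", "Ann", "2023-01-02 09:00", "2023-01-02 11:00", "2023-01-02 11:10"),
    ("c2", "Review", "Ann", "2023-01-02 09:30", "2023-01-02 11:10", "2023-01-02 11:20"),
    ("c3", "Review", "Ann", "2023-01-02 10:15", "2023-01-02 11:20", "2023-01-02 11:30"),
]
event_log = pd.DataFrame(rows, columns=columns)

# Parse the three timestamp columns into datetime values
for column in ["enabled_time", "start_time", "end_time"]:
    event_log[column] = pd.to_datetime(event_log[column])

# Time each activity instance waited between being enabled and being started
event_log["waiting_time"] = event_log["start_time"] - event_log["enabled_time"]
```

In this toy log the three "Review" instances are enabled at different times but are started one right after another by the same resource, which is the kind of pattern a batch discovery technique looks for.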

For each batch processing behavior discovered, the following characteristics are reported:

  • The activity being executed.
  • The resources involved in this batch processing.
  • The type of batch processing (sequential, concurrent or parallel). If more than one type is observed, the most common one is reported.
  • The frequency of that activity occurring as part of a batch.
  • The distribution of batch sizes, i.e., for each size, the number of activity instances executed as a batch with that size.
  • The distribution of durations, i.e., for each batch size, the scaling factor applied to the duration of the activity instances processed in a batch of that size. For example, a scaling factor of 0.7 for batches of size 2 means that each activity instance takes 0.7 times as long as it would if executed individually.
  • The firing rules that best describe the start of the batch.
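To make the batch size distribution, the duration scaling factors and the batch frequency more concrete, here is a small worked sketch in plain Python. It is not the library's implementation: the input data is made up, and dividing the mean duration per batch size by the mean duration of individually executed instances is only one plausible way to obtain a scaling factor.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical data: one (batch_size, duration_in_minutes) pair per activity
# instance, where batch_size == 1 means the instance was executed individually.
instances = [
    (1, 10.0), (1, 10.0),
    (2, 7.0), (2, 7.0), (2, 7.0), (2, 7.0),
    (3, 5.0), (3, 5.0), (3, 5.0),
]

# Batch size distribution: number of activity instances per batch size
size_distribution = Counter(size for size, _ in instances)

# Mean duration of the activity instances for each batch size
durations_by_size = defaultdict(list)
for size, duration in instances:
    durations_by_size[size].append(duration)
mean_duration = {size: mean(values) for size, values in durations_by_size.items()}

# Scaling factor: mean duration in a batch of that size, relative to the
# mean duration when executed individually (assumed definition)
baseline = mean_duration[1]
scaling_factors = {size: mean_duration[size] / baseline for size in sorted(mean_duration)}

# Frequency: share of activity instances executed as part of a batch
batched = sum(count for size, count in size_distribution.items() if size > 1)
batch_frequency = batched / len(instances)
```

With this data, instances in batches of size 2 get a scaling factor of 0.7, matching the example in the list above, and 7 of the 9 instances are executed as part of a batch.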

Requirements

  • Python v3.9.5+
  • pip v21.1.2+
  • Python dependencies: The packages listed in requirements.txt.

Basic Usage

The following example discovers batch processing and its characteristics using the default configuration (see the function documentation for more parameters):

import pandas as pd

from batch_processing_discovery.batch_characteristics import discover_batch_processing_and_characteristics
from batch_processing_discovery.config import DEFAULT_CSV_IDS

# Read event log
event_log = pd.read_csv("path/to/event/log.csv.gz")
# Discover batch processing activities and their characteristics
batch_characteristics = discover_batch_processing_and_characteristics(
    event_log=event_log,
    log_ids=DEFAULT_CSV_IDS
)

Discover only batch processing behavior

If you are only interested in discovering the batch processing behavior, without computing its characteristics, use the following example (see the function documentation for more parameters):

import pandas as pd

from batch_processing_discovery.config import DEFAULT_CSV_IDS
from batch_processing_discovery.discovery import discover_batches

# Read event log
event_log = pd.read_csv("path/to/event/log.csv.gz")
# Discover which activity instances are executed in a batch
batched_event_log = discover_batches(
    event_log=event_log,
    log_ids=DEFAULT_CSV_IDS
)

Get batch characteristics with already set batch processing behavior

If the event log already contains the batch processing information and you are only interested in getting the batch characteristics, use the following example (see the function documentation for more parameters):

import pandas as pd

from batch_processing_discovery.batch_characteristics import discover_batch_characteristics
from batch_processing_discovery.config import DEFAULT_CSV_IDS

# Read event log
event_log = pd.read_csv("path/to/event/log_with_batch_info.csv.gz")
# Compute the characteristics of the batches already identified in the log
batch_characteristics = discover_batch_characteristics(
    event_log=event_log,
    log_ids=DEFAULT_CSV_IDS
)

No enabled time available

If the event log does not contain enabled times, consider using this Python library to estimate them.
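For a rough idea of what estimating enabled times involves, the sketch below applies a deliberately naive heuristic: an activity instance is considered enabled when the previous activity instance of the same case ended. This is an assumption made for illustration, it only holds for strictly sequential processes, and it is not the method used by the library referenced above. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical column names and data, for illustration only
event_log = pd.DataFrame({
    "case_id": ["c1", "c1", "c2", "c2"],
    "activity": ["Register", "Review", "Register", "Review"],
    "start_time": pd.to_datetime([
        "2023-01-02 09:00", "2023-01-02 11:00",
        "2023-01-02 09:20", "2023-01-02 11:10",
    ]),
    "end_time": pd.to_datetime([
        "2023-01-02 09:10", "2023-01-02 11:10",
        "2023-01-02 09:30", "2023-01-02 11:20",
    ]),
})

# Order the activity instances of each case by their end time
event_log = event_log.sort_values(["case_id", "end_time"]).reset_index(drop=True)

# Naive assumption: an instance is enabled when the previous instance
# of the same case ended
event_log["enabled_time"] = event_log.groupby("case_id")["end_time"].shift(1)

# The first instance of each case has no predecessor, so fall back to
# its own start time
event_log["enabled_time"] = event_log["enabled_time"].fillna(event_log["start_time"])
```

Processes with parallel branches need a concurrency-aware estimation, which is why a dedicated library is the better choice for real logs.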

