Python package to estimate the enabled time, enabling activity, and resource availability time of each activity instance in an event log.

Start Time Estimator

Python implementation of the start time estimation technique presented in the paper "Repairing Activity Start Times to Improve Business Process Simulation", by David Chapela-Campa and Marlon Dumas.

The technique takes as input an event log (a pandas DataFrame) recording the execution of the activities of a process, including resource information, and produces a version of that event log with an estimated start time for each activity instance.
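As a rough illustration of the idea, the estimated start of an activity instance can be read as the later of two instants: when the instance became enabled, and when its resource became available. The sketch below assumes exactly that rule. It is not the package's implementation, and the helper name estimate_start is hypothetical.

```python
from datetime import datetime
from typing import Optional


def estimate_start(enabled: Optional[datetime], available: Optional[datetime]) -> Optional[datetime]:
    """Hypothetical helper: the estimated start is the later of the enabled
    time and the resource availability time, ignoring unknown values."""
    candidates = [t for t in (enabled, available) if t is not None]
    # If neither instant is known, the start cannot be estimated here and
    # would be left for a later re-estimation step.
    return max(candidates) if candidates else None
```

For example, an instance enabled at 09:00 whose resource only becomes free at 09:30 gets an estimated start of 09:30.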

Requirements

  • Python v3.9.5+
  • PIP v21.1.2+
  • Python dependencies: The packages listed in requirements.txt.

Basic Usage

Check the main file for an example of a simple execution, and the config file for an explanation of the configuration parameters.

Examples

Here we provide a simple example using the default configuration, followed by custom configurations that run each version of the technique.

# Set up default configuration
configuration = Configuration()
# Read event log
event_log = read_csv_log(
    log_path="path/to/event/log.csv.gz",
    config=configuration,
    reset_start_times=True,  # Reset all start times to estimate them all
    sort_by_end_time=True  # Sort log by end time (warning: this might alter the order of the events sharing end time)
)
# Estimate start times
extended_event_log = StartTimeEstimator(event_log, configuration).estimate()

The column IDs of the CSV file can be customized to match the column names used in the log:

# Set up custom configuration
configuration = Configuration(
    log_ids=EventLogIDs(
        case="case",
        activity="task",
        start_time="start",
        end_time="end",
        resource="resource"
    )
)
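To illustrate what the column mapping does, here is a minimal standard-library sketch (not the package's reader) in which a hypothetical LogIDs dataclass plays the role of EventLogIDs: each field names the CSV column holding that attribute, so the rest of the code can rely on fixed keys. The default column names and the read_rows helper are assumptions for illustration only.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class LogIDs:  # hypothetical stand-in for EventLogIDs; defaults are illustrative
    case: str = "case_id"
    activity: str = "activity"
    start_time: str = "start_time"
    end_time: str = "end_time"
    resource: str = "resource"


def read_rows(text: str, ids: LogIDs) -> list[dict]:
    """Read CSV text, mapping the log's own column names to fixed keys."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "case": row[ids.case],
            "activity": row[ids.activity],
            # Missing or empty start times are normalized to None.
            "start": row.get(ids.start_time) or None,
            "end": row[ids.end_time],
            "resource": row[ids.resource],
        })
    return rows
```

With the IDs from the example above, a log whose columns are named case, task, start, end, and resource is read into the same keys as any other log.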

Configuration of the proposed approach

With no outlier threshold and using the Median as the statistic to re-estimate the activity instances that couldn't be estimated:

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.HEURISTICS,
    re_estimation_method=ReEstimationMethod.MEDIAN,
    resource_availability_type=ResourceAvailabilityType.SIMPLE
)

With no outlier threshold and using the Mode as the statistic to re-estimate the activity instances that couldn't be estimated:

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.HEURISTICS,
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE
)
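The re-estimation step can be pictured as follows: for an activity instance whose start could not be estimated, place its start at its end time minus a typical duration of the same activity, where "typical" is the median or the mode of the durations that were estimated. This is a hedged sketch; the re_estimate helper and the exact rule are assumptions, not the package's code.

```python
import statistics
from datetime import datetime, timedelta


def re_estimate(end: datetime, known_durations: list[timedelta], method: str = "median") -> datetime:
    """Hypothetical sketch: start = end - typical duration of the same activity,
    computed from the instances whose start times were estimated."""
    seconds = [d.total_seconds() for d in known_durations]
    if method == "median":
        typical = statistics.median(seconds)
    elif method == "mode":
        typical = statistics.mode(seconds)  # first mode found if there are ties
    else:
        raise ValueError(f"unknown method: {method}")
    return end - timedelta(seconds=typical)
```

The choice matters when durations are skewed: for durations of 10, 10, 40, 50, and 60 minutes, the median gives 40 minutes while the mode gives 10.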

Customize the thresholds for the concurrency detection:

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.HEURISTICS,
    heuristics_thresholds=HeuristicsThresholds(df=0.6, l2l=0.6),
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE
)
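For intuition about what the df and l2l thresholds control, the sketch below uses the textbook Heuristics Miner measures: the dependency measure (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) over directly-follows counts, and the length-two-loop measure over "a, b, a" patterns. How the package combines these measures with its thresholds is not stated here, so the are_concurrent rule is an assumption.

```python
from collections import Counter


def df_counts(traces):
    """Count directly-follows pairs (a, b) over all traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def l2l_counts(traces):
    """Count length-two loop patterns a, b, a, keyed as (a, b)."""
    counts = Counter()
    for trace in traces:
        for x, y, z in zip(trace, trace[1:], trace[2:]):
            if x == z and x != y:
                counts[(x, y)] += 1
    return counts


def dependency(a, b, df):
    ab, ba = df[(a, b)], df[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def l2l_dependency(a, b, l2l):
    ab, ba = l2l[(a, b)], l2l[(b, a)]
    return (ab + ba) / (ab + ba + 1)


def are_concurrent(a, b, traces, df_threshold=0.6, l2l_threshold=0.6):
    """Assumed rule: a and b follow each other in both orders, neither direction
    is a strong dependency, and the pattern is not explained by a short loop."""
    df, l2l = df_counts(traces), l2l_counts(traces)
    both_ways = df[(a, b)] > 0 and df[(b, a)] > 0
    weak_dependency = abs(dependency(a, b, df)) < df_threshold
    no_short_loop = l2l_dependency(a, b, l2l) < l2l_threshold
    return both_ways and weak_dependency and no_short_loop
```

In the log A, B, C, D and A, C, B, D, activities B and C come out as concurrent; in A, B, A, B, A, the pair A and B is instead explained as a short loop.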

Add an outlier threshold of 200%, and use the Mode also as the statistic to compute the most typical duration:

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.HEURISTICS,
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE,
    outlier_statistic=OutlierStatistic.MODE,
    outlier_threshold=2.0
)
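One plausible reading of the outlier threshold, sketched under that assumption: an estimated duration counts as an outlier when it exceeds outlier_threshold times the activity's most typical duration, computed with the chosen statistic (here the mode). A value of 2.0 then corresponds to the 200% in the example. The flag_outliers helper is hypothetical.

```python
import statistics
from datetime import timedelta


def flag_outliers(durations: list[timedelta], threshold: float = 2.0, statistic=statistics.mode) -> list[bool]:
    """Assumed rule: a duration is an outlier when it exceeds `threshold`
    times the most typical duration of the activity."""
    typical = statistic([d.total_seconds() for d in durations])
    limit = threshold * typical
    return [d.total_seconds() > limit for d in durations]
```

With a typical duration of 10 minutes and a threshold of 2.0, a 25-minute estimate is flagged while a 15-minute one is kept.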

Specify bot resources (resources that perform their activities instantly) and instant activities:

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.HEURISTICS,
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE,
    bot_resources={"SYSTEM", "BOT_001"},
    instant_activities={"Automatic Validation", "Send Notification"}
)
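The effect of bot_resources and instant_activities can be sketched as giving the matching activity instances a zero duration, i.e., a start time equal to their end time. This is a minimal illustration with plain dictionaries; the event layout and the apply_instant_rules name are assumptions.

```python
def apply_instant_rules(events, bot_resources, instant_activities):
    """Sketch: events performed by a bot, or belonging to an instant activity,
    get a start time equal to their end time (zero duration)."""
    for event in events:
        if event["resource"] in bot_resources or event["activity"] in instant_activities:
            event["start"] = event["end"]
    return events
```

All other events are left untouched, so their start times still go through the regular estimation.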

Configuration with a simpler concurrency oracle (Alpha Miner's) for the Enablement Time calculation

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.ALPHA,
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE
)
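The Alpha Miner's notion of concurrency is simple: two activities are parallel when each directly follows the other somewhere in the log. Below is a minimal sketch of that textbook relation (excluding self-pairs, a choice made here); it is not the package's AlphaConcurrencyOracle. Under this rule, a short loop such as A, B, A also makes A and B look parallel, a limitation that a heuristics-based oracle with a length-two-loop threshold can mitigate.

```python
def alpha_concurrency(traces):
    """Return the activity pairs the Alpha Miner considers parallel:
    a || b when a directly follows b and b directly follows a in the log."""
    df = set()
    for trace in traces:
        df.update(zip(trace, trace[1:]))
    return {(a, b) for (a, b) in df if (b, a) in df and a != b}
```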

Configuration with no concurrency oracle for the Enablement Time calculation (i.e. assuming directly-follows relations)

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.DF,
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE
)
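Without a concurrency oracle, each activity instance is assumed to be enabled by the one that directly precedes it in its trace. The sketch below handles a single trace already sorted by end time; the df_enabled_times helper and the event layout are hypothetical.

```python
def df_enabled_times(trace_events):
    """Sketch for one trace sorted by end time: each event is assumed to be
    enabled by the event right before it; the first has no enabler."""
    enabled = []
    previous = None
    for event in trace_events:
        # (enabling activity, enabled time), or (None, None) for the first event
        enabled.append((previous["activity"], previous["end"]) if previous else (None, None))
        previous = event
    return enabled
```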

Configuration only taking into account the Resource Availability Time

# Set up custom configuration
configuration = Configuration(
    concurrency_oracle_type=ConcurrencyOracleType.DEACTIVATED,
    re_estimation_method=ReEstimationMethod.MODE,
    resource_availability_type=ResourceAvailabilityType.SIMPLE
)
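With ResourceAvailabilityType.SIMPLE, a reasonable reading is that a resource becomes available for an activity instance when it finishes the previous instance it performed, in any case. The sketch below implements that assumption with plain dictionaries; simple_availability is a hypothetical name.

```python
def simple_availability(events):
    """Sketch: the availability time of each event's resource is the end of the
    previous event (by end time, across all cases) performed by that resource.
    Results are aligned with the input order; None means no previous work."""
    available = [None] * len(events)
    last_end = {}
    order = sorted(range(len(events)), key=lambda i: events[i]["end"])
    for i in order:
        resource = events[i]["resource"]
        available[i] = last_end.get(resource)
        last_end[resource] = events[i]["end"]
    return available
```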

Individual Enablement Time Calculation

This package can also be used to calculate the enablement time (and the enabling activity) of the activity instances in an event log, without calculating the resource availability or estimating the start times. A simple example:

# Set up custom configuration
configuration = Configuration(
    log_ids=DEFAULT_CSV_IDS,  # Customize the column IDs with this parameter
    consider_start_times=True  # Consider real parallelism if the start times are available
)
# Read event log
event_log = read_csv_log(
    log_path="path/to/event/log.csv.gz",
    config=configuration,
    sort_by_end_time=True  # Sort log by end time (warning: this might alter the order of the events sharing end time)
)
# Instantiate desired concurrency oracle
concurrency_oracle = HeuristicsConcurrencyOracle(event_log, configuration)
# concurrency_oracle = AlphaConcurrencyOracle(event_log, configuration)
# concurrency_oracle = DirectlyFollowsConcurrencyOracle(event_log, configuration)
# Add enablement times to the event log
concurrency_oracle.add_enabled_times(
    event_log,
    set_nat_to_first_event=False,  # If True, set NaT as the enabled time of events with no enabling activity; if False, use the start of the trace
    include_enabling_activity=True  # Whether to include the label of the enabling activity in a new column
)
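Given any concurrency relation, the enabled time can be sketched as the end of the latest earlier event in the same trace whose activity is not concurrent with the current one. The function below is a hypothetical illustration, not the package's add_enabled_times: None stands in for pandas NaT, and the "start of the trace" is assumed to be its earliest recorded timestamp.

```python
def sketch_enabled_times(trace, concurrent, set_nat_to_first_event=False):
    """trace: events of one case sorted by end time; concurrent: set of
    (activity, activity) pairs deemed parallel by some concurrency oracle.
    Returns (enabling activity, enabled time) for each event."""
    # Assumption: "start of the trace" = earliest timestamp recorded in it.
    trace_start = min(
        t for e in trace for t in (e.get("start"), e["end"]) if t is not None
    )
    result = []
    for i, event in enumerate(trace):
        enabler = None
        # Walk back from the latest-ending earlier event.
        for previous in reversed(trace[:i]):
            if (previous["activity"], event["activity"]) not in concurrent:
                enabler = previous
                break
        if enabler is not None:
            result.append((enabler["activity"], enabler["end"]))
        elif set_nat_to_first_event:
            result.append((None, None))  # None plays the role of NaT
        else:
            result.append((None, trace_start))
    return result
```

In a trace A, B, C, D where B and C are concurrent, C is enabled by A rather than B, and D by C, the later of its two parallel predecessors.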

Warning: If the event log contains start times, set the parameter consider_start_times to True. This parameter tells the enablement time calculator that it can trust the start times in the event log, and use them to discard, as possible causal predecessors, the activity instances being executed in parallel with the current one.

For example, if activity A always precedes activity B, i.e., there is no concurrency between them, an execution of A can be a causal predecessor of an execution of B (meaning that A can enable B). However, if the start times are available and an instance of B starts before the end of A, then A does not enable B in that case.

If consider_start_times is set to True, the estimator uses the start time information in this way; if it is set to False, only the end times are considered.
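The start-time check in the A/B example can be written as a small predicate. This is a sketch under the assumption that an earlier instance may enable the current one only if it ends no later than the current one ends and, when start times are trusted, no later than the current one starts; can_enable is a hypothetical helper.

```python
def can_enable(previous, current, consider_start_times):
    """Sketch: `previous` must end no later than `current` ends; when start
    times are trusted, it must also end no later than `current` starts."""
    if previous["end"] > current["end"]:
        return False
    if consider_start_times and current.get("start") is not None:
        return previous["end"] <= current["start"]
    return True
```

With A running from 9:00 to 10:00 and B from 9:30 to 11:00, A can enable B only when the start times are ignored.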


Download files

Source distribution: start_time_estimator-1.10.11.tar.gz (15.2 kB)

Built distribution: start_time_estimator-1.10.11-py3-none-any.whl (16.4 kB, Python 3)

Hashes for start_time_estimator-1.10.11.tar.gz
Algorithm Hash digest
SHA256 f7646e1bd6848ebc10273a26493186b72d42a775f8fe641826abed1f71d921dd
MD5 f0e19024227236151090b7a492877d89
BLAKE2b-256 b0e46d3e2dcc51d7517c7c1839e61dc47d6c39e74b9518d4ee7773ef2eda1d4f

Hashes for start_time_estimator-1.10.11-py3-none-any.whl
Algorithm Hash digest
SHA256 6b4d54d0bef27ee12396a24d0b23fcdab7c4d10e07ad82a85d59b23243d0e021
MD5 c0dced0689d59f929fe805fb88a6cd5d
BLAKE2b-256 fe217cc61f4d7790070a24add6c591885728c4f35a33ed44976c00d5f2c8b53e
