Algorithms for process mining and data mining on event sequences.

Prolothar Rule Mining

Algorithms to learn classification and event sequence prediction rules for event sequence datasets such as process logs.

Based on the publication

Boris Wiegand, Dietrich Klakow, and Jilles Vreeken. Discovering Interpretable Data-to-Sequence Generators. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), Virtual Event. 2022, pp. 4237–4244.

Prerequisites

Python 3.11+

Usage

If you want to run the algorithms on your own data, follow the steps below.

Installing

pip install prolothar-rule-mining

Creating or reading a dataset of sequences with metadata

You can create a dataset manually:

from prolothar_common.models.dataset import TargetSequenceDataset
from prolothar_common.models.dataset.instance import TargetSequenceInstance

# define a list of categorical variables and a list of numeric variables
dataset = TargetSequenceDataset(['color'], ['size'])

# add instances, where each instance has three parts:
# 1. a unique hashable ID (e.g. of type int or str)
# 2. a dictionary with attribute names and attribute values
# 3. a (potentially empty) list or tuple of events of type str
dataset.add_instance(TargetSequenceInstance(
    1, {'color': 'red', 'size': 100}, []
))
dataset.add_instance(TargetSequenceInstance(
    2, {'color': 'blue', 'size': 42}, ['A', 'B']
))
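When the raw data is held in plain Python records, the three parts of an instance (ID, attribute dictionary, event list) can be prepared before they are handed to TargetSequenceInstance. The following is a minimal sketch using only the standard library. The helper name `split_record` and the record layout (ID under `id`, events under `events`, every other key an attribute) are assumptions made for illustration and are not part of the library.

```python
from typing import Any, Hashable


def split_record(
    record: dict[str, Any],
) -> tuple[Hashable, dict[str, Any], tuple[str, ...]]:
    """Split a flat record into (instance ID, attributes, events).

    Hypothetical helper: assumes the ID is stored under 'id' and the
    event sequence under 'events'; all other keys are attributes.
    """
    instance_id = record['id']
    # events must be of type str; a missing key means an empty sequence
    events = tuple(str(event) for event in record.get('events', ()))
    attributes = {
        name: value
        for name, value in record.items()
        if name not in ('id', 'events')
    }
    return instance_id, attributes, events


records = [
    {'id': 1, 'color': 'red', 'size': 100, 'events': []},
    {'id': 2, 'color': 'blue', 'size': 42, 'events': ['A', 'B']},
]
parts = [split_record(record) for record in records]

# With the library installed, each triple can be passed on like this:
# for instance_id, attributes, events in parts:
#     dataset.add_instance(
#         TargetSequenceInstance(instance_id, attributes, events))
```

Keeping the split in a separate function makes it easy to check IDs for uniqueness or to filter records before the dataset is built.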

Alternatively, you can read a dataset from an .arff file:

from prolothar_common.models.dataset import TargetSequenceDataset

with open('dataset.arff', 'r') as f:
    dataset = TargetSequenceDataset.create_from_arff(f.read(), 'sequence')

Example .arff file:

@RELATION "TestDataset"

@ATTRIBUTE "color" {"blue","red"}
@ATTRIBUTE "size" NUMERIC
@ATTRIBUTE "sequence" {"[]","[A,B]"}

@DATA
"red",100,"[]"
"blue",42,"[A,B]"
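If the data does not exist as an .arff file yet, text in the layout shown above can be generated with a few lines of standard-library Python. This is a sketch under stated assumptions: the function names `format_sequence` and `to_arff` are hypothetical, the column order (categorical, numeric, sequence) simply mirrors the file above, and values are quoted verbatim without escaping. Nominal values are sorted, so their order inside the braces may differ from the file above.

```python
def format_sequence(events):
    """Render events as in the file above, e.g. ['A', 'B'] -> '[A,B]'."""
    return '[' + ','.join(events) + ']'


def to_arff(relation, categorical, numeric, rows,
            sequence_attribute='sequence'):
    """Build ARFF text from rows of (attributes, events).

    Hypothetical helper, not part of the library. Values containing
    double quotes are not escaped.
    """
    lines = [f'@RELATION "{relation}"', '']
    for name in categorical:
        # a nominal attribute lists every value that occurs in the data
        values = sorted({attributes[name] for attributes, _ in rows})
        quoted = ','.join(f'"{value}"' for value in values)
        lines.append(f'@ATTRIBUTE "{name}" {{{quoted}}}')
    for name in numeric:
        lines.append(f'@ATTRIBUTE "{name}" NUMERIC')
    # the sequence column is nominal as well: one value per distinct sequence
    sequences = sorted({format_sequence(events) for _, events in rows})
    quoted = ','.join(f'"{sequence}"' for sequence in sequences)
    lines.append(f'@ATTRIBUTE "{sequence_attribute}" {{{quoted}}}')
    lines += ['', '@DATA']
    for attributes, events in rows:
        cells = [f'"{attributes[name]}"' for name in categorical]
        cells += [str(attributes[name]) for name in numeric]
        cells.append(f'"{format_sequence(events)}"')
        lines.append(','.join(cells))
    return '\n'.join(lines) + '\n'


rows = [
    ({'color': 'red', 'size': 100}, []),
    ({'color': 'blue', 'size': 42}, ['A', 'B']),
]
arff_text = to_arff('TestDataset', ['color'], ['size'], rows)
print(arff_text)
```

The resulting string can be written to disk and then read back with TargetSequenceDataset.create_from_arff as described above.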

Discovering an Event-flow Graph Using ConSequence

from prolothar_rule_mining.rule_miner.data_to_sequence.consequence import ConSequence

consequence = ConSequence()
rules_model = consequence.mine_rules(dataset)

# make predictions
for instance in dataset:
    print('=================')
    print(instance.get_target_sequence())
    print(rules_model.execute(instance))

# get and plot the event flow graph
graph = rules_model.get_event_flow_graph()
graph.plot()
graph.plot(view=False, filepath='path_to_pdf')

#get and print the classification rule at each node
for node, router in rules_model.get_node_router_table().items():
    print('===============================')
    print(f'rule at node {node}')
    print(node.get_rule())
    # alternative: print(node.get_rule().to_html())
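The prediction loop above prints each target sequence next to the predicted one. To put a number on how close the two are, one option is the edit (Levenshtein) distance between them. This metric is an assumption chosen for illustration and is not prescribed by the library. The sketch below uses only the standard library and assumes that both sequences are lists or tuples of event names.

```python
from collections.abc import Sequence


def edit_distance(predicted: Sequence[str], target: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences."""
    # previous[j] is the distance between the already processed prefix
    # of `predicted` and the first j events of `target`
    previous = list(range(len(target) + 1))
    for i, predicted_event in enumerate(predicted, start=1):
        current = [i]
        for j, target_event in enumerate(target, start=1):
            substitution = previous[j - 1] + (predicted_event != target_event)
            current.append(min(
                previous[j] + 1,      # drop an event from `predicted`
                current[j - 1] + 1,   # insert an event from `target`
                substitution,         # keep or replace the event
            ))
        previous = current
    return previous[-1]


print(edit_distance(['A', 'C', 'B'], ['A', 'B']))

# With the library installed, and assuming both calls return sequences
# of event names:
# for instance in dataset:
#     print(edit_distance(rules_model.execute(instance),
#                         instance.get_target_sequence()))
```

A distance of 0 means the predicted sequence equals the target sequence; dividing by the length of the longer sequence gives a value between 0 and 1 that is easier to compare across instances.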

Development

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Additional Prerequisites

  • make (optional)

Compile Cython code

make cython

Running the tests

make test

Deployment

make clean_package || make package && make publish

Versioning

We use SemVer for versioning.

Authors

If you have any questions, feel free to ask one of our authors:
