Algorithms for prediction and rule mining on event sequences
Prolothar Rule Mining
Algorithms to learn classification and event sequence prediction rules for event sequence datasets such as process logs.
Based on the publication
Boris Wiegand, Dietrich Klakow, and Jilles Vreeken. Discovering Interpretable Data-to-Sequence Generators. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), Virtual Event. 2022, pp. 4237–4244.
Prerequisites
Python 3.11+
Usage
If you want to run the algorithms on your own data, follow the steps below.
Installing
pip install prolothar-rule-mining
Creating or reading a dataset of sequences with metadata
You can create a dataset manually as follows:
from prolothar_common.models.dataset import TargetSequenceDataset
from prolothar_common.models.dataset.instance import TargetSequenceInstance
# define a list of categorical attribute names and a list of numeric attribute names
dataset = TargetSequenceDataset(['color'], ['size'])
# add instances, where each instance has three parts:
# 1. a unique hashable ID (e.g. of type int or str)
# 2. a dictionary with attribute names and attribute values
# 3. a (potentially empty) list or tuple of events of type str
dataset.add_instance(TargetSequenceInstance(
1, {'color': 'red', 'size': 100}, []
))
dataset.add_instance(TargetSequenceInstance(
2, {'color': 'blue', 'size': 42}, ['A', 'B']
))
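Process logs are often stored as one row per event rather than one row per case. The following is a minimal sketch of how such a log could be grouped into the (ID, attributes, events) triples described above. It assumes a hypothetical row layout of `(case_id, timestamp, activity)` plus a separate dict of case attributes; the helper `event_log_to_instances` is hypothetical and not part of prolothar.

```python
from collections import defaultdict


def event_log_to_instances(events, case_attributes):
    """Group event rows into one (case_id, attributes, events) triple per case.

    events: iterable of (case_id, timestamp, activity) rows (assumed layout).
    case_attributes: dict mapping each case_id to its attribute dict.
    """
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    instances = []
    for case_id, attributes in case_attributes.items():
        # order events by timestamp; the sort is stable, so ties keep log order
        ordered = sorted(by_case.get(case_id, []), key=lambda pair: pair[0])
        # cases without any events get an empty sequence
        instances.append((case_id, attributes, tuple(str(a) for _, a in ordered)))
    return instances
```

Each resulting triple matches the three constructor arguments shown above, so it could be added with `dataset.add_instance(TargetSequenceInstance(*triple))`.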
Alternatively, you can read a dataset from an .arff file:
from prolothar_common.models.dataset import TargetSequenceDataset
with open('dataset.arff', 'r') as f:
dataset = TargetSequenceDataset.create_from_arff(f.read(), 'sequence')
Example .arff file, in which the sequence attribute stores each event sequence as a bracketed, comma-separated string:
@RELATION "TestDataset"
@ATTRIBUTE "color" {"blue","red"}
@ATTRIBUTE "size" NUMERIC
@ATTRIBUTE "sequence" {"[]","[A,B]"}
@DATA
"red",100,"[]"
"blue",42,"[A,B]"
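If your data lives in Python objects, an .arff file in the format above can also be written without extra dependencies. This is a minimal sketch: the helpers `to_arff`, `nominal_declaration` and `encode_sequence` are hypothetical, and the sketch assumes attribute values and event names contain no commas, brackets or double quotes, since the `[A,B]` encoding is taken from the example above and its escaping rules are not documented here. Nominal values are emitted in sorted order, so the declared value lists may be ordered differently than in the example.

```python
def encode_sequence(events):
    # bracketed, comma-separated encoding as in the example above;
    # assumes event names contain no commas, brackets or double quotes
    return '[' + ','.join(events) + ']'


def nominal_declaration(name, values):
    # ARFF nominal attribute: @ATTRIBUTE "name" {"v1","v2",...}
    return f'@ATTRIBUTE "{name}" {{' + ','.join(f'"{v}"' for v in values) + '}'


def to_arff(relation, categorical, numeric, sequence_attribute, rows):
    """Serialize (attributes, events) pairs into ARFF text.

    Column order: categorical attributes, numeric attributes, then the sequence.
    """
    encoded = [encode_sequence(events) for _, events in rows]
    lines = [f'@RELATION "{relation}"']
    for name in categorical:
        values = sorted({str(attrs[name]) for attrs, _ in rows})
        lines.append(nominal_declaration(name, values))
    for name in numeric:
        lines.append(f'@ATTRIBUTE "{name}" NUMERIC')
    lines.append(nominal_declaration(sequence_attribute, sorted(set(encoded))))
    lines.append('@DATA')
    for (attrs, _), sequence in zip(rows, encoded):
        cells = [f'"{attrs[name]}"' for name in categorical]  # quoted nominal values
        cells += [str(attrs[name]) for name in numeric]       # plain numbers
        cells.append(f'"{sequence}"')
        lines.append(','.join(cells))
    return '\n'.join(lines) + '\n'
```

The returned string could then be passed to `TargetSequenceDataset.create_from_arff(text, 'sequence')` in the same way as the file contents above.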
Discovering an Event-flow Graph Using ConSequence
from prolothar_rule_mining.rule_miner.data_to_sequence.consequence import ConSequence
consequence = ConSequence()
rules_model = consequence.mine_rules(dataset)
# predict the sequence of each instance and print it below the actual sequence
for instance in dataset:
print('=================')
print(instance.get_target_sequence())
print(rules_model.execute(instance))
# get the event-flow graph and plot it
graph = rules_model.get_event_flow_graph()
graph.plot()
# alternatively, save the plot as a PDF at path_to_pdf instead of displaying it
graph.plot(view=False, filepath='path_to_pdf')
# print the classification rule at each node
for node, router in rules_model.get_node_router_table().items():
print('===============================')
print(f'rule at node {node}')
print(router.get_rule())
# alternative: print(router.get_rule().to_html())
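To summarize how close the predictions printed in the loop above are to the actual sequences, one option is the Levenshtein (edit) distance, normalized by the length of the longer sequence. The sketch below is a generic evaluation helper, not a metric provided by prolothar or necessarily the one used in the publication. It assumes that `instance.get_target_sequence()` and `rules_model.execute(instance)` both return sequences of event labels that can be compared element by element; the helper names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete event x
                current[j - 1] + 1,          # insert event y
                previous[j - 1] + (x != y),  # keep or substitute
            ))
        previous = current
    return previous[-1]


def mean_normalized_edit_distance(pairs):
    """Average over (target, predicted) pairs; 0.0 means every prediction is exact."""
    scores = []
    for target, predicted in pairs:
        target, predicted = list(target), list(predicted)
        longest = max(len(target), len(predicted))
        scores.append(edit_distance(target, predicted) / longest if longest else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

With the model from above, the pairs could be collected as `[(instance.get_target_sequence(), rules_model.execute(instance)) for instance in dataset]` and passed to `mean_normalized_edit_distance`.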
Development
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Additional Prerequisites
- make (optional)
Compile Cython code
make cython
Running the tests
make test
Deployment
- Change the version in version.txt
- Build and publish the package on PyPI by
make clean_package
make package && make publish
- Create and push a tag for this version by
git tag -a [version] -m "describe this version"
git push --tags
Versioning
We use SemVer for versioning.
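Since the version in version.txt follows SemVer, the first deployment step amounts to incrementing the MAJOR, MINOR, or PATCH component and resetting the lower ones to zero. A minimal sketch, assuming version.txt contains only a plain `MAJOR.MINOR.PATCH` string such as `3.0.2` (pre-release and build suffixes are not handled); `bump_version` is hypothetical and not part of the project's Makefile.

```python
import re

# MAJOR.MINOR.PATCH without leading zeros, as required by SemVer
SEMVER_CORE = re.compile(r'^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$')


def bump_version(version, part):
    """Return the next version after incrementing 'major', 'minor' or 'patch'."""
    match = SEMVER_CORE.match(version.strip())
    if match is None:
        raise ValueError(f'not a MAJOR.MINOR.PATCH version: {version!r}')
    major, minor, patch = map(int, match.groups())
    if part == 'major':
        return f'{major + 1}.0.0'
    if part == 'minor':
        return f'{major}.{minor + 1}.0'
    if part == 'patch':
        return f'{major}.{minor}.{patch + 1}'
    raise ValueError(f'unknown part: {part!r}')
```

For example, a bug-fix release after 3.0.2 would be `bump_version('3.0.2', 'patch')`, giving `3.0.3`, while a backwards-incompatible change would call for `bump_version('3.0.2', 'major')`, giving `4.0.0`.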
Authors
If you have any questions, feel free to ask one of our authors:
- Boris Wiegand - boris.wiegand@stahl-holding-saar.de
File details
Details for the file prolothar-rule-mining-3.0.2.tar.gz.
File metadata
- Download URL: prolothar-rule-mining-3.0.2.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | a11b88d5d9650cd84247d0d8c62a9486d906f42103c7e6e67dc3f6f6d01665f6
MD5 | 089b35643f7688870f94b46b81497d55
BLAKE2b-256 | a367ca2b9a6856c6c5a56adb00df39fbf30ab8a750c34110f622e888270f1ba8
File details
Details for the file prolothar_rule_mining-3.0.2-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: prolothar_rule_mining-3.0.2-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 2.1 MB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4f9f2fa27365164221b4b89f33abbc6e055946cb57a1045b9ab7f8eb5984732f
MD5 | 63d207f8e34701d397ce0bf71e809cc1
BLAKE2b-256 | 01f29a28908622166cf3b8edb7838c951c7d5b174a8bc07f5eaa67e931666c59