Skip to main content

da4py implements state-of-the-art Process Mining methods over SAT encoding. An Ocaml version is Darksider.

Project description

Author : Boltenhagen Mathilde with Thomas Chatain and Josep Carmona
Date : 09.2019

INTRODUCTION

This project implements Process Mining algorithms with SAT encodings to get optimal results for verification problems. Boolean formulas are first created, then converted into CNF form and solved with SAT solvers, thanks to pysat. This librairy used pm4py Objects.

The project is a translation of the Ocaml version darksider created by Thomas Chatain and Mathilde Boltenhagen.

Scientific papers

  • Encoding Conformance Checking Artefacts in SAT by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona
  • Anti-alignments in conformance checking–the dark side of process models by Thomas Chatain, Josep Carmona
  • Generalized Alignment-Based Trace Clustering of Process Behavior by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona

ENVIRONNEMENT & INSTALLATION

python 3.7.x

Simply run : pip install da4py

(https://pypi.org/project/da4py/0.0.1/)

USAGE

The librairy uses pm4py.

pm4py.objects.petri import importer
pm4py.objects.log.importer.xes import factory as xes_importer
from da4py.src.main.conformanceArtefacts import ConformanceArtefacts  
# get the data with pm4py 
model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')

Anti-alignment

Formal definition : Given a finite collection $L$ of log traces and a model $N$, an anti-alignment is a run $u \in Runs(N)$ which maximizes its distance $\min_{\sigma \in L} dist(\sigma,u)$ to the log.

This launches the main module. This object, the model and the traces must be reloaded for each experimentation. This is an issue that will be fix soon.

artefacts = ConformanceArtefacts()

We can to set the size of the anti-alignment we want (usefull for prefix) :

artefacts.setSize_of_run(10)

For execution times or memory problems, we can set the maximum number of difference that will be tried.

artefacts.setMax_d(10)

Two types of distances are available :

  • Hamming distance
  • Edit distance
artefacts.setDistance_type("edit")

Then an anti-alignment can be found by running :

artefacts.antiAlignment(model,m0,mf,traces)
print(artefacts.getRun())
print(artefacts.getTracesWithDistances())

Precision

Then we can compute precision :

print(artefacts.getPrecision())

Other features

One can add silent transition label that will not cost in the distances :

artefacts.setSilentLabel("tau")

We can also compute sum instead of min :

artefacts.setOptimizeMin(False)

Multi-alignment

The same features (not precision) also work for multi-alignment:

model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')
artefacts = ConformanceArtefacts()
artefacts.setSilentLabel("tau")
artefacts.setDistance_type("hamming")
artefacts.setOptimizeMin(True)
artefacts.setSize_of_run(10)
artefacts.setMax_d(10)

# run a multi-Alignment
artefacts.multiAlignment(model,m0,mf,traces)
print(artefacts.getRun())
print(artefacts.getTracesWithDistances())

AMSTC

AMSTC is a trace clustering method that allows one to extract subnet centroids from a process model. The input is then a log and a model and it outputs a set of subnets and associated clustered traces. The method is implemented in SAT but a sampling method allows to run large logs.

# process model
model, m0, mf = importer.pnml.import_net('examples/medium/model2.pnml')

# log traces
traces = xes_importer.import_log('examples/medium/model2.xes')

# sampleSize : number of traces that are used in the sampling method
sampleSize= 5 

# sizeOfRun : maximal length requested to compute alignment 
sizeOfRun = 8

# maxNbC : maximal number of transitions per cluster to avoid to get a unique centroid
maxNbC = 5

# m : number of cluster that will be searching at each AMSTC of the sampling method. Understand that more than m cluster can 
be returned. 
m = 2

# maxCounter : as this is a sampling method, maxCounter is the number of fails of AMSTC before the sampling method stops
# silent_label : every transition that contains this string will not cost in alignment
clustering=samplingVariantsForAmstc(net,m0,mf,log,sampleSize,sizeOfRun,maxD,maxNbC,m,maxCounter=1,silent_label="tau")

The clustering can then be used like :

from pm4py.visualization.petrinet import factory as vizu

for (centroid, traces) in clustering:
    if type(centroid) is tuple:
        net, m0,mf=centroid
        vizu.apply(net, m0, mf).view()
        print(traces)

SAT Encoding & Formula Shapes

The tool first constructs SAT formulas using operator classes AND and OR of da4py.utils.formulas.py. Those formulas are fully described in the published related papers.

AND( [], [], 
	AND( [m_ip [0, 0]], [m_ip [0, 1]], 
		AND( [], [], 
			OR( [], [], 
				AND( [tau_it [1, 0]], [tau_it [1, 1], tau_it [1, 2]], ) 
				AND( [tau_it [1, 1]], [tau_it [1, 0], tau_it [1, 2]], ) 
				AND( [tau_it [1, 2]], [tau_it [1, 0], tau_it [1, 1]], )) 
			OR( [], [tau_it [1, 0]], 
				AND( [], [], 
					OR( [], [], 
						AND( [m_ip [1, 0], m_ip [0, 0]], [], ) 
						AND( [], [m_ip [1, 0], m_ip [0, 0]], )) 
					AND( [m_ip [1, 1], m_ip [0, 1]], [], ))) 
                    ...

Then, the formula is translated to a WCNF form which is solved with pysat library.

[[2], [-1], [7, -82], [-8, -82], [-9, -82], [8, -83], [82, 83, 84], [3, -86]...]

ACKNOWLEDGEMENT

Affiliations : LSV, CNRS, ENS Paris-Saclay, Inria, Université Paris-Saclay

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

da4py-0.0.8.tar.gz (35.3 kB view details)

Uploaded Source

Built Distribution

da4py-0.0.8-py3-none-any.whl (41.3 kB view details)

Uploaded Python 3

File details

Details for the file da4py-0.0.8.tar.gz.

File metadata

  • Download URL: da4py-0.0.8.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.9.0

File hashes

Hashes for da4py-0.0.8.tar.gz
Algorithm Hash digest
SHA256 976449f1ceeaff760c1b0eabade421e7092739775a1bf2ebda142cb0df7ddd20
MD5 ace8908591975d267117955be18e681c
BLAKE2b-256 95c915df0f5f4e963df6f6b1cd72ef17da9957aa2e0fbd4ae6d02591b6d46186

See more details on using hashes here.

File details

Details for the file da4py-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: da4py-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 41.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.9.0

File hashes

Hashes for da4py-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 209d963b2b0d72a654f90b9382714bf9a235407754b928279d9e7aaa1d365e75
MD5 d28a82a90abe0069a68441458a3e9811
BLAKE2b-256 e324a49c723cac286ea3859449961e64d1a5734366402281652b204fd305eb96

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page