da4py implements state-of-the-art Process Mining methods based on SAT encodings. An OCaml version, Darksider, also exists.

Project description

Author : Boltenhagen Mathilde with Thomas Chatain and Josep Carmona
Date : 09.2019


This project implements Process Mining algorithms with SAT encodings to obtain optimal results for verification problems. Boolean formulas are first created, then converted to CNF and solved with SAT solvers through pysat. The library uses pm4py objects.

The project is a translation of the OCaml tool darksider, created by Thomas Chatain and Mathilde Boltenhagen.

Scientific papers

  • Encoding Conformance Checking Artefacts in SAT by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona
  • Anti-alignments in conformance checking–the dark side of process models by Thomas Chatain, Josep Carmona
  • Generalized Alignment-Based Trace Clustering of Process Behavior by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona


python 3.7.x

Simply run : pip install da4py



The library uses pm4py.

from pm4py.objects.petri import importer
from pm4py.objects.log.importer.xes import factory as xes_importer
from da4py.src.main.conformanceArtefacts import ConformanceArtefacts

# get the data with pm4py
model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')


Formal definition : Given a finite collection $L$ of log traces and a model $N$, an anti-alignment is a run $u \in Runs(N)$ which maximizes its distance $\min_{\sigma \in L} dist(\sigma,u)$ to the log.
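To make the definition concrete, here is a minimal brute-force sketch in plain Python. It does not use da4py or a SAT solver: it assumes the runs of the model are available as an explicit finite list (`runs`, a hypothetical stand-in for $Runs(N)$ restricted to a fixed length) and it uses the Hamming distance for $dist$. The function names and the toy data are illustrative only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def anti_alignment(runs, log, dist=hamming):
    """Return (run, score) where run maximizes min over the log of dist(trace, run)."""
    best_run, best_score = None, -1
    for u in runs:
        # Distance of the run to the log = distance to its closest trace.
        score = min(dist(sigma, u) for sigma in log)
        if score > best_score:
            best_run, best_score = u, score
    return best_run, best_score


# Toy data (hypothetical): every trace and run has length 3.
log = [("a", "b", "c"), ("a", "c", "b")]
runs = [("a", "b", "c"), ("a", "b", "d"), ("a", "d", "e")]

run, score = anti_alignment(runs, log)
print(run, score)
```

Enumerating runs explicitly only works for tiny models; the SAT encoding is what lets the library search the space of runs without listing them.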

The following line creates the main object. This object, the model and the traces must be reloaded for each experiment; this is a known issue that will be fixed soon.

artefacts = ConformanceArtefacts()

We can set the size of the anti-alignment we want (useful for prefixes) :


To limit execution time or memory usage, we can set the maximum number of differences that will be tried.


Two types of distances are available :

  • Hamming distance
  • Edit distance
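The two distances treat shifted activities very differently. The sketch below is plain Python and independent of da4py's SAT encoding. The edit distance shown is the standard Levenshtein distance with unit costs for insertion, deletion and substitution; that cost model is an assumption and may differ from the one used in the library.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # keep or substitute
            ))
        prev = curr
    return prev[-1]


# The run is the trace shifted by one activity.
trace = ["a", "b", "c", "d"]
run = ["b", "c", "d", "a"]

print(hamming_distance(trace, run))
print(edit_distance(trace, run))
```

On this shifted pair the Hamming distance counts every position as a mismatch, while the edit distance only pays for removing the leading activity and appending it at the end.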

Then an anti-alignment can be found by running :



Then we can compute precision :


Other features

One can set a silent transition label; transitions carrying it add no cost to the distances :


We can also aggregate the distances to the log traces with a sum instead of a min :
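Independently of the library call, the effect of changing the aggregation can be illustrated in plain Python. This is a sketch under assumptions: candidate runs are listed explicitly, the Hamming distance is used, and "sum" is read as maximizing the total distance to all traces rather than the distance to the closest one. All names and data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def most_distant_run(runs, log, aggregate):
    """Pick the run whose aggregated distance to the log traces is largest."""
    return max(runs, key=lambda u: aggregate([hamming(sigma, u) for sigma in log]))


log = [("a", "b", "c", "d"), ("e", "f", "g", "h")]
runs = [
    ("a", "b", "g", "h"),  # distances to the two traces: 2 and 2
    ("a", "b", "c", "x"),  # distances to the two traces: 1 and 4
]

by_min = most_distant_run(runs, log, min)
by_sum = most_distant_run(runs, log, sum)
print(by_min, by_sum)
```

With min, the first run wins because it stays at distance 2 from its closest trace; with sum, the second run wins because its total distance is 5 instead of 4.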



The same features (not precision) also work for multi-alignment:

model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')
artefacts = ConformanceArtefacts()

# run a multi-Alignment


AMSTC is a trace clustering method that extracts subnet centroids from a process model. It takes a log and a model as input and outputs a set of subnets with the traces clustered around each of them. The method is implemented in SAT, and a sampling method allows it to handle large logs.

# process model
model, m0, mf = importer.pnml.import_net('examples/medium/model2.pnml')

# log traces
traces = xes_importer.import_log('examples/medium/model2.xes')

# sampleSize : number of traces that are used in the sampling method
sampleSize= 5 

# sizeOfRun : maximal length requested to compute alignment 
sizeOfRun = 8

# maxNbC : maximal number of transitions per cluster, to avoid getting a single centroid
maxNbC = 5

# m : number of clusters searched for at each AMSTC call of the sampling method. Note that more than m clusters can
# be returned.
m = 2

# maxCounter : as this is a sampling method, maxCounter is the number of fails of AMSTC before the sampling method stops
# silent_label : every transition that contains this string will not cost in alignment

The clustering can then be used as follows :

from pm4py.visualization.petrinet import factory as vizu

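# each item pairs a centroid with the traces assigned to it
for (centroid, traces) in clustering:
    # only centroids given as a (net, initial marking, final marking) tuple are displayed
    if isinstance(centroid, tuple):
        net, m0, mf = centroid
        vizu.apply(net, m0, mf).view()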

SAT Encoding & Formula Shapes

The tool first constructs SAT formulas using the operator classes AND and OR. Those formulas are fully described in the related published papers.

AND( [], [], 
	AND( [m_ip [0, 0]], [m_ip [0, 1]], 
		AND( [], [], 
			OR( [], [], 
				AND( [tau_it [1, 0]], [tau_it [1, 1], tau_it [1, 2]], ) 
				AND( [tau_it [1, 1]], [tau_it [1, 0], tau_it [1, 2]], ) 
				AND( [tau_it [1, 2]], [tau_it [1, 0], tau_it [1, 1]], )) 
			OR( [], [tau_it [1, 0]], 
				AND( [], [], 
					OR( [], [], 
						AND( [m_ip [1, 0], m_ip [0, 0]], [], ) 
						AND( [], [m_ip [1, 0], m_ip [0, 0]], )) 
					AND( [m_ip [1, 1], m_ip [0, 1]], [], ))) 

Then the formula is translated to WCNF (weighted CNF) form, which is solved with the pysat library.

[[2], [-1], [7, -82], [-8, -82], [-9, -82], [8, -83], [82, 83, 84], [3, -86]...]
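The step from an operator tree to a clause list can be illustrated with a small, self-contained sketch. It is not da4py's encoder: the `And` and `Or` classes, the `to_cnf` function and the variable numbering are hypothetical, and reading the printed shape as `OP(positive variables, negated variables, sub-formulas...)` is an assumption based on the output above. The sketch uses a one-directional Tseitin-style translation, where each sub-formula gets a fresh auxiliary variable that implies its content, and it checks the result by brute force instead of calling pysat. Weights are left out, so it produces plain CNF rather than WCNF.

```python
from itertools import count, product


class Node:
    """Operator node: positive variables, negated variables, sub-formulas."""

    def __init__(self, pos, neg, children=()):
        self.pos = list(pos)
        self.neg = list(neg)
        self.children = list(children)


class And(Node):
    pass


class Or(Node):
    pass


def to_cnf(root, n_vars):
    """Return clauses (lists of signed ints); auxiliaries start at n_vars + 1."""
    fresh = count(n_vars + 1)
    clauses = []

    def literals(node):
        # The node's own literals plus one auxiliary literal per sub-formula.
        own = node.pos + [-v for v in node.neg]
        return own + [encode(child) for child in node.children]

    def encode(node):
        aux = next(fresh)
        lits = literals(node)
        if isinstance(node, And):
            clauses.extend([lit, -aux] for lit in lits)  # aux implies each literal
        else:
            clauses.append(lits + [-aux])  # aux implies at least one literal
        return aux

    lits = literals(root)
    if isinstance(root, And):
        clauses.extend([lit] for lit in lits)  # every top-level literal must hold
    else:
        clauses.append(lits)
    return clauses


def satisfying_assignments(clauses):
    """Enumerate all assignments (tuples of booleans) that satisfy every clause."""
    n = max(abs(lit) for clause in clauses for lit in clause)
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            yield bits


# Variables 1..5 (hypothetical numbering): 1 must hold, 2 must not,
# and exactly one of 3, 4, 5 is chosen.
formula = And([1], [2], [
    Or([], [], [
        And([3], [4, 5]),
        And([4], [3, 5]),
        And([5], [3, 4]),
    ]),
])

clauses = to_cnf(formula, n_vars=5)
projections = {bits[:5] for bits in satisfying_assignments(clauses)}
print(clauses)
print(sorted(projections))
```

The resulting clause list has the same list-of-integer-lists shape as the excerpt above, with short implication clauses such as `[3, -7]` for AND nodes and one longer clause per OR node.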


Affiliations : LSV, CNRS, ENS Paris-Saclay, Inria, Université Paris-Saclay
