
MERLIN


MERLIN is a global, model-agnostic, contrastive explainer for any tabular or text classifier. It provides contrastive explanations of how the behaviour of two machine learning models differs.

Imagine we have a machine learning classifier, say M1, and wish to understand how, and to what extent, it differs from a second model M2. MERLIN aims to answer the following questions:

  1. Can we estimate to what extent M2 classifies data consistently with the predictions made by M1?
  2. Why do the criteria used by M1 lead to class c, while M2 does not use the same criteria to classify as c?
  3. Can we use natural language to explain the differences between the models, making them more comprehensible to end users?

For details and citations, see the References section.

Install

MERLIN is available on PyPI. Simply run:

pip install merlinxai

Or clone the repository and run:

pip install .

The PyEDA package is required but has not been added to the dependencies because of installation errors on Windows. If you are on Linux or macOS, you should be able to install it by running:

pip3 install pyeda

However, if you are on Windows, we found that the best way to install it is through Christoph Gohlke's pythonlibs page. For further information, please consult the official PyEDA installation documentation.

To produce the PDF files, a Graphviz installation is also required. Full documentation on how to install Graphviz on any platform is available here.
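Because neither PyEDA nor Graphviz is pulled in by pip install merlinxai, it can help to check both before running MERLIN. The sketch below is a hypothetical helper, not part of MERLIN; it assumes PyEDA is importable as pyeda and that a working Graphviz installation puts the dot executable on the PATH.

```python
import importlib.util
import shutil


def check_optional_dependencies():
    """Report whether the extra MERLIN requirements appear to be available.

    Assumptions: PyEDA is importable as `pyeda`, and Graphviz provides the
    `dot` executable on the PATH.
    """
    return {
        # find_spec returns None (without importing) if the package is absent
        "pyeda": importlib.util.find_spec("pyeda") is not None,
        # shutil.which returns None if `dot` is not on the PATH
        "graphviz_dot": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_optional_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```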

Input

MERLIN takes as input the "feature data" (training or test data, either tabular or free text) and the corresponding "labels" predicted by each classifier. This means you don't need to embed MERLIN in your model code at all! As optional parameters, the user can specify:

  • the coverage of the dataset to be used (default is 100%); otherwise, a sampling procedure is used;
  • the surrogate type to be used (decision tree or rulefit);
  • a set of hyperparameters to be used for creating the most accurate surrogate models;
  • the size of the test set to measure the fidelity of the surrogates.
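To make the coverage and test-set parameters concrete, here is a minimal sketch of what they refer to. It is not MERLIN's code: sample_by_coverage and fidelity are hypothetical helpers, and simple uniform random sampling is assumed for the coverage step. Fidelity is computed in the usual way for surrogate models, as the fraction of held-out instances on which the surrogate reproduces the label predicted by the classifier (not the ground truth).

```python
import random


def sample_by_coverage(rows, labels, coverage=1.0, seed=0):
    """Keep a random fraction `coverage` of the (row, label) pairs.

    Hypothetical helper: uniform sampling without replacement is assumed.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if coverage == 1.0:
        return list(rows), list(labels)
    rng = random.Random(seed)
    k = max(1, round(len(rows) * coverage))
    idx = sorted(rng.sample(range(len(rows)), k))  # keep original order
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fidelity(surrogate_predict, test_rows, classifier_labels):
    """Fraction of held-out rows on which the surrogate agrees with the classifier."""
    agree = sum(
        surrogate_predict(row) == label
        for row, label in zip(test_rows, classifier_labels)
    )
    return agree / len(test_rows)
```

A larger test set gives a more reliable fidelity estimate, at the cost of fewer instances left for fitting the surrogate.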

MERLIN on tabular data

In this example, we apply MERLIN to a tabular dataset named Occupancy, where the task is to predict whether an office room is occupied from sensor measurements of light, temperature, humidity, and CO2 levels. Here, M1 classifies instances recorded during the daytime, while M2 handles instances recorded during the nighttime.
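The four inputs passed to MERLIN (X_left, predicted_labels_left, X_right, predicted_labels_right) could be prepared along the lines of the sketch below. Everything in it is hypothetical: the toy records, the 07:00 to 19:00 daytime window, and the stand-in classifiers m1_predict and m2_predict. Plain Python lists keep the sketch dependency-free; the container types MERLIN actually expects (for example a pandas DataFrame) are an assumption you should check against the demo notebook.

```python
from datetime import datetime

# Toy Occupancy-style records (illustrative values, not real data).
records = [
    {"date": "2015-02-04 10:00:00", "Light": 480.0, "CO2": 900.0},
    {"date": "2015-02-04 23:00:00", "Light": 0.0, "CO2": 450.0},
    {"date": "2015-02-05 14:30:00", "Light": 510.0, "CO2": 1010.0},
    {"date": "2015-02-05 03:15:00", "Light": 10.0, "CO2": 440.0},
]

# Hypothetical daytime window: 07:00 (inclusive) to 19:00 (exclusive).
DAY_START, DAY_END = 7, 19


def is_daytime(record):
    hour = datetime.strptime(record["date"], "%Y-%m-%d %H:%M:%S").hour
    return DAY_START <= hour < DAY_END


def features(record):
    # Drop the timestamp; keep only the sensor measurements.
    return {k: v for k, v in record.items() if k != "date"}


def m1_predict(x):
    # Hypothetical daytime classifier M1.
    return int(x["Light"] > 400)


def m2_predict(x):
    # Hypothetical nighttime classifier M2.
    return int(x["Light"] > 200)


# Left side: daytime data with M1's predictions.
X_left = [features(r) for r in records if is_daytime(r)]
predicted_labels_left = [m1_predict(x) for x in X_left]

# Right side: nighttime data with M2's predictions.
X_right = [features(r) for r in records if not is_daytime(r)]
predicted_labels_right = [m2_predict(x) for x in X_right]
```

The key point is that MERLIN only sees features and predicted labels, so the models themselves never need to be passed in.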

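from merlin import MERLIN

# X_left / predicted_labels_left: features and labels predicted by M1 (daytime)
# X_right / predicted_labels_right: features and labels predicted by M2 (nighttime)
exp = MERLIN(X_left, predicted_labels_left,
             X_right, predicted_labels_right,
             data_type='tabular', surrogate_type='sklearn',
             save_path='results/')

# Trace phase: fit surrogate models that mimic M1 and M2
exp.run_trace()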

BDD2Text

The BDD2Text output for Occupancy reveals that one path is unchanged between M1 and M2: a high light level, in the 4th quartile, indicates a well-lit room and is the best indicator of whether the room is occupied.

There is also one path added in M2: at nighttime, light in the 3rd quartile now leads to a positive classification, which was not the case for M1. During the daytime, light in the 3rd quartile is not sufficient to classify an instance as positive, but during the nighttime it is.

# Explain phase: compare the two surrogates and describe the differences in text
exp.run_explain()
exp.explain.BDD2Text()
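Conditions such as "light in the 3rd quartile" refer to a discretized version of a continuous feature. The sketch below shows one way to map a value to its quartile, assuming the cut points are the quartiles of the feature's values in the dataset, computed with Python's statistics.quantiles (default "exclusive" method). quartile_bin is a hypothetical helper; MERLIN's actual discretization may differ.

```python
from statistics import quantiles


def quartile_bin(value, reference_values):
    """Return the quartile (1-4) that `value` falls into.

    Cut points are the three quartiles of `reference_values`; a value equal
    to a cut point is assigned to the lower quartile (an arbitrary choice).
    """
    q1, q2, q3 = quantiles(reference_values, n=4)
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4
```

Read this way, a rule such as "light in the 4th quartile" simply means the reading is above the third quartile of the observed light values.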

Get Rules

The natural language explanation (NLE) shows the differences between the two models. However, a user might also wish to see example instances from the datasets to which these rules apply.

To do so, MERLIN provides the get_rule_examples function, which requires the user to specify a rule to be applied and the number of examples to show.

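# Show 5 instances from the left (M1) dataset that satisfy `rule`,
# where `rule` is one of the rules reported in the explanation
exp.data_manager['left'].get_rule_examples(rule, n_examples=5)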
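Conceptually, retrieving rule examples means filtering the dataset for instances that satisfy every condition of a rule and returning the first few. The sketch below illustrates this idea only; rule_examples is a hypothetical helper, and representing a rule as a dict mapping feature names to predicates is an assumption, not MERLIN's rule format.

```python
from itertools import islice


def rule_examples(rows, rule, n_examples=5):
    """Return up to n_examples rows that satisfy every condition in `rule`.

    Hypothetical rule format: {feature_name: predicate_on_value}, read as a
    conjunction of conditions.
    """
    matching = (
        row for row in rows
        if all(cond(row[feature]) for feature, cond in rule.items())
    )
    # islice stops scanning as soon as enough matches are found
    return list(islice(matching, n_examples))


# Usage with toy data and a hypothetical rule "Light > 400"
rows = [{"Light": v} for v in [0, 300, 450, 500, 520]]
high_light = {"Light": lambda v: v > 400}
examples = rule_examples(rows, high_light, n_examples=2)
```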

MERLIN on text data

The same process can also be applied to text classifiers. For example, in the 20newsgroups dataset, one might look closely at the class atheism, since for this class the number of deleted paths is higher than the number of added ones.

BDD2Text

The NLE for atheism shows that the presence of the word bill leads the retrained classifier M2 to assign the label atheism to a record, whereas the presence of this feature was not a criterion for the previous classifier M1. Conversely, the explanation shows that M1 used the feature keith to assign the label, whereas M2 discarded this rule.

Both terms are names of post authors: Bill's posts appear only in the dataset used for retraining, whereas Keith's posts are more frequent in the initial dataset than in the second one (dataset taken from Jin, P., Zhang, Y., Chen, X., & Xia, Y. Bag-of-embeddings for text classification. In IJCAI-2016).

Finally, M2 discarded the rule involving political atheist, which was sufficient for M1 to classify the instance.
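The notions of unchanged, added, and deleted paths can be illustrated with plain set operations. This is a deliberate simplification: MERLIN compares the surrogates symbolically (as the BDD2Text name suggests), so logically equivalent paths written differently would still match there, whereas this sketch only matches identical condition sets. The toy paths loosely mirror the atheism explanation above and are not MERLIN output; treating political atheist as two separate words is an assumption.

```python
def compare_paths(m1_paths, m2_paths):
    """Split decision paths (frozensets of conditions) into unchanged/added/deleted."""
    m1, m2 = set(m1_paths), set(m2_paths)
    return {
        "unchanged": m1 & m2,  # paths used by both models
        "added": m2 - m1,      # paths only M2 uses
        "deleted": m1 - m2,    # paths M2 no longer uses
    }


# Toy paths for class atheism: each path is the set of words whose
# presence leads to the class.
m1_paths = [frozenset({"keith"}), frozenset({"political", "atheist"})]
m2_paths = [frozenset({"bill"})]

diff = compare_paths(m1_paths, m2_paths)
```

With these toy paths there are two deleted paths and one added path, matching the "more deleted than added" pattern described for atheism.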

Tutorials and Usage

A complete example of MERLIN usage is provided in the "MERLIN Demo" notebook in the main repository folder. A notebook example that also includes ML model training is available in this repository, and it can also be accessed in this Google Colab notebook.

References

To cite MERLIN, please refer to the following paper:

@article{malandri2023model,
  title={Model-contrastive explanations through symbolic reasoning},
  author={Malandri, Lorenzo and Mercorio, Fabio and Mezzanzanica, Mario and Seveso, Andrea},
  journal={Decision Support Systems},
  pages={114040},
  year={2023},
  publisher={Elsevier}
}

MERLIN generalizes the approach proposed in Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N., & Seveso, A. (2022). ContrXT: Generating contrastive explanations from any text classifier. Information Fusion, 81, 103-115.



Download files


Source Distribution

merlinxai-0.1.2.tar.gz (30.5 kB)

Uploaded Source

Built Distribution

merlinxai-0.1.2-py3-none-any.whl (30.9 kB)

Uploaded Python 3

File details

Details for the file merlinxai-0.1.2.tar.gz.

File metadata

  • Download URL: merlinxai-0.1.2.tar.gz
  • Upload date:
  • Size: 30.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for merlinxai-0.1.2.tar.gz
Algorithm Hash digest
SHA256 e9f020cbe1d92a890a4e07f4b28b729a5e65898aee68ac339a107482b34bcb9f
MD5 d6ba63874c489a289ae8a33f43f86ff4
BLAKE2b-256 106ea0a82dd356309a88fe24f0a2474cdd63b03c79fb5d9f2175b298197bcbff


File details

Details for the file merlinxai-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: merlinxai-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 30.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for merlinxai-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 203da3360aa7b6a065f218cabfafd0ad863b17a1e52a61ebb4355b161dd3d111
MD5 042a1de2495e99b89dc52aa7245c0a92
BLAKE2b-256 e932093ddd58e90809beff1db9fbea24535ba09ca9613ea47226b0de86a06f85

