
Generic explainability architecture for text machine learning models

Project description





text_explainability provides a generic architecture from which well-known state-of-the-art explainability approaches for text can be composed. This modular architecture allows components to be swapped out and combined, to quickly develop new types of explainability approaches for (natural language) text, or to improve a plethora of approaches by improving a single module.

Several example methods are included, which provide local explanations (explaining the prediction of a single instance, e.g. LIME and SHAP) or global explanations (explaining the dataset, or model behavior on the dataset, e.g. TokenFrequency and MMDCritic). By replacing the default modules (e.g. local data generation, global data sampling or improved embedding methods), these methods can be improved upon or new methods can be introduced.

© Marcel Robeer, 2021

Quick tour

Local explanation: explain a model's prediction on a given sample, either self-provided or drawn from a dataset.

from text_explainability import LIME, LocalTree

# `model` is assumed to be a trained text classifier (see the documentation
# for how to wrap your own model)

# Define sample to explain
sample = 'Explain why this is positive and not negative!'

# LIME explanation (local feature importance)
LIME().explain(sample, model).scores

# List of local rules, extracted from a fitted local tree
LocalTree().explain(sample, model).rules
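Under the hood, LIME-style methods perturb the input and observe how the model's score changes. A dependency-free toy sketch of that perturbation idea (not the package's actual implementation; `toy_model` is a hypothetical stand-in for any classifier returning a positive-class score):

```python
# Toy occlusion-style importance: drop each token and measure the change in
# a model's positive-class score. This mimics the perturbation idea behind
# local explanations; real LIME fits a weighted linear surrogate instead.

def toy_model(tokens):
    """Hypothetical stand-in classifier: 'positive' raises the score, 'negative' lowers it."""
    score = 0.5
    score += 0.4 * ('positive' in tokens)
    score -= 0.4 * ('negative' in tokens)
    return score

def occlusion_importance(text, model):
    tokens = text.lower().rstrip('!').split()
    base = model(tokens)
    # Importance of a token = score drop when that token is removed
    return {t: base - model([u for u in tokens if u != t]) for t in tokens}

scores = occlusion_importance('Explain why this is positive and not negative!', toy_model)
# 'positive' gets a positive importance, 'negative' a negative one,
# and neutral tokens such as 'why' stay near zero.
```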

Global explanation: explain the whole dataset (e.g. train set, test set), and what the instances look like for the ground-truth or predicted labels.

from text_explainability import import_data, TokenFrequency, MMDCritic

# Import dataset
env = import_data('./datasets/test.csv', data_cols=['fulltext'], label_cols=['label'])

# Top-k most frequent tokens per label
TokenFrequency(env.dataset).explain(labelprovider=env.labels, explain_model=False, k=3)

# 2 prototypes and 1 criticism for the dataset
MMDCritic(env.dataset)(n_prototypes=2, n_criticisms=1)
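Conceptually, TokenFrequency simply counts tokens per label and reports the top-k. A minimal stand-alone sketch with toy data (illustrative only, not the package API):

```python
from collections import Counter

# Toy dataset of (text, label) pairs
dataset = [
    ('great movie great cast', 'pos'),
    ('great fun', 'pos'),
    ('dull plot dull acting', 'neg'),
]

def top_k_tokens(dataset, k=3):
    """Count tokens per label and return the k most frequent for each label."""
    counts = {}
    for text, label in dataset:
        counts.setdefault(label, Counter()).update(text.split())
    return {label: c.most_common(k) for label, c in counts.items()}

result = top_k_tokens(dataset, k=2)
# 'great' dominates the 'pos' label, 'dull' the 'neg' label
```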

Installation

See the installation instructions for an extended installation guide.

Method Instructions
pip Install from PyPI via pip3 install text_explainability. To speed up the explanation generation process, use pip3 install text_explainability[fast].
Local Clone this repository and install via pip3 install -e ., or run python3 setup.py install locally.

Documentation

Full documentation of the latest version is provided at https://text-explainability.readthedocs.io/.

Example usage

See example usage for an example of how the package can be used, or run the lines in example_usage.py to explore it interactively.

Explanation methods included

text_explainability includes methods for model-agnostic local explanation and global explanation. Each of these methods can be fully customized to fit the explainees' needs.

Type Explanation method Description Paper/link
Local explanation LIME Calculate feature attribution with Local Interpretable Model-Agnostic Explanations (LIME). [Ribeiro2016], interpretable-ml/lime
KernelSHAP Calculate feature attribution with SHapley Additive exPlanations (SHAP). [Lundberg2017], interpretable-ml/shap
LocalTree Fit a local decision tree around a single decision. [Guidotti2018]
LocalRules Fit a local sparse set of label-specific rules using SkopeRules. github/skope-rules
FoilTree Fit a local contrastive/counterfactual decision tree around a single decision. [Robeer2018]
BayLIME Bayesian extension of LIME that incorporates prior knowledge for more consistent explanations. [Zhao2021]
Global explanation TokenFrequency Show the top-k most frequent tokens for each ground-truth or predicted label.
TokenInformation Show the top-k token mutual information for a dataset or model. wikipedia/mutual_information
KMedoids Embed instances and find the top-n prototypes (can also be performed per label using LabelwiseKMedoids). interpretable-ml/prototypes
MMDCritic Embed instances and find the top-n prototypes and top-n criticisms (can also be performed per label using LabelwiseMMDCritic). [Kim2016], interpretable-ml/prototypes
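Prototype methods such as KMedoids and MMDCritic pick a few instances that best represent the dataset under some similarity measure. A toy sketch of greedy prototype selection over token-overlap (Jaccard) similarity, simplifying the actual MMD objective and using plain text instead of embeddings (all names here are illustrative, not the package API):

```python
# Toy prototype selection: greedily pick instances whose average similarity
# to the whole dataset is highest, penalising redundancy with prototypes
# already chosen. A simplified stand-in for the MMD-critic objective.

def jaccard(a, b):
    """Token-overlap similarity between two texts."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def select_prototypes(texts, n):
    chosen = []
    for _ in range(n):
        best, best_score = None, float('-inf')
        for i, t in enumerate(texts):
            if i in chosen:
                continue
            # How well does t cover the dataset on average?
            coverage = sum(jaccard(t, u) for u in texts) / len(texts)
            # How much does t overlap with prototypes we already have?
            redundancy = sum(jaccard(t, texts[j]) for j in chosen)
            if coverage - redundancy > best_score:
                best, best_score = i, coverage - redundancy
        chosen.append(best)
    return [texts[i] for i in chosen]

texts = ['good film good story', 'good film', 'bad film', 'terrible mess']
protos = select_prototypes(texts, n=2)
# Picks one 'good film'-like instance, then a dissimilar one to cover the rest
```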

Releases

text_explainability is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Extensions


text_explainability can be extended to also perform sensitivity testing, checking for machine learning model robustness and fairness. The text_sensitivity package is available through PyPI and fully documented at https://text-sensitivity.rtfd.io/.

Citation

@misc{text_explainability,
  title = {Python package text\_explainability},
  author = {Marcel Robeer},
  howpublished = {\url{https://git.science.uu.nl/m.j.robeer/text_explainability}},
  year = {2021}
}

Maintenance

Contributors

Todo

Tasks yet to be done:

  • Implement local post-hoc explanations:
    • Implement Anchors
  • Implement global post-hoc explanations:
    • Representative subset
  • Add support for regression models
  • More complex data augmentation
    • Top-k replacement (e.g. according to LM / WordNet)
    • Tokens to exclude from being changed
    • Bag-of-words style replacements
  • Add rule-based return type
  • Write more tests

Credits
