Interpretability toolbox for LLMs

Interpreto: Interpretability Toolkit for LLMs

Explore Interpreto docs »

🚀 Quick Start

The library is available on PyPI; install it with pip install interpreto.
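
To confirm from Python that the installation succeeded, a minimal sketch using only the standard library could look like the following. The helper name installed_version is hypothetical and not part of Interpreto; the sketch assumes the distribution name is interpreto, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("interpreto")
    print(found if found is not None else "interpreto is not installed")
```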

Check out the tutorials to get started:

📦 What's Included

Interpreto 🪄 provides a modular framework encompassing Attribution Methods, Concept-Based Methods, and Evaluation Metrics.

Attribution Methods

Interpreto includes both inference-based and gradient-based attribution methods.

They all work seamlessly for both classification (...ForSequenceClassification) and generation (...ForCausalLM) models.

Inference-based Methods:
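
As a generic illustration of this family (not Interpreto's own implementation), the sketch below computes occlusion attributions: each token is masked in turn, and the resulting change in the model score is taken as that token's attribution. Only forward passes are needed. The scoring function toy_score, its word weights, and the [MASK] placeholder are hypothetical stand-ins for a real model and tokenizer.

```python
def toy_score(tokens):
    """Hypothetical stand-in for a classifier: a sum of per-word weights."""
    weights = {"great": 2.0, "good": 1.0, "not": -1.5, "bad": -2.0}
    return sum(weights.get(t, 0.0) for t in tokens)


def occlusion_attributions(tokens, score_fn, mask_token="[MASK]"):
    """Attribute to each token the score drop observed when it is masked."""
    base = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append(base - score_fn(occluded))
    return attributions


tokens = "the movie was not bad".split()
for token, attr in zip(tokens, occlusion_attributions(tokens, toy_score)):
    print(f"{token:>6}: {attr:+.1f}")
```

Because the toy score is additive, each attribution simply recovers the word's weight; with a real model, interactions between tokens make the result depend on context.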

Gradient-based methods:
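
As a generic illustration of this family (again, not Interpreto's own implementation), the sketch below computes saliency and gradient-times-input for a toy logistic model whose gradient is known in closed form. The model, weights, and inputs are hypothetical; with a real transformer the gradient would come from automatic differentiation rather than a hand-written formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Toy differentiable model: logistic regression over input features."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def saliency(x, w, b):
    """Gradient of the output w.r.t. each input feature: p * (1 - p) * w_i."""
    p = predict(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def gradient_times_input(x, w, b):
    """Scale each gradient by its input value to get a signed contribution."""
    return [g * xi for g, xi in zip(saliency(x, w, b), x)]


x = [1.0, 0.5, -2.0]   # hypothetical input features
w = [0.8, -0.3, 0.1]   # hypothetical model weights
print(saliency(x, w, 0.0))
print(gradient_times_input(x, w, 0.0))
```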

Concept-Based Methods or Mechanistic Interpretability

Concept-based explanations aim to provide high-level interpretations of latent model representations.

Interpreto generalizes these methods through three core steps:

  1. Concept Discovery (e.g., from latent embeddings)
  2. Concept Interpretation (mapping discovered concepts to human-understandable elements)
  3. Concept-to-Output Attribution (assessing concept relevance to model outputs)
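
The first two steps above can be sketched on toy data as follows. This is an illustration under stated assumptions, not Interpreto's API: k-means is swapped in as a simple stand-in for dictionary learning, the two-dimensional activations are made up, and the function names discover_concepts and interpret_concepts are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def discover_concepts(activations, k, iterations=10):
    """Step 1: k-means over latent vectors; each centroid is one concept."""
    centroids = [list(a) for a in activations[:k]]  # deterministic initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for a in activations:
            nearest = min(range(k), key=lambda c: squared_distance(a, centroids[c]))
            clusters[nearest].append(a)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def interpret_concepts(tokens, activations, centroids, top_n=2):
    """Step 2: describe each concept by the tokens that activate it most."""
    labels = []
    for centroid in centroids:
        ranked = sorted(
            zip(tokens, activations),
            key=lambda pair: dot(pair[1], centroid),
            reverse=True,
        )
        labels.append([token for token, _ in ranked[:top_n]])
    return labels


tokens = ["cat", "car", "dog", "bus"]
activations = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.9]]  # made-up latents
centroids = discover_concepts(activations, k=2)
print(interpret_concepts(tokens, activations, centroids))
# prints [['cat', 'dog'], ['car', 'bus']]
```

Here one concept groups the animal tokens and the other the vehicle tokens; listing top-activating inputs is only one of several possible interpretation strategies.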

Dictionary Learning for Concept Discovery (mainly via Overcomplete):

Available Concept Interpretation Techniques:

Concept Interpretation Techniques Planned for Future Releases:

Concept-to-Output Attribution:

Estimate the contribution of each concept to the model output.

This attribution can be obtained with any concept-based explainer via MethodConcepts.concept_output_gradient().
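
To make the idea of a concept-to-output gradient concrete, the sketch below estimates how a scalar output changes with each concept activation using central finite differences. It does not call or reproduce concept_output_gradient(), whose signature is not shown here; toy_output and the concept values are hypothetical, and a real implementation would more likely rely on automatic differentiation.

```python
def numerical_gradient(output_fn, concept_activations, eps=1e-5):
    """Central finite-difference gradient of a scalar output w.r.t. each concept."""
    grads = []
    for i in range(len(concept_activations)):
        up = list(concept_activations)
        down = list(concept_activations)
        up[i] += eps
        down[i] -= eps
        grads.append((output_fn(up) - output_fn(down)) / (2 * eps))
    return grads


def toy_output(concepts):
    """Hypothetical model head: a logit that depends on two of three concepts."""
    return 2.0 * concepts[0] - 0.5 * concepts[1] ** 2


concepts = [0.3, 1.0, 0.7]  # made-up concept activations for one input
grads = numerical_gradient(toy_output, concepts)
print([round(g, 3) for g in grads])
```

A gradient near zero, as for the third concept here, indicates that the output is locally insensitive to that concept.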

Papers available in the future:

Because this three-step generalization covers concept-based methods broadly and the architecture is highly flexible, a large number of concept-based methods can be obtained easily:

Evaluation Metrics

Evaluation Metrics for Attribution

To evaluate the faithfulness of attribution methods, Interpreto provides the Insertion and Deletion metrics.
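
The general idea behind these two metrics can be sketched as follows; this is a simplified illustration, not Interpreto's implementation. Deletion masks tokens from most to least important and tracks how fast the score falls (a lower area under the curve suggests more faithful attributions), while Insertion starts from a fully masked input and reveals tokens in the same order (a higher area is better). The scoring function, attributions, and [MASK] placeholder are hypothetical.

```python
def toy_score(tokens):
    """Hypothetical stand-in for a model score: a sum of per-word weights."""
    weights = {"great": 2.0, "good": 1.0, "not": -1.5, "bad": -2.0}
    return sum(weights.get(t, 0.0) for t in tokens)


def perturbation_curve(tokens, attributions, score_fn, mode, mask_token="[MASK]"):
    """Scores recorded while deleting or inserting tokens by decreasing attribution."""
    if mode not in ("deletion", "insertion"):
        raise ValueError("mode must be 'deletion' or 'insertion'")
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens) if mode == "deletion" else [mask_token] * len(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = mask_token if mode == "deletion" else tokens[i]
        curve.append(score_fn(current))
    return curve


def area_under_curve(curve):
    """Trapezoidal area with the x-axis normalised to [0, 1]."""
    steps = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(steps)) / steps


tokens = ["a", "great", "and", "good", "film"]
attributions = [0.0, 2.0, 0.0, 1.0, 0.0]  # made-up output of an attribution method
deletion = perturbation_curve(tokens, attributions, toy_score, "deletion")
insertion = perturbation_curve(tokens, attributions, toy_score, "insertion")
print(area_under_curve(deletion), area_under_curve(insertion))
# prints 0.5 2.5
```

In practice the curves are usually computed on model probabilities or logits and compared against a baseline such as a random token ordering.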

Evaluation Metrics for Concepts

Concept-based methods have several steps that can be evaluated together via ConSim.

Or independently:

👍 Contributing

Feel free to propose ideas or contribute to the Interpreto 🪄 toolbox! A dedicated document explains, step by step, how to make your first pull request.

👀 See Also

More from the DEEL project:

  • Xplique: a Python library dedicated to explaining neural networks (Images, Time Series, Tabular data) on TensorFlow.
  • Puncc: a Python library for predictive uncertainty quantification using conformal prediction.
  • oodeel: a Python library that performs post-hoc deep Out-of-Distribution (OOD) detection on already trained neural network image classifiers.
  • deel-lip: a Python library for training k-Lipschitz neural networks on TensorFlow.
  • deel-torchlip: a Python library for training k-Lipschitz neural networks on PyTorch.
  • Influenciae: a Python library dedicated to computing influence values for the discovery of potentially problematic samples in a dataset.
  • DEEL White paper: a summary by the DEEL team of the challenges of certifiable AI and the role of data quality, representativity and explainability for this purpose.

🙏 Acknowledgments

This project received funding from the French “Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the DEEL and the FOR projects.

👨‍🎓 Creators

Interpreto 🪄 is a project of the FOR and the DEEL teams at the IRT Saint-Exupéry in Toulouse, France.

🗞️ Citation

If you use Interpreto 🪄 as part of your workflow in a scientific publication, please consider citing 🗞️ our paper:

@article{poche2025interpreto,
    title       = {Interpreto: An Explainability Library for Transformers},
    author      = {Poch{\'e}, Antonin and Mullor, Thomas and Sarti, Gabriele and Boisnard, Fr{\'e}d{\'e}ric and Friedrich, Corentin and Claye, Charlotte and Hoofd, Fran{\c{c}}ois and Bernas, Raphael and Hudelot, C{\'e}line and Jourdan, Fanny},
    journal     = {arXiv preprint arXiv:2512.09730},
    year        = {2025}
}

📝 License

The package is released under the MIT license.

Download files

Source Distribution

interpreto-0.4.13.tar.gz (156.8 kB)

Uploaded Source

Built Distribution

interpreto-0.4.13-py3-none-any.whl (226.3 kB)

Uploaded Python 3

File details

Details for the file interpreto-0.4.13.tar.gz.

File metadata

  • Download URL: interpreto-0.4.13.tar.gz
  • Upload date:
  • Size: 156.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for interpreto-0.4.13.tar.gz
Algorithm Hash digest
SHA256 4a5278ac6a88b529928c86471df3daa08cd6607ce8c7edc3aba1d18fc7ccf37a
MD5 4f8c8d13f5598a291e8efeee038ef425
BLAKE2b-256 f9f988a4db7680583caf3e43e3b6183908cd8c951db21b9c8ea89ded590fd470

See more details on using hashes here.

Provenance

The following attestation bundles were made for interpreto-0.4.13.tar.gz:

Publisher: release.yml on FOR-sight-ai/interpreto

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file interpreto-0.4.13-py3-none-any.whl.

File metadata

  • Download URL: interpreto-0.4.13-py3-none-any.whl
  • Upload date:
  • Size: 226.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for interpreto-0.4.13-py3-none-any.whl
Algorithm Hash digest
SHA256 121e82852eed1ddee3566254f95a3371f690c0fc466ca4a18c1839d6854c6082
MD5 01218a701d55b0f60f75f1699393f0db
BLAKE2b-256 680d44dccd7869671bf6fc7a9c5d4f7be57e3865ef7fdce031b122dd345ee32c

Provenance

The following attestation bundles were made for interpreto-0.4.13-py3-none-any.whl:

Publisher: release.yml on FOR-sight-ai/interpreto
