Interpretability toolbox for LLMs

Interpreto: Interpretability Toolkit for LLMs

Explore Interpreto docs »

🚀 Quick Start

The library is available on PyPI. To install it, run pip install interpreto.
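After installing, it can be useful to confirm that the package is visible to the current Python environment. The sketch below relies only on the standard library's importlib.metadata; the helper name installed_version is introduced here for illustration and is not part of Interpreto.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("interpreto")
    if found is None:
        print("interpreto is not installed in this environment")
    else:
        print(f"interpreto version: {found}")
```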

Check out the tutorials to get started:

📦 What's Included

Interpreto 🪄 provides a modular framework encompassing Attribution Methods, Concept-Based Methods, and Evaluation Metrics.

Attribution Methods

Interpreto includes both inference-based and gradient-based attribution methods.

They all work seamlessly with both classification models (...ForSequenceClassification) and generation models (...ForCausalLM).

Inference-based Methods:
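To give an intuition for the inference-based family, here is a minimal sketch of occlusion, a generic perturbation technique that only needs forward passes: each token is masked in turn and the drop in the model score is recorded. This is not Interpreto's implementation; toy_sentiment_score is a hypothetical stand-in for a real model, and all names are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def occlusion_attribution(
    tokens: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    mask_token: str = "[MASK]",
) -> List[float]:
    """Attribute a score to each token as the score drop when it is masked."""
    baseline = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        occluded = list(tokens)
        occluded[i] = mask_token  # perturb a single position
        attributions.append(baseline - score_fn(occluded))
    return attributions


# Hypothetical toy "model": sums fixed per-word sentiment weights.
POSITIVE_WORDS = {"great": 2.0, "good": 1.0}


def toy_sentiment_score(tokens: Sequence[str]) -> float:
    return sum(POSITIVE_WORDS.get(token, 0.0) for token in tokens)


scores = occlusion_attribution(["a", "great", "movie"], toy_sentiment_score)
print(scores)  # the score only drops when "great" is masked
```

With a real model, score_fn would wrap a forward pass returning the logit or probability of the target class or token.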

Gradient-based Methods:
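Gradient-based methods instead use the derivative of the output with respect to the input. As a rough sketch, the example below computes gradient × input for a toy logistic model whose gradient is known in closed form. The model, its weights, and the function names are assumptions for illustration; a real transformer would require automatic differentiation, and this is not Interpreto's code.

```python
import math
from typing import List, Sequence


def logistic_output(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Toy model: sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient_x_input(x: Sequence[float], weights: Sequence[float], bias: float) -> List[float]:
    """Gradient-times-input attribution for the toy logistic model."""
    p = logistic_output(x, weights, bias)
    # Analytic gradient: d sigmoid(w . x + b) / d x_i = p * (1 - p) * w_i
    gradient = [p * (1.0 - p) * w for w in weights]
    return [g * xi for g, xi in zip(gradient, x)]


features = [1.0, 0.0, 2.0]       # hypothetical input features
weights = [0.5, -1.0, 0.25]      # hypothetical model weights
attributions = gradient_x_input(features, weights, bias=0.0)
print(attributions)
```

A feature with value zero receives zero attribution under gradient × input, even when its weight is large, which is one reason several gradient-based variants exist.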

Concept-Based Methods or Mechanistic Interpretability

Concept-based explanations aim to provide high-level interpretations of latent model representations.

Interpreto generalizes these methods through three core steps:

  1. Concept Discovery (e.g., from latent embeddings)
  2. Concept Interpretation (mapping discovered concepts to human-understandable elements)
  3. Concept-to-Output Attribution (assessing concept relevance to model outputs)

Dictionary Learning for Concept Discovery (mainly via Overcomplete):
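To make the idea of dictionary learning concrete, the sketch below factorizes a small non-negative activation matrix A into concept activations U and a concept dictionary V using the classic multiplicative-update rules for non-negative matrix factorization (NMF). NMF is used here only as a familiar example; the toy data and function names are assumptions, and nothing here reflects the Interpreto or Overcomplete API.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frobenius_error(a: Matrix, u: Matrix, v: Matrix) -> float:
    """Frobenius norm of A - U V."""
    approx = matmul(u, v)
    return sum((x - y) ** 2 for ra, rb in zip(a, approx) for x, y in zip(ra, rb)) ** 0.5


def nmf(a: Matrix, n_concepts: int, n_iter: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Approximate A (n x d) as U (n x k) times V (k x d), all non-negative."""
    rng = random.Random(seed)
    n, d = len(a), len(a[0])
    u = [[rng.random() + 0.1 for _ in range(n_concepts)] for _ in range(n)]
    v = [[rng.random() + 0.1 for _ in range(d)] for _ in range(n_concepts)]
    for _ in range(n_iter):
        # Update the dictionary: V <- V * (U^T A) / (U^T U V)
        ut = transpose(u)
        num, den = matmul(ut, a), matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(d)]
             for i in range(n_concepts)]
        # Update the activations: U <- U * (A V^T) / (U V V^T)
        vt = transpose(v)
        num, den = matmul(a, vt), matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_concepts)]
             for i in range(n)]
    return u, v


# Hypothetical latent activations: 5 samples, 4 dimensions, two underlying patterns.
activations = [
    [1.0, 0.0, 1.0, 0.0],
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
]
u_init, v_init = nmf(activations, n_concepts=2, n_iter=0)  # random initialization only
u, v = nmf(activations, n_concepts=2)
print(frobenius_error(activations, u_init, v_init), frobenius_error(activations, u, v))
```

Each row of V plays the role of a concept direction in latent space, and each row of U gives the concept activations of one sample.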

Available Concept Interpretation Techniques:
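One common way to interpret a discovered concept is to look at the inputs that activate it most strongly. The sketch below ranks tokens by their activation for each concept and keeps the top k. The data layout and the name top_k_tokens are assumptions made for this example, not Interpreto's interface.

```python
from typing import Dict, List, Sequence


def top_k_tokens(
    tokens: Sequence[str],
    concept_activations: Sequence[Sequence[float]],
    k: int = 2,
) -> Dict[int, List[str]]:
    """Return, for each concept, the k tokens with the highest activation.

    concept_activations[i][c] is the activation of concept c on tokens[i].
    """
    n_concepts = len(concept_activations[0])
    interpretation = {}
    for c in range(n_concepts):
        ranked = sorted(
            range(len(tokens)),
            key=lambda i: concept_activations[i][c],
            reverse=True,
        )
        interpretation[c] = [tokens[i] for i in ranked[:k]]
    return interpretation


# Hypothetical activations of 2 concepts on 4 tokens.
vocabulary = ["cat", "dog", "car", "bus"]
toy_activations = [
    [0.9, 0.1],
    [0.8, 0.0],
    [0.1, 0.7],
    [0.0, 0.95],
]
labels = top_k_tokens(vocabulary, toy_activations, k=2)
print(labels)
```

In this toy setting, concept 0 would read as "animals" and concept 1 as "vehicles".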

Concept Interpretation Techniques Planned for Future Releases:

Concept-to-Output Attribution:

Estimate the contribution of each concept to the model output.

This attribution can be obtained with any concept-based explainer via MethodConcepts.concept_output_gradient().
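Conceptually, a concept-to-output gradient measures how the output changes when a concept activation is nudged. The sketch below approximates this with central finite differences: concept activations are decoded to latent space through a dictionary and passed to a toy output head. It is an illustration under stated assumptions, not the implementation behind concept_output_gradient(); decode, toy_head, and numerical_concept_gradient are hypothetical names.

```python
import math
from typing import Callable, List, Sequence


def decode(z: Sequence[float], dictionary: Sequence[Sequence[float]]) -> List[float]:
    """Map concept activations z (size k) to latent space (size d): z @ dictionary."""
    d = len(dictionary[0])
    return [sum(z[c] * dictionary[c][j] for c in range(len(z))) for j in range(d)]


def toy_head(latent: Sequence[float]) -> float:
    """Hypothetical stand-in for the rest of the model after the split point."""
    head_weights = [1.0, -2.0, 0.5]
    return math.tanh(sum(w * h for w, h in zip(head_weights, latent)))


def numerical_concept_gradient(
    z: Sequence[float],
    dictionary: Sequence[Sequence[float]],
    head: Callable[[Sequence[float]], float],
    step: float = 1e-5,
) -> List[float]:
    """Central finite-difference estimate of d head(decode(z)) / d z_c."""
    gradients = []
    for c in range(len(z)):
        plus, minus = list(z), list(z)
        plus[c] += step
        minus[c] -= step
        delta = head(decode(plus, dictionary)) - head(decode(minus, dictionary))
        gradients.append(delta / (2.0 * step))
    return gradients


# Two hypothetical concepts aligned with the first two latent dimensions.
toy_dictionary = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
concept_gradients = numerical_concept_gradient([0.2, 0.1], toy_dictionary, toy_head)
print(concept_gradients)
```

A positive value means that increasing the concept pushes the output up, and a negative value means it pushes the output down.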

Papers available in the future:

Because this generalization covers all concept-based methods and the architecture is highly flexible, a large number of concept-based methods can be obtained easily:

Evaluation Metrics

Evaluation Metrics for Attribution

To evaluate the faithfulness of attribution methods, Interpreto provides the Insertion and Deletion metrics.
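As a rough illustration of the Deletion idea, the sketch below masks tokens from most to least important according to an attribution, records the model score after each step, and summarizes the curve by its area. A faithful attribution should make the score drop quickly, giving a smaller area. The toy scoring function and all names are assumptions for this example and do not reflect Interpreto's metric classes.

```python
from typing import Callable, List, Sequence


def deletion_curve(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score_fn: Callable[[Sequence[str]], float],
    mask_token: str = "[MASK]",
) -> List[float]:
    """Scores obtained while masking tokens in decreasing order of attribution."""
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = mask_token
        curve.append(score_fn(current))
    return curve


def area_under_curve(curve: Sequence[float]) -> float:
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    n_steps = len(curve) - 1
    if n_steps < 1:
        raise ValueError("the curve needs at least two points")
    return sum((curve[i] + curve[i + 1]) / 2.0 for i in range(n_steps)) / n_steps


# Hypothetical toy "model": sums fixed per-word sentiment weights.
WORD_WEIGHTS = {"great": 2.0, "good": 1.0}


def toy_score(tokens: Sequence[str]) -> float:
    return sum(WORD_WEIGHTS.get(token, 0.0) for token in tokens)


sentence = ["a", "great", "good", "movie"]
faithful_curve = deletion_curve(sentence, [0.0, 2.0, 1.0, 0.0], toy_score)
unfaithful_curve = deletion_curve(sentence, [2.0, 0.0, 0.0, 1.0], toy_score)
print(area_under_curve(faithful_curve), area_under_curve(unfaithful_curve))
```

Insertion works the other way around: starting from a fully masked input, tokens are restored in decreasing order of attribution, and a larger area indicates a more faithful attribution.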

Evaluation Metrics for Concepts

Concept-based methods have several steps that can be evaluated together via ConSim.

Or independently:

👍 Contributing

Feel free to propose ideas or contribute to the Interpreto 🪄 toolbox! A dedicated document explains, step by step, how to make your first pull request.

👀 See Also

More from the DEEL project:

  • Xplique: a Python library dedicated to explaining neural networks (Images, Time Series, Tabular data) on TensorFlow.
  • Puncc: a Python library for predictive uncertainty quantification using conformal prediction.
  • oodeel: a Python library that performs post-hoc deep Out-of-Distribution (OOD) detection on already trained neural network image classifiers.
  • deel-lip: a Python library for training k-Lipschitz neural networks on TensorFlow.
  • deel-torchlip: a Python library for training k-Lipschitz neural networks on PyTorch.
  • Influenciae: a Python library dedicated to computing influence values for the discovery of potentially problematic samples in a dataset.
  • DEEL White paper: a summary by the DEEL team of the challenges of certifiable AI and the role of data quality, representativity and explainability for this purpose.

🙏 Acknowledgments

This project received funding from the French “Investing for the Future – PIA3” program within the Artificial and Natural Intelligence Toulouse Institute (ANITI). The authors gratefully acknowledge the support of the DEEL and the FOR projects.

👨‍🎓 Creators

Interpreto 🪄 is a project of the FOR and the DEEL teams at the IRT Saint-Exupéry in Toulouse, France.

🗞️ Citation

If you use Interpreto 🪄 as part of your workflow in a scientific publication, please consider citing 🗞️ our paper:

@article{poche2025interpreto,
    title       = {Interpreto: An Explainability Library for Transformers},
    author      = {Poch{\'e}, Antonin and Mullor, Thomas and Sarti, Gabriele and Boisnard, Fr{\'e}d{\'e}ric and Friedrich, Corentin and Claye, Charlotte and Hoofd, Fran{\c{c}}ois and Bernas, Raphael and Hudelot, C{\'e}line and Jourdan, Fanny},
    journal     = {arXiv preprint arXiv:2512.09730},
    year        = {2025}
}

📝 License

The package is released under the MIT license.
