A package for mechanistic understanding and validation of large AI models with SemanticLens

An open-source PyTorch library for interpreting and validating large vision models.
Read the paper in Nature Machine Intelligence (Open Access).


SemanticLens is a universal framework for explaining and validating large vision models. While deep learning models are powerful, their internal workings are often a "black box," making them difficult to trust and debug. SemanticLens addresses this by mapping the internal components of a model (like neurons or filters) into the rich, semantic space of a foundation model (e.g., CLIP or SigLIP).

This allows you to "translate" what the model is doing into a human-understandable format, enabling you to search, analyze, and audit its internal representations.

How It Works

Overview of the SemanticLens framework as introduced in our research paper.

The core workflow of SemanticLens involves three main steps:

  1. Collect: For each component in a model M, we identify the data samples that cause the highest activation (the "concept examples"). We provide a suite of ComponentVisualizers that implement different strategies, from simple activation maximization to relevance maximization and attribution-based cropping.

  2. Embed: These examples are then fed into a foundation model (like CLIP), which creates a meaningful vector representation for each component. SemanticLens includes built-in support for OpenCLIP and can be easily extended with other foundation models (see base.py).

  3. Analyze: These vector representations enable powerful analyses. The Lens class is the main interface for this, orchestrating the preprocessing, caching, and evaluation needed to search and audit your model using its new semantic embeddings.
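
The three steps above can be illustrated with a small, library-free sketch. It is not how SemanticLens is implemented: the toy activations, the 2-D "foundation model" embeddings, and the helper names (top_k_examples, mean_vector, cosine) are all hypothetical and exist only to show the data flow from activations to a searchable embedding per component.

```python
import math


def top_k_examples(activations, k):
    """Collect: indices of the k samples with the highest activation."""
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return ranked[:k]


def mean_vector(vectors):
    """Embed: average several example embeddings into one component embedding."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a, b):
    """Analyze: cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical activations of two components on four data samples.
activations = {
    "neuron_0": [0.1, 0.9, 0.8, 0.0],
    "neuron_1": [0.7, 0.0, 0.1, 0.95],
}
# Hypothetical foundation-model embeddings of the same four samples.
sample_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]

# Collect the top-2 samples per component, then embed each component
# as the mean of its concept-example embeddings.
component_embeddings = {}
for name, acts in activations.items():
    indices = top_k_examples(acts, k=2)
    component_embeddings[name] = mean_vector([sample_embeddings[i] for i in indices])

# Analyze: find the component closest to a (hypothetical) query embedding.
query = [0.0, 1.0]
best = max(component_embeddings, key=lambda n: cosine(component_embeddings[n], query))
print(best)
```

In the real package the embeddings come from a foundation model such as CLIP and the bookkeeping is handled by the ComponentVisualizers and the Lens class; the sketch only mirrors the order of operations.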

Installation

You can install SemanticLens directly from PyPI:

pip install semanticlens

To install the latest version from this repository:

pip install git+https://github.com/jim-berend/semanticlens.git
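
To confirm which version ended up in the active environment, the distribution metadata can be queried with the standard library. This is a minimal sketch; the helper name installed_version is an illustrative choice and not part of SemanticLens.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if semanticlens is installed, otherwise None.
print(installed_version("semanticlens"))
```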

Quickstart

Example usage:

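import semanticlens as sl

... # dataset and model setup

# Initialization

# Component visualizer: collects the most strongly activating samples
# (the "concept examples") for each component in the listed layers.
cv = sl.component_visualization.ActivationComponentVisualizer(
    model,
    dataset_model,
    dataset_fm,
    layer_names=layer_names,
    device=device,
    cache_dir=cache_dir,
)

# Foundation model used to embed the concept examples (OpenCLIP, RN50, OpenAI weights).
fm = sl.foundation_models.OpenClip(url="RN50", pretrained="openai", device=device)

# The Lens orchestrates preprocessing, caching, and evaluation.
lens = sl.Lens(fm, device=device)

# Semantic Embedding

# Embed the concept examples of every component with the foundation model.
concept_db = lens.compute_concept_db(cv, batch_size=128, num_workers=8)
# Average over dimension 1 to obtain a single embedding per component.
aggregated_cpt_db = {k: v.mean(1) for k, v in concept_db.items()}

# Analysis

# Score components using the per-example embeddings.
polysemanticity_scores = lens.eval_polysemanticity(concept_db)

# Search the aggregated component embeddings with text prompts.
search_results = lens.text_probing(["cats", "dogs"], aggregated_cpt_db)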

...
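
The analysis step above works on per-example embeddings because the spread of a component's concept examples carries information that the mean discards. The sketch below shows one simple proxy for that spread, the mean pairwise cosine distance among a component's example embeddings. It is an illustrative assumption, not necessarily the score that eval_polysemanticity computes, and the function name spread_score and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def spread_score(example_embeddings):
    """Mean pairwise cosine distance (1 - similarity) among one component's examples."""
    pairs = list(combinations(example_embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example embeddings: one component whose examples agree,
# and one whose examples point in two unrelated directions.
mono = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
poly = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

mono_score = spread_score(mono)
poly_score = spread_score(poly)
print(mono_score < poly_score)
```

A component whose examples cluster tightly gets a score near zero, while one that mixes unrelated concepts scores higher, which is why averaging with v.mean(1) is suitable for search but not for this kind of audit.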

Full quickstart guide: quickstart.ipynb

Package documentation: docs

Contributing

We welcome contributions to SemanticLens! Whether you're fixing a bug, adding a new feature, or improving the documentation, your help is appreciated.

If you'd like to contribute, please follow these steps:

  1. Fork the repository on GitHub.
  2. Create a new branch for your feature or bug fix (git checkout -b feature/your-feature-name).
  3. Make your changes and commit them with a clear message.
  4. Open a pull request to the main branch of the original repository.

For bug reports or feature requests, please use the GitHub Issues section. Before starting work on a major change, it's a good idea to open an issue first to discuss your plan.

License

BSD 3-Clause License

Citation

@article{dreyer_mechanistic_2025,
	title = {Mechanistic understanding and validation of large {AI} models with {SemanticLens}},
	copyright = {2025 The Author(s)},
	issn = {2522-5839},
	url = {https://www.nature.com/articles/s42256-025-01084-w},
	doi = {10.1038/s42256-025-01084-w},
	language = {en},
	urldate = {2025-08-18},
	journal = {Nature Machine Intelligence},
	author = {Dreyer, Maximilian and Berend, Jim and Labarta, Tobias and Vielhaben, Johanna and Wiegand, Thomas and Lapuschkin, Sebastian and Samek, Wojciech},
	month = aug,
	year = {2025},
	note = {Publisher: Nature Publishing Group},
	keywords = {Computer science, Information technology},
	pages = {1--14},
}
