A package for mechanistic understanding and validation of large AI models with SemanticLens
An open-source PyTorch library for interpreting and validating large vision models.
Read the paper now as part of Nature Machine Intelligence (Open Access).
SemanticLens is a universal framework for explaining and validating large vision models. While deep learning models are powerful, their internal workings are often a "black box," making them difficult to trust and debug. SemanticLens addresses this by mapping the internal components of a model (like neurons or filters) into the rich, semantic space of a foundation model (e.g., CLIP or SigLIP).
This allows you to "translate" what the model is doing into a human-understandable format, enabling you to search, analyze, and audit its internal representations.
How It Works
Overview of the SemanticLens framework as introduced in our research paper.
The core workflow of SemanticLens involves three main steps:
- Collect: For each component in a model M, we identify the data samples that cause the highest activation (the "concept examples"). We provide a suite of ComponentVisualizers that implement different strategies, from simple activation maximization to relevance-maximization and attribution-based cropping.
- Embed: These examples are then fed into a foundation model (like CLIP), which creates a meaningful vector representation for each component. SemanticLens includes built-in support for OpenCLIP and can be easily extended with other foundation models (see base.py).
- Analyze: These vector representations enable powerful analyses. The Lens class is the main interface for this, orchestrating the preprocessing, caching, and evaluation needed to search and audit your model using its new semantic embeddings.
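The Collect and Embed steps can be illustrated with a small, library-free sketch. This is a conceptual toy and not SemanticLens's implementation: `collect_top_k`, `embed_components`, and `toy_embed` are hypothetical names introduced here, the activations are made-up numbers, and `toy_embed` stands in for a real foundation model's image encoder. Averaging the example embeddings is one simple way to get a single vector per component.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Vector = List[float]


def collect_top_k(activations: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each component (column), return the indices of the k most activating samples."""
    n_samples = len(activations)
    n_components = len(activations[0])
    top = []
    for c in range(n_components):
        ranked = sorted(range(n_samples), key=lambda i: activations[i][c], reverse=True)
        top.append(ranked[:k])
    return top


def embed_components(
    samples: Sequence[str],
    top_indices: Sequence[Sequence[int]],
    embed: Callable[[str], Vector],
) -> List[Vector]:
    """Embed each component's concept examples and average them into one vector."""
    component_vectors = []
    for indices in top_indices:
        vectors = [embed(samples[i]) for i in indices]
        # Average dimension by dimension across the concept examples.
        component_vectors.append([fmean(dim) for dim in zip(*vectors)])
    return component_vectors


def toy_embed(sample: str) -> Vector:
    """Hypothetical stand-in for a foundation model's image encoder."""
    return [1.0, 0.0] if sample.startswith("cat") else [0.0, 1.0]


# Toy data (assumed): rows are samples, columns are model components.
samples = ["cat_1", "cat_2", "dog_1", "dog_2"]
activations = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.2, 0.95],
]

top_indices = collect_top_k(activations, k=2)
component_vectors = embed_components(samples, top_indices, toy_embed)

print(top_indices)        # [[0, 1], [3, 2]]
print(component_vectors)  # [[1.0, 0.0], [0.0, 1.0]]
```

In this toy setup, component 0 ends up with a "cat-like" vector and component 1 with a "dog-like" one, which is the kind of per-component representation the Analyze step then works with.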
Installation
You can install SemanticLens directly from PyPI:
pip install semanticlens
To install the latest version from this repository:
pip install git+https://github.com/jim-berend/semanticlens.git
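After installing, one way to check which version ended up in your environment is to query the package metadata through the Python standard library. The helper name `installed_version` is introduced here for illustration and is not part of SemanticLens.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "semanticlens") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or None if the package is not installed here.
print(installed_version())
```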
Quickstart
Example usage:
import semanticlens as sl
... # dataset and model setup
# Initialization
cv = sl.component_visualization.ActivationComponentVisualizer(
    model,
    dataset_model,
    dataset_fm,
    layer_names=layer_names,
    device=device,
    cache_dir=cache_dir,
)
fm = sl.foundation_models.OpenClip(url="RN50", pretrained="openai", device=device)
lens = sl.Lens(fm, device=device)
# Semantic Embedding
concept_db = lens.compute_concept_db(cv, batch_size=128, num_workers=8)
aggregated_cpt_db = {k: v.mean(1) for k, v in concept_db.items()}
# Analysis
polysemanticity_scores = lens.eval_polysemanticity(concept_db)
search_results = lens.text_probing(["cats", "dogs"], aggregated_cpt_db)
...
Full quickstart guide: quickstart.ipynb
Package documentation: docs
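To give an intuition for the text-probing call in the quickstart, the sketch below ranks components by cosine similarity between a text embedding and one aggregated vector per component. It is a conceptual illustration under stated assumptions, not the library's implementation: `cosine` and `rank_components` are hypothetical helpers, the component names are invented, and the two-dimensional vectors are toy stand-ins for real foundation-model embeddings.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_components(
    text_vector: Vector, component_vectors: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Sort components from most to least similar to the text embedding."""
    scores = {name: cosine(text_vector, vec) for name, vec in component_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy aggregated embeddings, one per component (names are hypothetical).
component_vectors = {
    "layer4.neuron_0": [1.0, 0.0],
    "layer4.neuron_1": [0.0, 1.0],
    "layer4.neuron_2": [1.0, 1.0],
}

# Toy stand-in for the text embedding of a query such as "cats".
text_vector = [1.0, 0.0]

ranking = rank_components(text_vector, component_vectors)
print([name for name, _ in ranking])
# ['layer4.neuron_0', 'layer4.neuron_2', 'layer4.neuron_1']
```

Here `layer4.neuron_2` scores between the other two because its vector points halfway between both toy directions, the sort of mixed behaviour a polysemanticity analysis aims to surface.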
Contributing
We welcome contributions to SemanticLens! Whether you're fixing a bug, adding a new feature, or improving the documentation, your help is appreciated.
If you'd like to contribute, please follow these steps:
- Fork the repository on GitHub.
- Create a new branch for your feature or bug fix (git checkout -b feature/your-feature-name).
- Make your changes and commit them with a clear message.
- Open a pull request to the main branch of the original repository.
For bug reports or feature requests, please use the GitHub Issues section. Before starting work on a major change, it's a good idea to open an issue first to discuss your plan.
License
Citation
@article{dreyer_mechanistic_2025,
  title = {Mechanistic understanding and validation of large {AI} models with {SemanticLens}},
  copyright = {2025 The Author(s)},
  issn = {2522-5839},
  url = {https://www.nature.com/articles/s42256-025-01084-w},
  doi = {10.1038/s42256-025-01084-w},
  language = {en},
  urldate = {2025-08-18},
  journal = {Nature Machine Intelligence},
  author = {Dreyer, Maximilian and Berend, Jim and Labarta, Tobias and Vielhaben, Johanna and Wiegand, Thomas and Lapuschkin, Sebastian and Samek, Wojciech},
  month = aug,
  year = {2025},
  note = {Publisher: Nature Publishing Group},
  keywords = {Computer science, Information technology},
  pages = {1--14},
}