
A package for mechanistic understanding and validation of large AI models with SemanticLens



An open-source PyTorch library for interpreting and validating large vision models.
Read the paper in Nature Machine Intelligence (Open Access).


SemanticLens is a universal framework for explaining and validating large vision models. While deep learning models are powerful, their internal workings are often a "black box," making them difficult to trust and debug. SemanticLens addresses this by mapping the internal components of a model (like neurons or filters) into the rich, semantic space of a foundation model (e.g., CLIP or SigLIP).

This allows you to "translate" what the model is doing into a human-understandable format, enabling you to search, analyze, and audit its internal representations.

How It Works

Figure: Overview of the SemanticLens framework as introduced in our research paper.

The core workflow of SemanticLens involves three main steps:

  1. Collect: For each component in a model M, we identify the data samples that cause the highest activation (the "concept examples"). We provide a suite of ComponentVisualizers that implement different strategies, from simple activation maximization to relevance-maximization and attribution-based cropping.

  2. Embed: These examples are then fed into a foundation model (like CLIP), which creates a meaningful vector representation for each component. SemanticLens includes built-in support for OpenCLIP and can be easily extended with other foundation models (see base.py).

  3. Analyze: These vector representations make the model's components searchable and auditable, for example through text-based search or polysemanticity evaluation. The Lens class is the main interface for this, orchestrating the preprocessing, caching, and evaluation needed to search and audit your model using its new semantic embeddings.
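As a rough illustration of the three steps, here is a minimal, self-contained sketch on toy tensors. It assumes plain top-k activation as the Collect strategy, the mean of normalized foundation-model embeddings as each component's embedding, and cosine similarity against text embeddings for the Analyze step. The functions `collect`, `embed`, and `probe` and the random tensors are hypothetical stand-ins, not part of the SemanticLens API; in the library these steps are handled by the ComponentVisualizers, the foundation-model wrappers, and the Lens class.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy stand-ins, not SemanticLens objects:
#   activations[i, j]     activation of component j on dataset sample i, shape (N, C)
#   sample_embeddings[i]  foundation-model embedding of sample i, shape (N, D)
#   text_embeddings[t]    foundation-model embedding of text prompt t, shape (T, D)


def collect(activations: torch.Tensor, k: int) -> torch.Tensor:
    """Collect: indices of the k most-activating samples per component, shape (C, k)."""
    return activations.topk(k, dim=0).indices.T


def embed(top_idx: torch.Tensor, sample_embeddings: torch.Tensor) -> torch.Tensor:
    """Embed: mean of each component's normalized concept-example embeddings, shape (C, D)."""
    examples = F.normalize(sample_embeddings, dim=-1)[top_idx]  # (C, k, D)
    return F.normalize(examples.mean(dim=1), dim=-1)


def probe(component_emb: torch.Tensor, text_embeddings: torch.Tensor) -> torch.Tensor:
    """Analyze: cosine similarity of every component with every text prompt, shape (C, T)."""
    return component_emb @ F.normalize(text_embeddings, dim=-1).T


torch.manual_seed(0)
activations = torch.rand(100, 4)         # 100 samples, 4 components
sample_embeddings = torch.randn(100, 8)  # 8-dimensional toy embedding space
text_embeddings = torch.randn(2, 8)      # stand-ins for, e.g., embeddings of "cats" and "dogs"
scores = probe(embed(collect(activations, k=5), sample_embeddings), text_embeddings)
print(scores.shape)          # torch.Size([4, 2])
print(scores.argmax(dim=0))  # best-matching component index per prompt
```

Averaging normalized embeddings keeps one vector per component that lives in the same space as the text embeddings, which is what makes a plain dot product usable as a similarity score here.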

Installation

You can install SemanticLens directly from PyPI:

pip install semanticlens

To install the latest version from this repository:

pip install git+https://github.com/jim-berend/semanticlens.git

Quickstart

Example usage:

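import semanticlens as sl

...  # dataset and model setup: defines model, dataset_model, dataset_fm, layer_names, device, cache_dir

# Initialization

# Collect: finds the concept examples (most-activating samples) of each component
# in the layers listed in layer_names; cache_dir is where results are cached
cv = sl.component_visualization.ActivationComponentVisualizer(
    model,
    dataset_model,
    dataset_fm,
    layer_names=layer_names,
    device=device,
    cache_dir=cache_dir,
)

# Foundation model used for embedding: OpenCLIP ResNet-50 with OpenAI weights
fm = sl.foundation_models.OpenClip(url="RN50", pretrained="openai", device=device)

# Lens: main interface for embedding, search, and auditing
lens = sl.Lens(fm, device=device)

# Semantic Embedding

# Embed the concept examples of every component (a dict, presumably keyed by layer name)
concept_db = lens.compute_concept_db(cv, batch_size=128, num_workers=8)
# Average over dim 1 (presumably the concept examples) to get one vector per component
aggregated_cpt_db = {k: v.mean(1) for k, v in concept_db.items()}

# Analysis

# Score how polysemantic (mixing several concepts) each component is
polysemanticity_scores = lens.eval_polysemanticity(concept_db)

# Search the components for the text concepts "cats" and "dogs"
search_results = lens.text_probing(["cats", "dogs"], aggregated_cpt_db)

...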
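The aggregation line in the quickstart averages each entry of the concept database over dim 1. A minimal sketch of what that does, assuming each entry of `concept_db` is a tensor of shape (num_components, num_concept_examples, embed_dim); the quickstart suggests this layout but does not state it. The second function is a deliberately simple proxy and is not the metric computed by `lens.eval_polysemanticity`: it averages the cosine similarity between a component's distinct concept examples, so a low value hints that the examples span several unrelated concepts. The names `aggregate` and `mean_pairwise_similarity`, the layer name `"features.10"`, and the tensor sizes are hypothetical.

```python
import torch
import torch.nn.functional as F


def aggregate(concept_db: dict) -> dict:
    """Average over dim 1 (assumed: the concept examples), giving one vector per component."""
    return {name: emb.mean(1) for name, emb in concept_db.items()}


def mean_pairwise_similarity(emb: torch.Tensor) -> torch.Tensor:
    """Per component, mean cosine similarity between distinct concept examples.

    Simplified proxy only, NOT SemanticLens's polysemanticity metric.
    emb has shape (C, K, D) with K >= 2; returns shape (C,).
    """
    x = F.normalize(emb, dim=-1)
    sim = x @ x.transpose(1, 2)                          # (C, K, K) cosine similarities
    k = sim.shape[-1]
    total = sim.sum(dim=(1, 2))
    self_sim = sim.diagonal(dim1=1, dim2=2).sum(dim=-1)  # K self-similarities (1 for nonzero vectors)
    return (total - self_sim) / (k * (k - 1))            # mean over the K*(K-1) distinct pairs


torch.manual_seed(0)
concept_db = {"features.10": torch.randn(16, 30, 512)}  # hypothetical layer name and sizes
aggregated = aggregate(concept_db)
print(aggregated["features.10"].shape)                            # torch.Size([16, 512])
print(mean_pairwise_similarity(concept_db["features.10"]).shape)  # torch.Size([16])
```

Note that the quickstart passes the full, non-aggregated `concept_db` to `eval_polysemanticity`: any measure of how mixed a component is needs the individual concept examples, which the mean discards.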

Full quickstart guide: quickstart.ipynb

Package documentation: docs

Contributing

We welcome contributions to SemanticLens! Whether you're fixing a bug, adding a new feature, or improving the documentation, your help is appreciated.

If you'd like to contribute, please follow these steps:

  1. Fork the repository on GitHub.
  2. Create a new branch for your feature or bug fix (git checkout -b feature/your-feature-name).
  3. Make your changes and commit them with a clear message.
  4. Push the branch to your fork and open a pull request to the main branch of the original repository.

For bug reports or feature requests, please use the GitHub Issues section. Before starting work on a major change, it's a good idea to open an issue first to discuss your plan.

License

BSD 3-Clause License

Citation

@article{dreyer_mechanistic_2025,
	title = {Mechanistic understanding and validation of large {AI} models with {SemanticLens}},
	copyright = {2025 The Author(s)},
	issn = {2522-5839},
	url = {https://www.nature.com/articles/s42256-025-01084-w},
	doi = {10.1038/s42256-025-01084-w},
	language = {en},
	urldate = {2025-08-18},
	journal = {Nature Machine Intelligence},
	author = {Dreyer, Maximilian and Berend, Jim and Labarta, Tobias and Vielhaben, Johanna and Wiegand, Thomas and Lapuschkin, Sebastian and Samek, Wojciech},
	month = aug,
	year = {2025},
	note = {Publisher: Nature Publishing Group},
	keywords = {Computer science, Information technology},
	pages = {1--14},
}



Download files

Download the file for your platform.

Source Distribution

semanticlens-0.1.2.tar.gz (21.0 MB)

Built Distribution

semanticlens-0.1.2-py3-none-any.whl (40.3 kB)

File details

Details for the file semanticlens-0.1.2.tar.gz.

File metadata

  • Download URL: semanticlens-0.1.2.tar.gz
  • Size: 21.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for semanticlens-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       18f510f913dd7031f5e9c7284d0a3893d48b306b2993f04e1cedd17281b8797a
MD5          724348d376ddcfe42b12588ccb2caddd
BLAKE2b-256  80bc036b4f4a2df15e9568ea86bb32d7d7931de1d72769f2ba3015ca91d4993c
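These digests let you check a downloaded file before installing it. A minimal sketch using Python's standard `hashlib`, streaming the file in chunks so that archives such as the 21.0 MB source distribution need not be read into memory at once. The function name `sha256_of` and the assumption that the file sits in the current working directory are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest of semanticlens-0.1.2.tar.gz, copied from the hash table for that file.
EXPECTED_SHA256 = "18f510f913dd7031f5e9c7284d0a3893d48b306b2993f04e1cedd17281b8797a"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunk_size-byte pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical location: assumes the sdist was downloaded into the working directory.
sdist = Path("semanticlens-0.1.2.tar.gz")
if sdist.exists():
    print("OK" if sha256_of(sdist) == EXPECTED_SHA256 else "MISMATCH")
```

For installs, pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file) can enforce the same check automatically.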


Provenance

The following attestation bundles were made for semanticlens-0.1.2.tar.gz:

Publisher: python-publish.yml on jim-berend/semanticlens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file semanticlens-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: semanticlens-0.1.2-py3-none-any.whl
  • Size: 40.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for semanticlens-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       d953b30bb3a2f7dc5b4d4ce6919c6c37ec6ff754fd6b98fa21a96aad2cf8dc50
MD5          94da9c454308ed107ae7c60371738c88
BLAKE2b-256  b41e19805ba346342d32a0a4c06e5dd58010c7b06460b590f200b258d02e4123


Provenance

The following attestation bundles were made for semanticlens-0.1.2-py3-none-any.whl:

Publisher: python-publish.yml on jim-berend/semanticlens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
