For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. Open-sourced and constantly updated.

Project description

Language-Model-SAEs

[!IMPORTANT] The examples are currently outdated, and some parallelism strategies do not work because the maintainers have had limited bandwidth to update them. We are working on better organizing recent updates and will make everything work ASAP.

Language-Model-SAEs is a comprehensive, fully distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs), built to support scalable and systematic Mechanistic Interpretability research.

News

2026.2.12 We introduce Complete Replacement Models (CRMs), which combine transcoders and Lorsas to fully sparsify language models. Link: Bridging the Attention Gap: Complete Replacement Models for Complete Circuit Tracing.
2025.9.23 We leverage Crosscoder to track feature evolution across pre-training snapshots. Link: Evolution of Concepts in Language Model Pre-Training (ICLR 2026).
2025.8.23 We identify a prevalent low-rank structure in attention outputs as the key cause of dead features, and propose Active Subspace Initialization to improve sparse dictionary learning on these low-rank activations. Link: Attention Layers Add Into Low-Dimensional Residual Subspaces.
2025.4.29 We introduce Low-Rank Sparse Attention (Lorsa) to attack attention superposition, extracting tens of thousands of true attention units from LLM attention layers. Link: Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition (ICLR 2026).
2024.10.29 We introduce Llama Scope, our first contribution to the open-source Sparse Autoencoder ecosystem. Stay tuned! Link: Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders.
2024.10.9 Transformers and Mambas are mechanistically similar at both the feature and circuit levels. Can we follow this line and find universal motifs and fundamental differences between language model architectures? Link: Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures (ICLR 2025).
2024.5.22 We propose hierarchical tracing, a promising method to scale up sparse feature circuit analysis to industrial-sized language models! Link: Automatically Identifying Local and Global Circuits with Linear Computation Graphs (ICML 2024 MI Workshop).
2024.2.19 Our first attempt at SAE-based circuit analysis for Othello-GPT leads us to an example of Attention Superposition in the wild! Link: Dictionary learning improves patch-free circuit discovery in mechanistic interpretability: A case study on othello-gpt.

Features

Scalability: Our framework is fully distributed with arbitrary combinations of data, model, and head parallelism for both training and analysis. Enjoy training SAEs with millions of features!
Flexibility: We support a wide range of SAE variants, including vanilla SAEs, Lorsa (Low-rank Sparse Attention), CLT (Cross-layer Transcoder), MoLT (Mixture of Linear Transforms), CrossCoder, and more. Each variant can be combined with different activation functions (e.g., ReLU, JumpReLU, TopK, BatchTopK) and sparsity penalties (e.g., L1, Tanh).
Easy to Use: We provide high-level runner APIs to quickly launch experiments with simple configurations. Check our examples for verified hyperparameters.
Visualization: We provide a unified web interface to visualize learned SAE variants and their features.
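
To make the activation functions and the L1 penalty named in the list above concrete, here is a minimal pure-Python sketch of their textbook definitions on plain lists. It is not taken from the Language-Model-SAEs codebase: the function names are illustrative assumptions, and the library's own implementations may differ (for example, in how TopK treats negative values or ties).

```python
import heapq


def relu(pre):
    """Zero out negative pre-activations."""
    return [max(v, 0.0) for v in pre]


def jump_relu(pre, threshold):
    """Pass a pre-activation through unchanged only if it exceeds the threshold."""
    return [v if v > threshold else 0.0 for v in pre]


def top_k(pre, k):
    """Keep the k largest pre-activations of one sample and zero the rest."""
    keep = set(heapq.nlargest(k, range(len(pre)), key=pre.__getitem__))
    return [v if i in keep else 0.0 for i, v in enumerate(pre)]


def batch_top_k(batch, k):
    """Keep the k * len(batch) largest pre-activations across the whole batch."""
    flat = [(v, r, c) for r, row in enumerate(batch) for c, v in enumerate(row)]
    keep = {(r, c) for _, r, c in heapq.nlargest(k * len(batch), flat)}
    return [
        [v if (r, c) in keep else 0.0 for c, v in enumerate(row)]
        for r, row in enumerate(batch)
    ]


def l1_penalty(features):
    """Sum of absolute feature activations for one sample."""
    return sum(abs(v) for v in features)
```

With k = 1, top_k keeps exactly one feature per sample, whereas batch_top_k only fixes the total number of active features in the batch, so a strongly activating sample can keep more features than a weak one.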

Installation

Use pip to install Language-Model-SAEs:

pip install lm-saes==2.0.0b16

We also highly recommend using uv to manage your own project dependencies. You can use

uv add lm-saes==2.0.0b16

to add Language-Model-SAEs as your project dependency.
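
Because the pinned release is a pre-release, it can help to confirm which version actually ended up in the environment. The following is a small sketch using only the standard library; the helper names are illustrative and not part of Language-Model-SAEs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def matches_pin(distribution, expected):
    """True only if the distribution is installed at exactly the expected version."""
    return installed_version(distribution) == expected
```

For example, matches_pin("lm-saes", "2.0.0b16") should return True in an environment created with either of the commands above.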

Development

We use uv, an alternative to poetry or pdm, to manage dependencies. To install the required packages, install uv and run the following command:

uv sync

This will install all the required packages for the codebase in the .venv directory. For Ascend NPU support, run

uv sync --extra npu

If you want to use the visualization tools, you also need to install the required packages for the frontend, which uses bun for dependency management. Follow the instructions on the website to install it, and then run the following command:

cd ui
bun install

Launch an Experiment

Explore the examples to see the basic usage of training and analyzing SAEs in different configurations. Note that a MongoDB instance is recommended for recording the model, dataset, and SAE configurations, and is required for storing analyses. For more advanced usage, you may explore the src/lm_saes/runners folder, which contains the interface for generating activations and for training and analyzing SAE variants, and write your own training or analysis script directly at the runner level.

Visualizing the Learned Dictionary

The analysis results will be saved using MongoDB, and you can use the provided visualization tools to visualize the learned dictionary. First, start the FastAPI server by running the following command:

uvicorn server.app:app --port 24577 --env-file server/.env

Then, copy the ui/.env.example file to ui/.env and modify the BACKEND_URL to fit your server settings (by default, it's http://localhost:24577), and start the frontend by running the following command:

cd ui
bun dev --port 24576

That's it! You can now go to http://localhost:24576 to visualize the learned dictionary and its features.
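
The copy-and-edit step for ui/.env described above can also be scripted. The sketch below assumes that ui/.env.example consists of plain KEY=value lines, which is an assumption about the file's format rather than something stated here; the function name is hypothetical.

```python
from pathlib import Path


def write_ui_env(example_path, env_path, backend_url):
    """Copy an env template to env_path, setting BACKEND_URL to backend_url."""
    output_lines = []
    replaced = False
    for line in Path(example_path).read_text().splitlines():
        # Compare only the key part so the template's existing value is ignored.
        if line.split("=", 1)[0].strip() == "BACKEND_URL":
            output_lines.append(f"BACKEND_URL={backend_url}")
            replaced = True
        else:
            output_lines.append(line)
    if not replaced:
        # The template had no BACKEND_URL entry, so append one.
        output_lines.append(f"BACKEND_URL={backend_url}")
    Path(env_path).write_text("\n".join(output_lines) + "\n")
```

Calling write_ui_env("ui/.env.example", "ui/.env", "http://localhost:24577") from the repository root would reproduce the default setup.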

Contributing

We highly welcome contributions to this project. If you have any questions or suggestions, feel free to open an issue or a pull request. We are looking forward to hearing from you!

TODO: Add development guidelines

Acknowledgement

The design of the pipeline (including the configuration and some training details) is highly inspired by the mats_sae_training project (now known as SAELens) and heavily relies on the TransformerLens library. We thank the authors for their great work.

Citation

Please cite this library as:

@misc{Ge2024OpenMossSAEs,
    title  = {OpenMoss Language Model Sparse Autoencoders},
    author = {Xuyang Ge and Wentao Shu and Junxuan Wang and Guancheng Zhou and Jiaxing Wu and Fukang Zhu and Lingjie Chen and Zhengfu He},
    url    = {https://github.com/OpenMOSS/Language-Model-SAEs},
    year   = {2024}
}

Release history

2.0.0b34 pre-release, Apr 18, 2026
2.0.0b33 pre-release, Apr 18, 2026
2.0.0b32 pre-release, Apr 17, 2026
2.0.0b31 pre-release, Apr 17, 2026
2.0.0b30 pre-release, Apr 16, 2026
2.0.0b29 pre-release, Apr 15, 2026
2.0.0b28 pre-release, Apr 14, 2026
2.0.0b27 pre-release, Apr 12, 2026
2.0.0b26 pre-release, Apr 12, 2026
2.0.0b25 pre-release, Apr 10, 2026
2.0.0b24 pre-release, Apr 4, 2026
2.0.0b23 pre-release, Apr 3, 2026
2.0.0b22 pre-release, Apr 2, 2026
2.0.0b21 pre-release, Mar 26, 2026
2.0.0b20 pre-release, Mar 2, 2026
2.0.0b19 pre-release, Mar 1, 2026
2.0.0b18 pre-release, Feb 27, 2026
2.0.0b17 pre-release, Feb 27, 2026
2.0.0b16 pre-release, Feb 27, 2026 (this version)
2.0.0b15 pre-release, Feb 14, 2026
2.0.0b14 pre-release, Feb 12, 2026
2.0.0b13 pre-release, Feb 11, 2026
2.0.0b12 pre-release, Feb 11, 2026
2.0.0b11 pre-release, Feb 11, 2026
2.0.0b10 pre-release, Feb 11, 2026
2.0.0b9 pre-release, Feb 9, 2026
2.0.0b8 pre-release, Jan 18, 2026
2.0.0b7 pre-release, Jan 18, 2026
2.0.0b6 pre-release, Dec 30, 2025
2.0.0b5 pre-release, Dec 30, 2025
2.0.0b4 pre-release, Dec 18, 2025
2.0.0b3 pre-release, Nov 22, 2025
2.0.0b2 pre-release, Nov 4, 2025

Download files

Source Distribution

lm_saes-2.0.0b16.tar.gz (201.0 kB)

Uploaded Feb 27, 2026 Source

Built Distribution

lm_saes-2.0.0b16-py3-none-any.whl (243.8 kB)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file lm_saes-2.0.0b16.tar.gz.

File metadata

Download URL: lm_saes-2.0.0b16.tar.gz
Upload date: Feb 27, 2026
Size: 201.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lm_saes-2.0.0b16.tar.gz
Algorithm	Hash digest
SHA256	`9d9f7438fa821289971ed040180ad76b21485ee8899fc10f8c2922b927801f3b`
MD5	`17e7da616c22670f8b4eabf2ea5e14f4`
BLAKE2b-256	`3f8cee1f29c521ea764d9c0ffe74e727a448c8fb54587257a30bf4882ec59dd5`
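
The SHA256 digest listed above can be checked against a downloaded copy of the file. The sketch below uses only the standard library's hashlib; the helper names and the chunk size are illustrative choices.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 entry from the table above, passed together with the local path of the downloaded lm_saes-2.0.0b16.tar.gz.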

Provenance

The following attestation bundles were made for lm_saes-2.0.0b16.tar.gz:

Publisher: publish.yml on OpenMOSS/Language-Model-SAEs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lm_saes-2.0.0b16.tar.gz
- Subject digest: 9d9f7438fa821289971ed040180ad76b21485ee8899fc10f8c2922b927801f3b
- Sigstore transparency entry: 1002928353
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: OpenMOSS/Language-Model-SAEs@8bde367ff588fd7ddc60f35dc9daec6ffbf964e6
- Branch / Tag: refs/tags/v2.0.0b16
- Owner: https://github.com/OpenMOSS
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8bde367ff588fd7ddc60f35dc9daec6ffbf964e6
- Trigger Event: push

File details

Details for the file lm_saes-2.0.0b16-py3-none-any.whl.

File metadata

Download URL: lm_saes-2.0.0b16-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 243.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lm_saes-2.0.0b16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`53592b6d5794b763b584edd268049194bd0cfde3e69c0fbbb8858e6915fd3718`
MD5	`6d4e37f9fdc178d7c5803e74318cfc14`
BLAKE2b-256	`5f2bdf85e6e1db183443f860e46443b700d2cb9dbf4d8604ed7102d790c3f782`

Provenance

The following attestation bundles were made for lm_saes-2.0.0b16-py3-none-any.whl:

Publisher: publish.yml on OpenMOSS/Language-Model-SAEs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lm_saes-2.0.0b16-py3-none-any.whl
- Subject digest: 53592b6d5794b763b584edd268049194bd0cfde3e69c0fbbb8858e6915fd3718
- Sigstore transparency entry: 1002928354
- Sigstore integration time: Feb 27, 2026
Source repository:
- Permalink: OpenMOSS/Language-Model-SAEs@8bde367ff588fd7ddc60f35dc9daec6ffbf964e6
- Branch / Tag: refs/tags/v2.0.0b16
- Owner: https://github.com/OpenMOSS
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8bde367ff588fd7ddc60f35dc9daec6ffbf964e6
- Trigger Event: push
