
A library for the OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. Open source and continuously updated.


Language-Model-SAEs

Important: Some examples are currently outdated, and some parallelism strategies are not working due to our limited bandwidth. We are working on better organizing recent updates and will get everything working as soon as possible.

Language-Model-SAEs is a comprehensive, fully distributed framework for training, analyzing, and visualizing Sparse Autoencoders (SAEs), enabling scalable and systematic Mechanistic Interpretability research.

News

Features

  • Scalability: Our framework is fully distributed with arbitrary combinations of data, model, and head parallelism for both training and analysis. Enjoy training SAEs with millions of features!
  • Flexibility: We support a wide range of SAE variants, including vanilla SAEs, Lorsa (Low-rank Sparse Attention), CLT (Cross-layer Transcoder), MoLT (Mixture of Linear Transforms), CrossCoder, and more. Each variant can be combined with different activation functions (e.g., ReLU, JumpReLU, TopK, BatchTopK) and sparsity penalties (e.g., L1, Tanh).
  • Easy to Use: We provide high-level runner APIs to quickly launch experiments with simple configurations. Check our examples for verified hyperparameters.
  • Visualization: We provide a unified web interface to visualize learned SAE variants and their features.
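For readers new to these variants, the activation functions and sparsity penalties listed above can be illustrated with a minimal sketch on plain Python lists. These are the standard textbook definitions, not the lm_saes implementation: every function name here is hypothetical, the library operates on batched tensors, and practical SAEs often scale each feature's penalty by its decoder-column norm, which this sketch omits.

```python
import math
from typing import List

Vector = List[float]


def relu(pre: Vector) -> Vector:
    # Zero out negative pre-activations.
    return [max(0.0, x) for x in pre]


def jumprelu(pre: Vector, theta: Vector) -> Vector:
    # Keep x only where it exceeds its per-feature threshold theta_i, else 0.
    return [x if x > t else 0.0 for x, t in zip(pre, theta)]


def topk(pre: Vector, k: int) -> Vector:
    # Keep the k largest (ReLU'd) activations of a single sample; zero the rest.
    acts = relu(pre)
    keep = set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]


def batch_topk(batch: List[Vector], k: int) -> List[Vector]:
    # Keep the k * batch_size largest activations across the whole batch,
    # so an individual sample may use more or fewer than k features.
    acts = [relu(row) for row in batch]
    flat = [(a, r, c) for r, row in enumerate(acts) for c, a in enumerate(row)]
    keep = {(r, c) for _, r, c in sorted(flat, reverse=True)[: k * len(batch)]}
    return [
        [a if (r, c) in keep else 0.0 for c, a in enumerate(row)]
        for r, row in enumerate(acts)
    ]


def l1_penalty(acts: Vector) -> float:
    # Sum of absolute activations: grows linearly with activation magnitude.
    return sum(abs(a) for a in acts)


def tanh_penalty(acts: Vector, c: float = 1.0) -> float:
    # Saturating penalty: roughly c*|a| near zero, bounded by 1 per feature.
    return sum(math.tanh(c * abs(a)) for a in acts)
```

For example, `topk([0.5, -1.0, 2.0, 1.0], k=2)` keeps only the two largest activations and returns `[0.0, 0.0, 2.0, 1.0]`, whereas `batch_topk` spends a shared budget of `k * batch_size` nonzero activations across the whole batch.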

Installation

Use pip to install Language-Model-SAEs:

pip install lm-saes==2.0.0b7

We also highly recommend using uv to manage your own project dependencies. You can use

uv add lm-saes==2.0.0b7

to add Language-Model-SAEs as your project dependency.
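To confirm the installation, a quick standard-library check is sketched below. It assumes the distribution name lm-saes (from the pip command above) and the import name lm_saes (inferred from the src/lm_saes layout); both helper functions are hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec
from typing import Optional


def lm_saes_version(dist_name: str = "lm-saes") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def lm_saes_importable(module_name: str = "lm_saes") -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return find_spec(module_name) is not None


if __name__ == "__main__":
    print("lm-saes version:", lm_saes_version())
    print("importable:", lm_saes_importable())
```

After `pip install lm-saes==2.0.0b7`, `lm_saes_version()` should report that version string.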

Development

We use uv, an alternative to Poetry or PDM, to manage dependencies. To install the required packages, install uv and run the following command:

uv sync

This installs all required packages for the codebase into the .venv directory. For Ascend NPU support, run

uv sync --extra npu

A forked version of TransformerLens is also included in the dependencies to provide the necessary tools for analyzing features.

If you want to use the visualization tools, you also need to install the frontend dependencies, which are managed with bun. Follow the instructions on the bun website to install it, then run the following commands:

cd ui-ssr
bun install

Launch an Experiment

Explore the examples to see basic usage for training and analyzing SAEs in different configurations. Note that MongoDB is recommended for recording model, dataset, and SAE configurations, and is required for storing analyses. For more advanced usage, see the src/lm_saes/runners folder, which provides the interfaces for generating activations and for training and analyzing SAE variants; you can also write your own training or analysis scripts directly at the runner level.
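Because MongoDB is required for storing analyses, it can help to confirm a server is reachable before launching a long job. The sketch below only tests whether a TCP connection opens; it assumes MongoDB's default port 27017 on localhost, the helper name is hypothetical, and it does not read whatever connection settings lm_saes itself uses. A stronger check with pymongo would be `MongoClient(uri).admin.command("ping")`.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("MongoDB reachable:", is_port_open())
```

A False result usually means the server is not running or is listening on a different host or port.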

Visualizing the Learned Dictionary

Analysis results are stored in MongoDB, and you can use the provided visualization tools to browse the learned dictionary. First, start the FastAPI server by running the following command:

uvicorn server.app:app --port 24577 --env-file server/.env

Then copy ui/.env.example to ui/.env, set BACKEND_URL to match your server settings (by default, http://localhost:24577), and start the frontend by running the following commands:

cd ui
bun dev --port 24576
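The .env step above can also be scripted. The helper below is hypothetical and assumes the file uses plain KEY=VALUE lines: it copies .env.example, rewrites (or appends) the BACKEND_URL line, and leaves every other setting untouched. Pass whichever frontend directory your checkout uses.

```python
from pathlib import Path


def write_ui_env(ui_dir: Path, backend_url: str = "http://localhost:24577") -> Path:
    """Copy ui_dir/.env.example to ui_dir/.env, setting BACKEND_URL to backend_url."""
    example = ui_dir / ".env.example"
    lines = example.read_text().splitlines() if example.exists() else []
    out, replaced = [], False
    for line in lines:
        # Replace an existing BACKEND_URL=... assignment; keep all other lines verbatim.
        if line.strip().startswith("BACKEND_URL="):
            out.append(f"BACKEND_URL={backend_url}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"BACKEND_URL={backend_url}")
    env_path = ui_dir / ".env"
    env_path.write_text("\n".join(out) + "\n")
    return env_path
```

For example, `write_ui_env(Path("ui"))` writes ui/.env pointing the frontend at http://localhost:24577.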

That's it! You can now go to http://localhost:24576 to visualize the learned dictionary and its features.

Contributing

We warmly welcome contributions to this project. If you have any questions or suggestions, feel free to open an issue or a pull request. We look forward to hearing from you!

TODO: Add development guidelines

Acknowledgement

The design of the pipeline (including the configuration and some training details) is heavily inspired by the mats_sae_training project (now known as SAELens), and the library relies heavily on TransformerLens. We thank the authors for their great work.

Citation

Please cite this library as:

@misc{Ge2024OpenMossSAEs,
    title  = {OpenMoss Language Model Sparse Autoencoders},
    author = {Xuyang Ge and Wentao Shu and Junxuan Wang and Guancheng Zhou and Jiaxing Wu and Fukang Zhu and Lingjie Chen and Zhengfu He},
    url    = {https://github.com/OpenMOSS/Language-Model-SAEs},
    year   = {2024}
}



Download files


Source Distribution

lm_saes-2.0.0b7.tar.gz (200.4 kB)

Uploaded Source

Built Distribution


lm_saes-2.0.0b7-py3-none-any.whl (242.5 kB)

Uploaded Python 3

File details

Details for the file lm_saes-2.0.0b7.tar.gz.

File metadata

  • Download URL: lm_saes-2.0.0b7.tar.gz
  • Upload date:
  • Size: 200.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lm_saes-2.0.0b7.tar.gz
SHA256: e66499b7155b84048e58bc6375b4ff055982b868e9371a39b2925a2204bb245d
MD5: 8e8d9eb4b5d87d786c6b88295c85e0c5
BLAKE2b-256: e0f671de2d21902aed59d377f03b32830f1e9a3c8de968a3e77d670c37315c7d
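To check a downloaded file against the digests listed here, a streaming SHA256 check with Python's hashlib is sketched below; the function names are hypothetical. pip can also enforce hashes itself in hash-checking mode, using a requirements line such as `lm-saes==2.0.0b7 --hash=sha256:<digest>`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in fixed-size chunks and return its hex SHA256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.strip().lower()


# SHA256 published above for the lm_saes-2.0.0b7 source distribution.
SDIST_SHA256 = "e66499b7155b84048e58bc6375b4ff055982b868e9371a39b2925a2204bb245d"
```

For an untampered download, `verify_sha256(Path("lm_saes-2.0.0b7.tar.gz"), SDIST_SHA256)` should return True.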


Provenance

The following attestation bundles were made for lm_saes-2.0.0b7.tar.gz:

Publisher: publish.yml on OpenMOSS/Language-Model-SAEs


File details

Details for the file lm_saes-2.0.0b7-py3-none-any.whl.

File metadata

  • Download URL: lm_saes-2.0.0b7-py3-none-any.whl
  • Upload date:
  • Size: 242.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lm_saes-2.0.0b7-py3-none-any.whl
SHA256: ac2a62d58965efae902d44130c6c8623deb2692443e9ae3c3327de0d9f893332
MD5: 36c1ae6857de6556198fb8e78c85ab8a
BLAKE2b-256: 6ae67ebcc9685e0d99b11dd0b94288d1e09d552d9d55cc52eaed87255d55656d


Provenance

The following attestation bundles were made for lm_saes-2.0.0b7-py3-none-any.whl:

Publisher: publish.yml on OpenMOSS/Language-Model-SAEs

