Sparse Autoencoder for Mechanistic Interpretability

Sparse Autoencoder

A sparse autoencoder for mechanistic interpretability research.

Train a sparse autoencoder in Colab, or install it for your project:

pip install sparse_autoencoder

Features

This library contains:

- A sparse autoencoder model, along with all the underlying PyTorch components you need to customise and/or build your own:
  - Encoder, constrained unit-norm decoder and tied bias PyTorch modules in autoencoder.
  - L1 and L2 loss modules in loss.
  - An Adam module with a helper method to reset its state, in optimizer.
- An activations data generator using TransformerLens, with the underlying steps exposed in case you want to customise the approach:
  - Activation store options (in-memory or on disk) in activation_store.
  - A hook to get the activations from TransformerLens efficiently, in source_model.
  - Source dataset utils (i.e. the prompts used to generate these activations) in source_data, which stream data from HuggingFace and pre-process it (tokenize and shuffle).
- An activation resampler to help reduce the number of dead neurons.
- Metrics that log at various stages of training (e.g. during training, resampling and validation) and integrate with wandb.
- A training pipeline that combines everything together, allowing you to run hyperparameter sweeps and view progress on wandb.
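To illustrate how the encoder, tied bias, unit-norm decoder and L1/L2 losses fit together, here is a minimal NumPy sketch of the forward pass and loss in the style of Towards Monosemanticity. It is not this library's API: TiedBiasSAE, sae_loss, normalize_columns and dead_feature_mask are hypothetical names chosen for illustration, and the weight initialisation and the l1_coefficient default are assumptions.

```python
import numpy as np


def normalize_columns(w: np.ndarray) -> np.ndarray:
    """Rescale each column (one per learned feature) to unit L2 norm."""
    return w / np.linalg.norm(w, axis=0, keepdims=True)


class TiedBiasSAE:
    """Hypothetical minimal sparse autoencoder (illustration only, not this library's API)."""

    def __init__(self, n_inputs: int, n_features: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # Assumed initialisation: scaled Gaussian encoder, zero biases.
        self.w_enc = rng.normal(size=(n_features, n_inputs)) / np.sqrt(n_inputs)
        self.b_enc = np.zeros(n_features)
        # Unit-norm decoder columns stop the L1 penalty being sidestepped by
        # shrinking feature activations while growing decoder weights. During
        # training, one would re-apply normalize_columns after each update.
        self.w_dec = normalize_columns(rng.normal(size=(n_inputs, n_features)))
        # Tied bias: subtracted before encoding and added back after decoding.
        self.b_dec = np.zeros(n_inputs)

    def encode(self, x: np.ndarray) -> np.ndarray:
        # ReLU(W_enc (x - b_dec) + b_enc), applied row-wise to a batch.
        return np.maximum(0.0, (x - self.b_dec) @ self.w_enc.T + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        return f @ self.w_dec.T + self.b_dec

    def forward(self, x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        f = self.encode(x)
        return f, self.decode(f)


def sae_loss(x: np.ndarray, x_hat: np.ndarray, f: np.ndarray, l1_coefficient: float = 1e-3) -> float:
    """Squared L2 reconstruction error plus an L1 sparsity penalty, averaged over the batch."""
    l2 = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(f), axis=-1))
    return float(l2 + l1_coefficient * l1)


def dead_feature_mask(f: np.ndarray) -> np.ndarray:
    """True for each feature that never activated on this batch."""
    return ~(f > 0).any(axis=0)


x = np.random.default_rng(1).normal(size=(32, 16))  # 32 activation vectors of width 16
sae = TiedBiasSAE(n_inputs=16, n_features=64)
f, x_hat = sae.forward(x)
print(f.shape, x_hat.shape)  # (32, 64) (32, 16)
print(sae_loss(x, x_hat, f), int(dead_feature_mask(f).sum()))
```

Gradient updates, Adam state resets and resampling are omitted here; the list above names the library components responsible for those steps.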

Designed for Research

The library is designed to be modular. By default it takes the approach from Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, so you can pip install the library and get started quickly. When you need to customise something, you can extend the class for that component (e.g. extend SparseAutoencoder to customise the model) and then drop it back into the training pipeline. Every component is fully documented, which makes this straightforward.
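The extend-and-drop-in pattern described above can be sketched as follows. BaseSAE, FiringCountSAE and run_pipeline are hypothetical stand-ins written in NumPy, not the library's SparseAutoencoder or training pipeline; the point is only that a subclass which keeps its parent's interface can be passed anywhere the parent is expected.

```python
import numpy as np


class BaseSAE:
    """Hypothetical stand-in for a library model class (not the real SparseAutoencoder)."""

    def __init__(self, n_inputs: int, n_features: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(size=(n_features, n_inputs)) / np.sqrt(n_inputs)
        # Decoder initialised as the encoder transpose (an assumption for brevity).
        self.w_dec = self.w_enc.T.copy()
        self.b_dec = np.zeros(n_inputs)

    def encode(self, x: np.ndarray) -> np.ndarray:
        return np.maximum(0.0, (x - self.b_dec) @ self.w_enc.T)

    def forward(self, x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        f = self.encode(x)
        return f, f @ self.w_dec.T + self.b_dec


class FiringCountSAE(BaseSAE):
    """Customised model: same maths, but also counts how often each feature fires."""

    def __init__(self, n_inputs: int, n_features: int, seed: int = 0) -> None:
        super().__init__(n_inputs, n_features, seed)
        self.fire_counts = np.zeros(n_features, dtype=int)

    def encode(self, x: np.ndarray) -> np.ndarray:
        f = super().encode(x)
        self.fire_counts += (f > 0).sum(axis=0)
        return f


def run_pipeline(model: BaseSAE, batches: list[np.ndarray]) -> float:
    """Hypothetical pipeline: relies only on the BaseSAE interface, so subclasses drop in."""
    total = 0.0
    for x in batches:
        _, x_hat = model.forward(x)
        total += float(np.mean((x - x_hat) ** 2))
    return total / len(batches)


batches = [np.random.default_rng(i).normal(size=(8, 4)) for i in range(3)]
model = FiringCountSAE(n_inputs=4, n_features=6)
print(run_pipeline(model, batches), model.fire_counts)
```

Because FiringCountSAE calls super().encode, it produces the same outputs as the base class while adding its own bookkeeping, so run_pipeline needs no changes to accept it.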

Demo

Check out the demo notebook docs/content/demo.ipynb for a guide to using this library.

Contributing

This project uses Poetry for dependency management and PoeThePoet for scripts. After checking out the repo, we recommend configuring Poetry to create the .venv in the project root directory (note that this is a global setting), then installing with the dev and demos dependency groups:

poetry config virtualenvs.in-project true
poetry install --with dev,demos

Checks

For a full list of available commands (e.g. test or typecheck), run the following in your terminal (this assumes the venv is already active):

poe

Release history

- 1.10.0 (this version): Jan 26, 2024
- 1.9.1: Jan 17, 2024
- 1.9.0: Jan 16, 2024
- 1.8.0: Jan 9, 2024
- 1.7.0: Jan 7, 2024
- 1.6.0: Jan 6, 2024
- 1.5.0: Jan 5, 2024
- 1.4.0: Jan 4, 2024
- 1.3.0: Jan 3, 2024
- 1.2.1: Dec 11, 2023
- 1.2.0: Dec 11, 2023
- 1.1.0: Dec 11, 2023
- 1.0.1: Dec 10, 2023
- 1.0.0: Dec 9, 2023
- 0.19.0: Dec 9, 2023
- 0.18.0: Dec 9, 2023
- 0.17.0: Dec 3, 2023
- 0.16.0: Dec 3, 2023
- 0.15.0: Nov 29, 2023
- 0.14.0: Nov 28, 2023
- 0.13.0: Nov 26, 2023
- 0.12.0: Nov 26, 2023
- 0.11.0: Nov 23, 2023
- 0.10.0: Nov 22, 2023
- 0.9.0: Nov 21, 2023
- 0.8.0: Nov 13, 2023
- 0.7.0: Nov 8, 2023
- 0.6.0: Nov 8, 2023
- 0.5.0: Nov 8, 2023
- 0.4.0: Nov 7, 2023
- 0.3.0: Nov 6, 2023
- 0.2.0: Nov 6, 2023
- 0.1.0: Nov 6, 2023
- 0.0.0: Nov 2, 2023

Download files

Source distribution

sparse_autoencoder-1.10.0.tar.gz (92.6 kB), uploaded Jan 26, 2024

Built distribution

sparse_autoencoder-1.10.0-py3-none-any.whl (137.3 kB, Python 3), uploaded Jan 26, 2024

Hashes for sparse_autoencoder-1.10.0.tar.gz
Algorithm	Hash digest
SHA256	`1204deea3c3f0cf03174d27d9c926c9d7e8f72a681506f9ed984097d9a4ac151`
MD5	`4a1fd1be8787091772edfaa93e20148c`
BLAKE2b-256	`65ad3eed1de8d60804e455165b270abf9fb11017b9fdeca57f0ac93cfda426c2`

Hashes for sparse_autoencoder-1.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2013aebe12dcc94a101f9e2ab73789646c8c21870780746ba68e68ac6bf3f8f7`
MD5	`1d0f0d98c4a72f9d783823099d025fa0`
BLAKE2b-256	`a7d481cc2465cc663bea3a5006bcd342908a423fffb3f14a2314a177a875310c`