Sparse probing benchmark for Sparse Autoencoders derived from the paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"

Maintainers

chanind

SAE Probes Benchmark

This repository contains the code for the paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing", repackaged as a Python package that works with any SAE that can be loaded in SAELens. This makes it easy to use the sparse probing tasks from the paper as a standalone SAE benchmark.

Installation

pip install sae-probes

Running evaluations

You can run benchmarks directly; any missing model activations are generated on demand. If you don't pass a model_cache_path, a temporary directory is used and cleaned up when the function completes. To persist activations across runs (recommended for repeated experiments), provide a model_cache_path.
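The caching behaviour described above can be pictured with a small standard-library sketch. This is not the package's implementation: activation_cache_dir is a hypothetical helper that only mirrors the described semantics, namely a temporary directory that is removed on exit when no path is given, and a persistent directory otherwise.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional


@contextmanager
def activation_cache_dir(model_cache_path: Optional[str] = None) -> Iterator[Path]:
    """Yield a directory for activations (hypothetical helper, not part of sae_probes)."""
    if model_cache_path is None:
        # No path given: use a temp dir that is deleted when the block exits.
        with tempfile.TemporaryDirectory() as tmp:
            yield Path(tmp)
    else:
        # Path given: create it if needed and leave its contents in place.
        path = Path(model_cache_path)
        path.mkdir(parents=True, exist_ok=True)
        yield path
```

Anything written inside the with block survives only in the second case, which is why a model_cache_path is recommended for repeated experiments.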

Training Probes

Probes can be trained directly on the model activations (baselines) or on SAE activations. In both cases, the following test data-balance settings are available: "normal", "scarcity", and "imbalance". For more details about these settings, see the original paper. For the most standard sparse-probing benchmark, use the normal setting.

SAE Probes

The most standard use of this library is as a sparse probing benchmark for SAEs using the normal setting. This is demonstrated below:

from sae_probes import run_sae_evals
from sae_lens import SAE

# run the benchmark on a Gemma Scope SAE
release = "gemma-scope-2b-pt-res-canonical"
sae_id = "layer_12/width_16k/canonical"
sae = SAE.from_pretrained(release, sae_id)

run_sae_evals(
  sae=sae,
  model_name="gemma-2-2b",
  hook_name="blocks.12.hook_resid_post",
  reg_type="l1",
  setting="normal",
  results_path="/results/output/path",
  # model_cache_path is optional; if omitted, a temp dir is used and cleared after
  model_cache_path="/path/to/saved/activations",
  ks=[1, 16],
)

The sparse probing results are saved to results_path, with one JSON file per dataset.

Baseline Probes

You can now run baseline probes using a unified API that matches the SAE evaluation interface:

from sae_probes import run_baseline_evals

# Run baseline probes with consistent API
run_baseline_evals(
  model_name="gemma-2-2b",
  hook_name="blocks.12.hook_resid_post",
  setting="normal",  # or "scarcity", "imbalance"
  results_path="/results/output/path",
  # model_cache_path is optional; if omitted, a temp dir is used and cleared after
  model_cache_path="/path/to/saved/activations",
)
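To compare the three data-balance settings, the same call can be repeated once per setting. The sketch below assumes nothing about the library beyond the setting keyword shown above; run_for_all_settings is a hypothetical helper, and the evaluation function is passed in as an argument so the loop itself has no dependency on the package.

```python
from typing import Any, Callable, List

# Setting names as listed in this README.
SETTINGS = ("normal", "scarcity", "imbalance")


def run_for_all_settings(run_fn: Callable[..., Any], **kwargs: Any) -> List[str]:
    """Call run_fn once per setting (hypothetical helper, not part of sae_probes)."""
    completed = []
    for setting in SETTINGS:
        # Every other keyword argument is forwarded unchanged.
        run_fn(setting=setting, **kwargs)
        completed.append(setting)
    return completed
```

With the package installed, run_for_all_settings(run_baseline_evals, model_name="gemma-2-2b", hook_name="blocks.12.hook_resid_post", results_path="/results/output/path") would issue the call shown above three times, once per setting.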

Output Format

Both SAE and baseline probes save results as JSON files with a consistent structure:

SAE results: sae_probes_{model_name}/{setting}_setting/{dataset}_{hook_name}_{reg_type}.json
Baseline results: baseline_results_{model_name}/{setting}_setting/{dataset}_{hook_name}_{method}.json

Each JSON file contains a list with metrics and metadata for easy comparison between SAE and baseline approaches.
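A minimal sketch of collecting these files for comparison, using only the standard library. It assumes the path templates above are relative to results_path and that each file parses to a JSON list; load_results is a hypothetical helper, and the keys inside each list entry are not assumed here.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_results(
    results_path: str, prefix: str, model_name: str, setting: str, hook_name: str
) -> Dict[Tuple[str, str], List]:
    """Map (dataset, suffix) -> parsed JSON for one model, setting and hook.

    prefix is "sae_probes" or "baseline_results", following the templates above;
    suffix is the trailing reg_type (SAE) or method (baseline) part of the file name.
    Hypothetical helper, not part of sae_probes.
    """
    folder = Path(results_path) / f"{prefix}_{model_name}" / f"{setting}_setting"
    marker = f"_{hook_name}_"
    results = {}
    for file in sorted(folder.glob(f"*{marker}*.json")):
        # File stem looks like "{dataset}_{hook_name}_{suffix}".
        dataset, suffix = file.stem.split(marker, 1)
        with file.open() as f:
            results[(dataset, suffix)] = json.load(f)
    return results
```

Calling it once with prefix="sae_probes" and once with prefix="baseline_results" gives two dictionaries whose dataset names can be matched up for a side-by-side comparison.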

Optional: Pre-generating model activations

Pre-generating activations can speed up repeated runs and lets you inspect the saved tensors. It is optional because benchmarks generate any missing activations automatically on their first run.

from sae_probes import generate_dataset_activations

generate_dataset_activations(
  model_name="gemma-2-2b", # the TransformerLens name of the model
  hook_names=["blocks.12.hook_resid_post"], # Any TLens hook names
  batch_size=64,
  device="cuda",
  model_cache_path="/path/to/save/activations",
)

If you skip pre-generation, the benchmarks will create any missing activations automatically. Passing a model_cache_path persists them; if omitted, activations will be written to a temporary directory that is deleted after the run.

Citation

If you use this code in your research, please cite:

@inproceedings{kantamnenisparse,
  title={Are Sparse Autoencoders Useful? A Case Study in Sparse Probing},
  author={Kantamneni, Subhash and Engels, Joshua and Rajamanoharan, Senthooran and Tegmark, Max and Nanda, Neel},
  booktitle={Forty-second International Conference on Machine Learning}
}

Release history

0.4.0: Apr 29, 2026
0.3.0: Oct 31, 2025
0.2.2: Oct 30, 2025
0.2.1: Oct 4, 2025
0.2.0: Aug 27, 2025 (this version)
0.1.5: Aug 24, 2025
0.1.4: Aug 21, 2025
0.1.3: Aug 16, 2025
0.1.1: Aug 16, 2025

Download files

Source Distribution

sae_probes-0.2.0.tar.gz (45.0 MB)

Uploaded Aug 27, 2025 Source

Built Distribution

sae_probes-0.2.0-py3-none-any.whl (45.1 MB)

Uploaded Aug 27, 2025 Python 3

File details

Details for the file sae_probes-0.2.0.tar.gz.

File metadata

Download URL: sae_probes-0.2.0.tar.gz
Upload date: Aug 27, 2025
Size: 45.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sae_probes-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c066ddda3b9d11b4e5b2334c1827075506b16a1c1f60aef8e4e673ff861fc665`
MD5	`91842fe3f7e724c3de8b20a77fb8fa2d`
BLAKE2b-256	`5214c01fa3dde945c633dafeabf548373e4625810c39fa5e21d05b83cb191317`
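A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: sha256_of and verify are hypothetical helper names, and the file is assumed to have been saved locally under its published name.

```python
import hashlib

# SHA256 digest listed above for sae_probes-0.2.0.tar.gz.
EXPECTED_SHA256 = "c066ddda3b9d11b4e5b2334c1827075506b16a1c1f60aef8e4e673ff861fc665"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, verify("sae_probes-0.2.0.tar.gz") should return True for an intact download of the source distribution; the wheel can be checked the same way by passing its own digest as expected.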

Provenance

The following attestation bundles were made for sae_probes-0.2.0.tar.gz:

Publisher: ci.yaml on sae-probes/sae-probes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sae_probes-0.2.0.tar.gz
- Subject digest: c066ddda3b9d11b4e5b2334c1827075506b16a1c1f60aef8e4e673ff861fc665
- Sigstore transparency entry: 441443853
- Sigstore integration time: Aug 27, 2025
Source repository:
- Permalink: sae-probes/sae-probes@1c8ba69c52a1ee190d25207180350b770d044312
- Branch / Tag: refs/heads/main
- Owner: https://github.com/sae-probes
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yaml@1c8ba69c52a1ee190d25207180350b770d044312
- Trigger Event: push

File details

Details for the file sae_probes-0.2.0-py3-none-any.whl.

File metadata

Download URL: sae_probes-0.2.0-py3-none-any.whl
Upload date: Aug 27, 2025
Size: 45.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sae_probes-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9585938322eed9a2e2526a5e7d5c8efae3cf08bdfcb63dc39fb68551a6390091`
MD5	`d00db997661adbedef408eb428e00c6c`
BLAKE2b-256	`41b491d8eb63641e620a63d09678f3ba331d3681c5d7adb3cd736e8a8337b862`

Provenance

The following attestation bundles were made for sae_probes-0.2.0-py3-none-any.whl:

Publisher: ci.yaml on sae-probes/sae-probes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sae_probes-0.2.0-py3-none-any.whl
- Subject digest: 9585938322eed9a2e2526a5e7d5c8efae3cf08bdfcb63dc39fb68551a6390091
- Sigstore transparency entry: 441443896
- Sigstore integration time: Aug 27, 2025
Source repository:
- Permalink: sae-probes/sae-probes@1c8ba69c52a1ee190d25207180350b770d044312
- Branch / Tag: refs/heads/main
- Owner: https://github.com/sae-probes
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yaml@1c8ba69c52a1ee190d25207180350b770d044312
- Trigger Event: push
