A library for doing research on developmental interpretability
DevInterp
A Python Library for Developmental Interpretability Research
DevInterp is a Python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp proposes tools for detecting, locating, and ultimately controlling the development of structure over training.
Read more about developmental interpretability.
Features
- SGLD Sampling with per-token loss storage to xarray/Zarr
- Local Learning Coefficient (LLC) estimation from sampling results
- Susceptibilities measuring first-order posterior response to data perturbations, localized on model components
- Bayesian Influence Functions (BIF) as posterior correlations (or covariances) between per-sample losses
- Weight restrictions for sampling over parameter subsets (e.g., individual attention heads)
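To give a feel for what SGLD sampling does, here is a one-dimensional toy chain in plain Python. It uses the localized SGLD update common in LLC work, w ← w − (ε/2)(nβ∇L(w) + γ(w − w*)) + N(0, ε), which targets a tempered posterior localized around the trained parameter w*. The function name and the parameters `lr`, `n_beta`, and `localization` are our own labels for ε, nβ, and γ. This is a sketch for intuition, not the library's sampler.

```python
import math
import random


def sgld_chain(grad, w_init, lr, n_beta, localization, num_steps, seed=0):
    """Run a toy 1-D localized SGLD chain and return the visited parameters.

    Hypothetical helper, not part of devinterp. `grad(w)` is the gradient of
    the average loss L at w. Each step applies
        w <- w - (lr / 2) * (n_beta * grad(w) + localization * (w - w_init))
               + Normal(0, std=sqrt(lr)),
    which (for small lr) samples from
        exp(-n_beta * L(w) - localization / 2 * (w - w_init) ** 2).
    """
    rng = random.Random(seed)
    w = w_init
    samples = []
    for _ in range(num_steps):
        # Gradient of the tempered loss plus the pull back toward w_init
        drift = n_beta * grad(w) + localization * (w - w_init)
        # Half-step drift plus Gaussian noise with variance lr
        w = w - 0.5 * lr * drift + rng.gauss(0.0, math.sqrt(lr))
        samples.append(w)
    return samples
```

For the quadratic loss L(w) = w²/2 (so `grad = lambda w: w`), the chain's variance should settle near 1/(n_beta + localization), up to a small step-size bias.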
Installation
devinterp is distributed through PyPI. Install it with uv:
```shell
uv add devinterp
```
Example
See examples/quickstart.py for a runnable script that computes LLC and susceptibilities on Qwen2.5-0.5B.
Quick Start
Compute the Local Learning Coefficient
```python
from devinterp.slt.llc import llc

# model: an autoregressive LM that maps input_ids to logits
# dataset: a HuggingFace Dataset with a uniform-length "input_ids" column
result = llc(
    model=model,
    dataset=dataset,
    observables={"train": dataset},
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
)

print(result["llc_mean"])       # scalar LLC
print(result["llc_per_chain"])  # (num_chains,) per-chain LLC
# (num_chains, num_steps) per-step loss, where
# num_steps = num_draws * num_steps_bw_draws + num_burnin_steps
print(result["loss_trace"])
```
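The LLC estimate is based on the SLT estimator from the LLC paper listed under Further Reading: λ̂(w*) = nβ · (E_β[L_n(w)] − L_n(w*)), the gap between the average sampled loss and the loss at the trained parameters, scaled by nβ. The helper below is a hypothetical plain-Python sketch of that formula applied to per-chain draw losses. We assume, without having checked the source, that `llc()` computes `llc_per_chain` and `llc_mean` in essentially this way; details such as which loss serves as L_n(w*) and how burn-in is dropped may differ.

```python
def llc_from_loss_trace(draw_losses, init_loss, n_beta):
    """Estimate the LLC per chain and overall from sampled losses.

    Hypothetical helper, not part of devinterp.
    draw_losses: list of chains, each a list of average losses L_n(w)
                 recorded at the draws (burn-in already discarded).
    init_loss:   L_n(w*), the loss at the trained parameters.
    n_beta:      the n * beta factor used when sampling.
    """
    # One estimate per chain: n_beta * (mean sampled loss - initial loss)
    per_chain = [
        n_beta * (sum(chain) / len(chain) - init_loss) for chain in draw_losses
    ]
    return {
        "llc_per_chain": per_chain,
        "llc_mean": sum(per_chain) / len(per_chain),
    }
```

A chain whose sampled losses barely rise above L_n(w*) yields a small LLC, which is why the estimate is read as a measure of how "flat" the loss landscape is around w*.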
Sample with Observables
```python
from devinterp.slt.sampling import sample

# train_data, code_data: HuggingFace Datasets with an "input_ids" column
tree = sample(
    model=model,
    dataset=train_data,
    observables={
        "train": train_data,
        "code": (code_data, 5),  # (dataset, batches_per_draw)
    },
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
)
# tree is an xr.DataTree backed by Zarr with full per-token loss traces
```
Compute Susceptibilities
```python
from devinterp.slt.susceptibilities import susceptibilities
from devinterp.slt.weight_restrictions import create_param_masks

result = susceptibilities(
    model=model,
    dataset=train_data,
    observables={"train": train_data, "code": code_data},
    weight_restrictions={
        "full": None,
        "l0h0": create_param_masks(model, "l0h0"),
        "l0h1": create_param_masks(model, "l0h1"),
    },
    sampling_task="train",
    lr=0.001,
    n_beta=30,
)
# result is a DataTree with /susceptibilities and /context subtrees
```
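For intuition about "first-order posterior response to data perturbations", consider a tempered posterior whose loss L_n is perturbed by h·ΔL, where ΔL is the change in loss when the data distribution is shifted (for example, toward the code observable). Differentiating the posterior expectation of a component-localized quantity O at h = 0 gives a covariance, the generic linear-response identity below (π is the prior, which may include the localization term). This is a derivation sketch only; we have not checked the exact observable, sign convention, or normalization that susceptibilities() uses.

$$
p_h(w) = \frac{\pi(w)\,\exp\!\big(-n\beta\,[L_n(w) + h\,\Delta L(w)]\big)}{Z(h)},
\qquad
\frac{d}{dh}\,\mathbb{E}_{p_h}[O]\Big|_{h=0}
= -\,n\beta\,\big(\mathbb{E}_{p_0}[O\,\Delta L] - \mathbb{E}_{p_0}[O]\,\mathbb{E}_{p_0}[\Delta L]\big)
= -\,n\beta\,\mathrm{Cov}_{p_0}(O,\ \Delta L).
$$

The first term comes from differentiating the unnormalized density and the second from differentiating Z(h), so the response can be estimated from posterior samples alone, without running a separate chain at each h.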
create_param_masks supports 85+ HuggingFace model types and TransformerLens.
Restriction patterns: "full", "l0", "l0h1", "l0g0" (GQA group), "l0 attn", "l0 mlp", "embed", "unembed".
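The restriction strings follow a small grammar: a layer index, optionally followed by a head (h), a GQA group (g), or an attn/mlp block, plus the special names "full", "embed", and "unembed". The hypothetical parser below only illustrates how we read that grammar (for instance, we assume a bare "l0" means all of layer 0); it is not how create_param_masks is implemented.

```python
import re

# Hypothetical parser for the restriction pattern strings; not part of devinterp.
_LAYER_PATTERN = re.compile(r"^l(\d+)(?:h(\d+)|g(\d+)| (attn|mlp))?$")


def describe_restriction(pattern):
    """Return a dict describing which component a pattern string selects."""
    if pattern in ("full", "embed", "unembed"):
        return {"scope": pattern}
    m = _LAYER_PATTERN.match(pattern)
    if m is None:
        raise ValueError(f"unrecognized restriction pattern: {pattern!r}")
    layer, head, group, block = m.groups()
    out = {"scope": "layer", "layer": int(layer)}
    if head is not None:  # e.g. "l0h1": head 1 of layer 0
        out.update(scope="head", head=int(head))
    elif group is not None:  # e.g. "l0g0": GQA group 0 of layer 0
        out.update(scope="gqa_group", group=int(group))
    elif block is not None:  # e.g. "l0 attn" or "l0 mlp"
        out.update(scope=block)
    return out
```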
Compute BIF
```python
from devinterp.slt.bif import bif

result = bif(
    model=model,
    dataset=train_data,
    observables={"train": train_data, "code": code_data},
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
    correlation_method="token",  # or "sequence"
)
# result["influences"] contains pairwise correlation matrix
```
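Conceptually, the BIF between two samples is the correlation, across posterior draws, of their losses: samples whose losses rise and fall together under the posterior influence each other. The plain-Python sketch below computes that Pearson correlation matrix from a list of draws. It is our own illustration, not the library's code; we assume "token" correlates individual per-token losses while "sequence" first aggregates losses per sequence, but we have not confirmed this. Dividing each entry by (n_draws − 1) instead of the product of norms would give covariances.

```python
import math


def pairwise_loss_correlation(draw_losses):
    """Pearson correlation between per-sample losses across posterior draws.

    Hypothetical helper, not part of devinterp.
    draw_losses: list of draws, each a list of per-sample losses in the same
                 order. Assumes every sample's loss varies across draws.
    Returns an n_samples x n_samples nested list.
    """
    n_draws = len(draw_losses)
    n = len(draw_losses[0])
    # One column per sample: its loss at each draw
    cols = [[draw[i] for draw in draw_losses] for i in range(n)]
    means = [sum(c) / n_draws for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            corr[i][j] = dot / (norms[i] * norms[j])
    return corr
```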
Architecture
Each analysis has two entry points:
- High-level (llc(), bif(), susceptibilities()): runs sampling and post-processing in one call.
- Low-level (compute_llc(), compute_bif()): takes a pre-computed xr.DataTree from sample(); useful when you want to run sampling once and compute multiple analyses. compute_susceptibilities() takes a dict[str, xr.DataTree] (one tree per weight restriction), since susceptibilities require a separate sampling run for each restriction.
The sampling pipeline stores full per-token losses to Zarr via sample(), and post-processing functions operate on the resulting xr.DataTree.
Model Requirements
The current API assumes autoregressive language models with fixed-length tokenized sequences:
- Model must accept input_ids and return logits (HuggingFace models, TransformerLens HookedTransformer, or any model returning a tensor or an object with .logits)
- Dataset must be a HuggingFace Dataset with an "input_ids" column of uniform-length sequences
- Loss defaults to next-token cross-entropy
For non-standard losses, pass loss_fn=... to sample(), bif(), llc(), or susceptibilities(). The function takes (model, input_ids) and must return per-token loss of shape (batch, seq_len-1). For more exotic control, sample_single_chain() in devinterp.slt.sampler accepts a custom evaluate callable.
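As an example of the loss_fn contract, the sketch below computes per-token next-token cross-entropy with label smoothing. The choice of label smoothing (and the 0.1 value) is purely illustrative, and the function name is hypothetical; what matters is the shape contract stated above: take (model, input_ids) and return a (batch, seq_len-1) tensor.

```python
import torch
import torch.nn.functional as F


def smoothed_next_token_loss(model, input_ids):
    """Per-token next-token cross-entropy with label smoothing (illustrative).

    Takes (model, input_ids) and returns a (batch, seq_len - 1) tensor.
    """
    out = model(input_ids)
    # HuggingFace models return an object with .logits; others may return a tensor
    logits = out.logits if hasattr(out, "logits") else out
    # Position t predicts token t + 1: drop the last prediction and the first target
    pred = logits[:, :-1, :]
    target = input_ids[:, 1:]
    loss = F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),
        target.reshape(-1),
        reduction="none",  # keep one loss per token
        label_smoothing=0.1,
    )
    return loss.view(target.shape)
```

It would then be passed as loss_fn=smoothed_next_token_loss to sample(), bif(), llc(), or susceptibilities().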
Migrating from v1
The v2 API replaces the callback-based sampling with a data-centric pipeline. Key changes:
```python
# v1 (old)
from devinterp.slt.sampler import estimate_learning_coeff_with_summary
from devinterp.optim import SGLD

result = estimate_learning_coeff_with_summary(
    model, loader,
    sampling_method=SGLD,
    sampling_method_kwargs={"lr": 0.001, "nbeta": 30},
    num_chains=4, num_draws=200,
)
llc = result["llc/mean"]
```
```python
# v2 (new)
from devinterp.slt.llc import llc

result = llc(
    model=model,
    dataset=dataset,  # HF Dataset, not DataLoader
    observables={"train": dataset},
    lr=0.001, n_beta=30,
    num_chains=4, num_draws=200,
)
llc_value = float(result["llc_mean"])
```
What changed:
- estimate_learning_coeff / LLCEstimator / SamplerCallback → llc() and compute_llc()
- DataLoader → HuggingFace Dataset with an "input_ids" column
- sampling_method_kwargs={"nbeta": ...} → n_beta=... as a direct parameter
- Results are xr.Dataset / xr.DataTree, not dicts with string keys
- New capabilities: susceptibilities(), bif(), observables, weight restrictions, per-token loss storage
Hyperparameter selection
All sampling is sensitive to hyperparameters. See our Sampling Hyperparameter Guide.
Further Reading
- You're Measuring Model Complexity Wrong - Introduction to LLC and phase transitions (2024)
- Structural Inference with Susceptibilities (2025)
- Towards Spectroscopy: Susceptibility Clusters in Language Models (2026)
- The Local Learning Coefficient: A Singularity-Aware Complexity Measure (2023)
- Algebraic Geometry and Statistical Learning Theory - Watanabe (2009)
Credits & Citations
This package was created by Timaeus. Most of the sampling, LLC, susceptibility, and BIF implementations were developed internally; this package is a port of that joint work.
If this package was useful in your work, please cite it as:
```bibtex
@misc{devinterp2026,
  title = {DevInterp},
  author = {Snell, William and Wind, Johan Sokrates and Snikkers, Billy
            and Fraser, Sandy and Newgas, Adam and Hoogland, Jesse
            and Wang, George and Gordon, Andrew and Zhou, William
            and van Wingerden, Stan},
  year = {2026},
  version = {2.0},
  howpublished = {\url{https://github.com/timaeus-research/devinterp}},
}
```