A library for doing research on developmental interpretability

Maintainers

svwingerden

Project description

DevInterp

A Python Library for Developmental Interpretability Research

DevInterp is a Python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp proposes tools for detecting, locating, and ultimately controlling the development of structure over training.

Features

SGLD Sampling with per-token loss storage to xarray/Zarr
Local Learning Coefficient (LLC) estimation from sampling results
Susceptibilities measuring first-order posterior response to data perturbations, localized on model components
Bayesian Influence Functions (BIF) as posterior correlations (or covariances) between per-sample losses
Weight restrictions for sampling over parameter subsets (e.g., individual attention heads)

Installation

devinterp is distributed through PyPI. Install with uv:

uv add devinterp

Example

See examples/quickstart.py for a runnable script that computes LLC and susceptibilities on Qwen2.5-0.5B.

Quick Start

Compute the Local Learning Coefficient

from devinterp.slt.llc import llc

result = llc(
    model=model,
    dataset=dataset,              # HuggingFace Dataset with "input_ids"
    observables={"train": dataset},
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
)

print(result["llc_mean"])         # scalar LLC
print(result["llc_per_chain"])    # (num_chains,) per-chain LLC
print(result["loss_trace"])       # (num_chains, num_steps) per-step loss
# num_steps = num_draws * num_steps_bw_draws + num_burnin_steps
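How a loss trace turns into an LLC value can be sketched in plain Python. This illustrates the standard estimator, LLC ≈ n_beta * (mean sampled loss - loss at the initial parameters), and is not the library's implementation. The function name `llc_from_loss_trace`, the `init_loss` argument, and the rule for which steps count as draws (burn-in first, then every `num_steps_bw_draws`-th step) are assumptions made for the example.

```python
from statistics import fmean


def llc_from_loss_trace(loss_trace, init_loss, n_beta,
                        num_burnin_steps=0, num_steps_bw_draws=1):
    """Estimate the LLC per chain and averaged over chains.

    loss_trace: list of chains, each a list of per-step losses.
    init_loss:  loss at the initial parameters (assumed to be supplied).
    """
    per_chain = []
    for chain in loss_trace:
        # Assumption: burn-in steps come first and are discarded, then
        # every num_steps_bw_draws-th remaining step counts as a draw.
        kept = chain[num_burnin_steps:]
        draws = kept[num_steps_bw_draws - 1::num_steps_bw_draws]
        per_chain.append(n_beta * (fmean(draws) - init_loss))
    return {"llc_mean": fmean(per_chain), "llc_per_chain": per_chain}


# Two chains, 2 burn-in steps, then 3 draws with 2 steps between draws,
# so num_steps = 3 * 2 + 2 = 8.
trace = [
    [1.0, 1.0, 1.1, 1.2, 1.1, 1.2, 1.1, 1.2],
    [1.0, 1.0, 1.2, 1.3, 1.2, 1.3, 1.2, 1.3],
]
result = llc_from_loss_trace(trace, init_loss=1.0, n_beta=30,
                             num_burnin_steps=2, num_steps_bw_draws=2)
```

In this toy trace the kept draws sit 0.2 and 0.3 above the initial loss, so the per-chain values come out near 6 and 9 and the mean near 7.5.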

Sample with Observables

from devinterp.slt.sampling import sample

tree = sample(
    model=model,
    dataset=train_data,
    observables={
        "train": train_data,
        "code": (code_data, 5),   # (dataset, batches_per_draw)
    },
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
)
# tree is an xr.DataTree backed by Zarr with full per-token loss traces
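For intuition about what a sampling chain does, here is a minimal, standard-library sketch of SGLD localized around a starting point, run on a toy quadratic loss. It is not devinterp's sampler: the toy loss, the localization strength `gamma`, the helper names, and the larger step size (0.01, chosen so the toy chain mixes quickly) are all assumptions for illustration.

```python
import math
import random


def loss(w):
    """Toy loss L(w) = 0.5 * |w|^2, minimized at the origin."""
    return 0.5 * sum(x * x for x in w)


def grad(w):
    """Gradient of the toy loss."""
    return list(w)


def sgld_chain(w_init, lr, n_beta, gamma, num_draws, rng):
    """Run one SGLD chain localized around w_init; return per-draw losses."""
    w = list(w_init)
    noise_scale = math.sqrt(lr)
    losses = []
    for _ in range(num_draws):
        g = grad(w)
        for i in range(len(w)):
            # Drift: tempered loss gradient plus a pull back towards w_init.
            drift = n_beta * g[i] + gamma * (w[i] - w_init[i])
            w[i] += -0.5 * lr * drift + noise_scale * rng.gauss(0.0, 1.0)
        losses.append(loss(w))
    return losses


rng = random.Random(0)
w_star = [0.0, 0.0]
chains = [
    sgld_chain(w_star, lr=0.01, n_beta=30, gamma=1.0, num_draws=2000, rng=rng)
    for _ in range(4)
]
mean_loss = sum(sum(c) for c in chains) / sum(len(c) for c in chains)
llc_estimate = 30 * (mean_loss - loss(w_star))
```

For a regular quadratic in d parameters the learning coefficient is d/2, so with two parameters `llc_estimate` should land close to 1, up to localization and discretization bias.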

Compute Susceptibilities

from devinterp.slt.susceptibilities import susceptibilities
from devinterp.slt.weight_restrictions import create_param_masks

result = susceptibilities(
    model=model,
    dataset=train_data,
    observables={"train": train_data, "code": code_data},
    weight_restrictions={
        "full": None,
        "l0h0": create_param_masks(model, "l0h0"),
        "l0h1": create_param_masks(model, "l0h1"),
    },
    sampling_task="train",
    lr=0.001,
    n_beta=30,
)
# result is a DataTree with /susceptibilities and /context subtrees
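The idea of a "first-order posterior response" can be made concrete with the generic linear-response identity: if draws come from a distribution proportional to exp(-n_beta * (L + h * dL)), then the derivative of E[phi] with respect to h at h = 0 equals -n_beta * Cov(phi, dL). The sketch below estimates that quantity from per-draw values. It only illustrates the identity; the library's normalization, sign convention, and choice of observable may differ, and every name here is hypothetical.

```python
from statistics import fmean


def covariance(xs, ys):
    """Population covariance (divides by N) of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def linear_response(phi_draws, delta_loss_draws, n_beta):
    """First-order change in E[phi] per unit of perturbation strength h."""
    return -n_beta * covariance(phi_draws, delta_loss_draws)


# Hypothetical per-draw values: phi is an observable evaluated at each
# posterior draw, delta_loss is the loss change caused by the perturbation.
phi = [1.0, 2.0, 3.0, 4.0]
delta_loss = [0.1, 0.2, 0.3, 0.4]
chi = linear_response(phi, delta_loss, n_beta=30)
```

Here the two series move together, so the covariance is positive (0.125) and the response is negative (about -3.75): up-weighting the perturbation pushes the posterior away from draws where `phi` is large.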

create_param_masks supports 85+ HuggingFace model types and TransformerLens. Restriction patterns: "full", "l0", "l0h1", "l0g0" (GQA group), "l0 attn", "l0 mlp", "embed", "unembed".
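To make the naming scheme of the restriction patterns explicit, the following sketch parses each pattern into a small description. It is purely illustrative and says nothing about how create_param_masks works internally; the function `parse_restriction`, the returned dict layout, and the reading of "l0" as all of layer 0 and "l0h1" as head 1 of layer 0 are assumptions inferred from the pattern list above.

```python
import re

# l<layer>, optionally followed by h<head>, g<group>, " attn", or " mlp".
_PATTERN = re.compile(
    r"^l(?P<layer>\d+)(?:(?P<kind>[hg])(?P<index>\d+)| (?P<block>attn|mlp))?$"
)


def parse_restriction(pattern):
    """Turn a restriction string into a small descriptive dict."""
    if pattern in ("full", "embed", "unembed"):
        return {"scope": pattern}
    match = _PATTERN.match(pattern)
    if match is None:
        raise ValueError(f"unrecognized restriction pattern: {pattern!r}")
    spec = {"scope": "layer", "layer": int(match["layer"])}
    if match["kind"] == "h":
        spec.update(scope="head", head=int(match["index"]))
    elif match["kind"] == "g":
        spec.update(scope="gqa_group", group=int(match["index"]))
    elif match["block"]:
        spec.update(scope=match["block"])
    return spec


parsed = {p: parse_restriction(p) for p in ("full", "l0", "l0h1", "l0g0", "l0 attn")}
# parsed["l0h1"] == {"scope": "head", "layer": 0, "head": 1}
```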

Compute BIF

from devinterp.slt.bif import bif

result = bif(
    model=model,
    dataset=train_data,
    observables={"train": train_data, "code": code_data},
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
    correlation_method="token",  # or "sequence"
)
# result["influences"] contains pairwise correlation matrix
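The "posterior correlation between per-sample losses" can be illustrated without the library: given each sample's loss at every draw, correlate the loss series of one sample with that of another. This is a minimal sketch, not the library's implementation; the function names and the nested-list layout are assumptions, and any sign or scaling convention that bif() applies is not reproduced here.

```python
import math
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def loss_correlations(losses_a, losses_b):
    """Correlate every series in losses_a with every series in losses_b.

    losses_a[i] and losses_b[j] each hold one sample's loss at every draw.
    """
    return [[pearson(a, b) for b in losses_b] for a in losses_a]


# Hypothetical losses over four draws for two "train" samples and one
# "code" sample.
train_losses = [[1.0, 1.2, 1.1, 1.4], [2.0, 1.8, 1.9, 1.6]]
code_losses = [[0.5, 0.7, 0.6, 0.9]]
influences = loss_correlations(train_losses, code_losses)
```

The first train sample rises and falls with the code sample across draws (correlation near +1), while the second moves in the opposite direction (correlation near -1).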

Architecture

Each analysis has two entry points:

High-level (llc(), bif(), susceptibilities()): runs sampling and post-processing in one call
Low-level (compute_llc(), compute_bif()): takes a pre-computed xr.DataTree from sample(), useful when you want to run sampling once and compute multiple analyses. compute_susceptibilities() takes a dict[str, xr.DataTree] (one tree per weight restriction), since susceptibilities require a separate sampling run for each restriction.

The sampling pipeline stores full per-token losses to Zarr via sample(), and post-processing functions operate on the resulting xr.DataTree.

Model Requirements

The current API assumes autoregressive language models with fixed-length tokenized sequences:

Model must accept input_ids and return logits (HuggingFace models, TransformerLens HookedTransformer, or any model returning a tensor or object with .logits)
Dataset must be a HuggingFace Dataset with an "input_ids" column of uniform-length sequences
Loss defaults to next-token cross-entropy
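One way to satisfy the uniform-length requirement is to cut a flat token stream into equal-length sequences. The helper below is a sketch under that assumption; `chunk_token_ids` is a hypothetical name, and dropping the trailing remainder is one design choice among several (padding is another).

```python
def chunk_token_ids(token_ids, seq_len):
    """Split a flat token stream into equal-length sequences.

    Tokens that do not fill a complete final sequence are dropped.
    """
    num_full = len(token_ids) // seq_len
    return {
        "input_ids": [
            token_ids[i * seq_len:(i + 1) * seq_len] for i in range(num_full)
        ]
    }


columns = chunk_token_ids(list(range(10)), seq_len=4)
# columns["input_ids"] == [[0, 1, 2, 3], [4, 5, 6, 7]]; tokens 8 and 9 are dropped
```

The resulting column dict can be wrapped with `datasets.Dataset.from_dict(columns)` to obtain a HuggingFace Dataset with an "input_ids" column.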

For non-standard losses, pass loss_fn=... to sample(), bif(), llc(), or susceptibilities(). The function takes (model, input_ids) and must return per-token loss of shape (batch, seq_len-1). For more exotic control, sample_single_chain() in devinterp.slt.sampler accepts a custom evaluate callable.
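As a reference point for writing a custom loss_fn, here is a sketch of a function with the described contract: it takes (model, input_ids) and returns per-token next-token cross-entropy of shape (batch, seq_len-1). It is written against plain PyTorch and is not copied from the library's default; `next_token_loss` and the `TinyLM` stand-in model are hypothetical names used only to exercise the function.

```python
import torch
import torch.nn.functional as F


def next_token_loss(model, input_ids):
    """Per-token next-token cross-entropy of shape (batch, seq_len - 1)."""
    output = model(input_ids)
    # Accept either a raw logits tensor or an object with a .logits attribute.
    logits = output if isinstance(output, torch.Tensor) else output.logits
    # Position t predicts token t + 1: drop the last logit and the first label.
    shifted_logits = logits[:, :-1, :]
    targets = input_ids[:, 1:]
    # cross_entropy expects the class dimension second: (batch, vocab, seq_len - 1).
    return F.cross_entropy(shifted_logits.transpose(1, 2), targets, reduction="none")


class TinyLM(torch.nn.Module):
    """Stand-in model that maps token ids straight to logits."""

    def __init__(self, vocab_size=11, dim=8):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, vocab_size)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids))


input_ids = torch.randint(0, 11, (2, 5))
per_token = next_token_loss(TinyLM(), input_ids)  # shape (2, 4)
```

A non-standard loss would keep the same input and output shapes and change only how the per-token values are computed, for example by masking some positions.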

Migrating from v1

The v2 API replaces the callback-based sampling with a data-centric pipeline. Key changes:

# v1 (old)
from devinterp.slt.sampler import estimate_learning_coeff_with_summary
from devinterp.optim import SGLD

result = estimate_learning_coeff_with_summary(
    model, loader,
    sampling_method=SGLD,
    sampling_method_kwargs={"lr": 0.001, "nbeta": 30},
    num_chains=4, num_draws=200,
)
llc = result["llc/mean"]

# v2 (new)
from devinterp.slt.llc import llc

result = llc(
    model=model,
    dataset=dataset,                # HF Dataset, not DataLoader
    observables={"train": dataset},
    lr=0.001, n_beta=30,
    num_chains=4, num_draws=200,
)
llc_value = float(result["llc_mean"])

What changed:

estimate_learning_coeff / LLCEstimator / SamplerCallback → llc() and compute_llc()
DataLoader → HuggingFace Dataset with "input_ids" column
sampling_method_kwargs={"nbeta": ...} → n_beta=... as a direct parameter
Results are xr.Dataset / xr.DataTree, not dicts with string keys
New capabilities: susceptibilities(), bif(), observables, weight restrictions, per-token loss storage

Hyperparameter selection

All sampling is sensitive to hyperparameters. See our Sampling Hyperparameter Guide.

Credits & Citations

This package was created by Timaeus. Most of the sampling, LLC, susceptibility, and BIF implementations were developed internally; this package is a port of that joint work.

If this package was useful in your work, please cite it as:

@misc{devinterp2026,
  title   = {DevInterp},
  author  = {Snell, William and Wind, Johan Sokrates and Snikkers, Billy
             and Fraser, Sandy and Newgas, Adam and Hoogland, Jesse
             and Wang, George and Gordon, Andrew and Zhou, William
             and van Wingerden, Stan},
  year    = {2026},
  version = {2.0},
  howpublished = {\url{https://github.com/timaeus-research/devinterp}},
}

Release history

2.0.1: Apr 23, 2026
2.0.0: Apr 22, 2026 (this version)
1.3.2: Jan 24, 2025
1.3.0: Jan 24, 2025
1.2.0: Sep 20, 2024
1.1.0: Aug 30, 2024
1.0.0: Jul 24, 2024
0.2.2: Jun 17, 2024
0.2.1: Jun 17, 2024
0.2.0: Feb 27, 2024
0.1.0: Feb 2, 2024
0.0.3: Jan 9, 2024
0.0.2: Oct 4, 2023
0.0.1: Oct 3, 2023

Download files

Source Distribution

devinterp-2.0.0.tar.gz (61.4 kB)

Uploaded Apr 22, 2026 Source

Built Distribution

devinterp-2.0.0-py3-none-any.whl (70.7 kB)

Uploaded Apr 22, 2026 Python 3

File details

Details for the file devinterp-2.0.0.tar.gz.

File metadata

Download URL: devinterp-2.0.0.tar.gz
Upload date: Apr 22, 2026
Size: 61.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for devinterp-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a324da345eebb9956e3839abfb8b1a50a09031efabe510e33b2f8dafe8f9ac9b`
MD5	`f3e2f689b2e2a4bf210243bca9b6851f`
BLAKE2b-256	`2ebaf8a08ae32d669c6c0c11665d423e06f8cc2bb868fc686befcf633fab69f9`
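A downloaded file can be checked against the published SHA256 digest with the Python standard library. This is a generic sketch; the helper names are hypothetical and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a local file against a published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was saved to the current directory:
# matches_published_hash(
#     "devinterp-2.0.0.tar.gz",
#     "a324da345eebb9956e3839abfb8b1a50a09031efabe510e33b2f8dafe8f9ac9b",
# )
```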

Provenance

The following attestation bundles were made for devinterp-2.0.0.tar.gz:

Publisher: publish.yml on timaeus-research/devinterp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: devinterp-2.0.0.tar.gz
- Subject digest: a324da345eebb9956e3839abfb8b1a50a09031efabe510e33b2f8dafe8f9ac9b
- Sigstore transparency entry: 1358144336
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: timaeus-research/devinterp@db452d6d0d0e92051182fa7822b2536cc6bf3a1d
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/timaeus-research
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@db452d6d0d0e92051182fa7822b2536cc6bf3a1d
- Trigger Event: release

File details

Details for the file devinterp-2.0.0-py3-none-any.whl.

File metadata

Download URL: devinterp-2.0.0-py3-none-any.whl
Upload date: Apr 22, 2026
Size: 70.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for devinterp-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6389502b796ea9c332ebc593afeea9e87a6e488bd3aebcf39890d71f3d0cc557`
MD5	`51e7a836fccea161d9d59634328db43f`
BLAKE2b-256	`6fabbb8471fa7bc470a41034c16d204b6bec5eaf38bed58519b7dd279f09f3bc`

Provenance

The following attestation bundles were made for devinterp-2.0.0-py3-none-any.whl:

Publisher: publish.yml on timaeus-research/devinterp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: devinterp-2.0.0-py3-none-any.whl
- Subject digest: 6389502b796ea9c332ebc593afeea9e87a6e488bd3aebcf39890d71f3d0cc557
- Sigstore transparency entry: 1358144438
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: timaeus-research/devinterp@db452d6d0d0e92051182fa7822b2536cc6bf3a1d
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/timaeus-research
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@db452d6d0d0e92051182fa7822b2536cc6bf3a1d
- Trigger Event: release

devinterp 2.0.0
