Circuit-level regression testing for AI systems
The problem
Behavioral evals (accuracy, BLEU, LLM-as-judge) only see input→output. A model can pass every one of them while the mechanism underneath silently shifts to something brittle — a shortcut, a spurious feature, a circuit that happens to produce the right answer for the wrong reason (feature absorption / shortcut learning). Nothing in a standard eval suite tests whether the internal computation itself is still doing what you think it's doing, so this kind of drift ships unnoticed until it fails somewhere an eval didn't cover.
How Warden solves it
Warden lets you write declarative assertions about a model's internal mechanism — not just its output — and run them like tests: "is this behavior mechanistically necessary and sufficient in this circuit, and hasn't the mechanism silently drifted?"
It uses sparse autoencoders to identify the features involved in a
behavior, and causal patching (ablation / activation-patching) to measure
whether that circuit is actually driving the behavior, rather than merely
correlated with it. The heavy numerics run in a Rust core (PyO3); a Python
layer handles orchestration, the DSL, and the CLI — so contracts read like
tests and slot into a normal pytest run or CI pipeline.
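To make the two causal measurements concrete, here is a toy sketch in which the "model" is just a weighted sum of feature activations. The function names and the exact score definitions (fractional metric drop under zero-ablation; fraction of the clean-vs-corrupted gap recovered by patching) are illustrative assumptions, not Warden's API. docs/contracts.md describes what Warden actually computes.

```python
from typing import Dict, Set

# Toy "model": the behavior metric is a weighted sum of feature activations.
# Real models are nonlinear; this only illustrates the two interventions.
WEIGHTS: Dict[str, float] = {"f1": 2.0, "f2": 1.0, "f3": 0.5}


def metric(acts: Dict[str, float]) -> float:
    """Behavior metric for a given set of feature activations."""
    return sum(WEIGHTS[name] * value for name, value in acts.items())


def necessity(clean: Dict[str, float], circuit: Set[str]) -> float:
    """Fractional drop in the metric when the circuit's features are zeroed.

    Assumes the clean metric is non-zero.
    """
    ablated = {k: (0.0 if k in circuit else v) for k, v in clean.items()}
    base = metric(clean)
    return (base - metric(ablated)) / base


def sufficiency(
    clean: Dict[str, float], corrupted: Dict[str, float], circuit: Set[str]
) -> float:
    """Fraction of the clean-vs-corrupted gap recovered by patching only the
    circuit's clean activations into the corrupted run.

    Assumes the clean and corrupted metrics differ.
    """
    patched = {k: (clean[k] if k in circuit else v) for k, v in corrupted.items()}
    gap = metric(clean) - metric(corrupted)
    return (metric(patched) - metric(corrupted)) / gap


clean = {"f1": 1.0, "f2": 1.0, "f3": 1.0}
corrupted = {"f1": 0.0, "f2": 0.0, "f3": 1.0}
circuit = {"f1"}

nec = necessity(clean, circuit)
suf = sufficiency(clean, corrupted, circuit)
print(f"necessity={nec:.3f} sufficiency={suf:.3f}")
```

A feature that is merely correlated with the behavior would have a weight near zero in this toy, so ablating or patching it would barely move either score.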
Full docs, one per component, with all the "why" and the real numbers
behind each claim: docs/. This README is the
quickstart; docs/ is where the depth lives.
What's here
Everything below is implemented and verified against real GPT-2 small — not mocked, not synthetic where it mattered:
| capability | what | docs |
|---|---|---|
| Circuit testing | circuit discovery, necessity/sufficiency, contract DSL, pytest plugin, CLI | contracts.md |
| Drift detection | drift assertions, HTML reports, GitHub Action | drift.md |
| Production monitoring | plugin SDK, warden sample, OpenTelemetry/Prometheus | production.md |
| Self-serve SAEs | warden train-sae, hand-derived backprop in Rust | sae-training.md |
Two deliberate simplifications (each doc above explains why):
- Circuits are flat top-k SAE feature lists, not full attribution graphs. Drift is a Jaccard-distance proxy, not graph-edit-distance.
- warden sample re-checks fixed contracts on an interval rather than mining circuits from unlabeled live traffic (which discover_circuit's methodology doesn't support).
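The first simplification can be sketched in a few lines. This is a minimal illustration of Jaccard distance between two flat feature lists. The function name and the feature ids are hypothetical, and how Warden selects the top-k features is covered in docs/drift.md rather than here.

```python
from typing import Iterable


def jaccard_distance(baseline: Iterable[int], candidate: Iterable[int]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over two collections of feature indices."""
    a, b = set(baseline), set(candidate)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty circuits count as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical top-k SAE feature ids for a baseline and a fine-tuned checkpoint.
baseline_circuit = [101, 205, 333, 480, 512]
finetuned_circuit = [101, 205, 600, 777, 901]

drift = jaccard_distance(baseline_circuit, finetuned_circuit)
print(f"drift={drift:.3f}")
```

Because the comparison is over sets, it ignores feature order, activation magnitude, and any edges between features, which is the trade-off against a graph-edit-distance measure.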
Install
uv venv .venv && source .venv/bin/activate
uv pip install maturin
maturin develop --release
uv pip install -e ".[dev]"
Note: --release is important; debug builds are ~10-20x slower for training.
Quickstart
python examples/demo.py
# or:
warden run examples/ioi_circuit.warden.yaml --json-out report.json --html-out report.html
warden report report.json
A contract (*.warden.yaml) declares a model, a layer, an SAE, an eval set,
and assertions:
name: ioi_name_mover_circuit
model: gpt2
layer: 9
sae:
repo_id: jbloom/GPT2-Small-SAEs-Reformatted
subfolder: blocks.9.hook_resid_pre
eval_set: ioi_eval.jsonl
assertions:
- type: necessity
min_score: 0.3
- type: sufficiency
min_score: 0.08
Contracts run real model forward passes, so both the pytest plugin and
@warden.contract-decorated functions are skipped by default —
pytest --warden opts in.
Full DSL reference, the Python decorator form, and what necessity/sufficiency actually compute: docs/contracts.md.
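As a rough picture of how threshold assertions turn measured scores into a pass/fail report, the sketch below checks already-computed scores against the min_score values from the contract above. The check_assertions helper, the dictionary layout, and the inclusive >= comparison are assumptions made for illustration; they are not Warden's internals.

```python
from typing import Dict, List, Tuple

# The contract's assertions as plain Python data (the YAML parsing step is
# omitted; this layout is an assumption).
assertions: List[Dict[str, object]] = [
    {"type": "necessity", "min_score": 0.3},
    {"type": "sufficiency", "min_score": 0.08},
]


def check_assertions(
    assertions: List[Dict[str, object]], scores: Dict[str, float]
) -> Tuple[bool, List[str]]:
    """Compare each measured score with its minimum and build report lines."""
    lines: List[str] = []
    all_ok = True
    for assertion in assertions:
        name = str(assertion["type"])
        minimum = float(assertion["min_score"])
        score = scores[name]
        ok = score >= minimum  # assumption: the threshold is inclusive
        all_ok = all_ok and ok
        status = "PASS" if ok else "FAIL"
        lines.append(f"[{status}] {name}={score:.3f} (min {minimum})")
    return all_ok, lines


# Scores taken from the Results section of this README.
passed, report = check_assertions(
    assertions, {"necessity": 0.390, "sufficiency": 0.134}
)
print("PASS" if passed else "FAIL")
for line in report:
    print(line)
```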
Features
- Necessity & sufficiency: ablate or activation-patch a discovered circuit and measure the causal effect on a real behavior, not just correlation. → docs/contracts.md
- Drift detection: compare a checkpoint's circuit against a baseline's; demonstrated by catching a real, self-inflicted regression from a fine-tune. → docs/drift.md
- Plugin SDK: swap in your own model adapter or SAE loader, via a registry or a real Python entry point, no fork required. → docs/production.md
- Production monitoring: warden sample re-checks contracts against a live checkpoint path on an interval, exporting real Prometheus/OTel metrics. → docs/production.md
- Self-serve SAE training: warden train-sae for layers with no public dictionary; hand-derived forward/backward pass in Rust (no autodiff there), verified by numerical gradient checking. → docs/sae-training.md
- HTML reports & GitHub Action: a self-contained report for human review, and a reusable composite action to block merges on regressions. → docs/drift.md, docs/production.md
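Numerical gradient checking, mentioned for the SAE trainer above, compares a hand-derived gradient with a central finite difference of the loss. The sketch below does this in pure Python for a tiny autoencoder. The architecture (ReLU encoder, linear decoder, no biases, squared error plus an L1 penalty) and all values are assumptions chosen for illustration; this is not the Rust implementation described in docs/sae-training.md.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def forward(w_enc: Matrix, w_dec: Matrix, x: List[float], l1: float):
    """Loss = ||W_dec relu(W_enc x) - x||^2 + l1 * sum(relu(W_enc x))."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]
    h = [max(0.0, p) for p in pre]
    x_hat = [sum(w * hj for w, hj in zip(row, h)) for row in w_dec]
    resid = [a - b for a, b in zip(x_hat, x)]
    loss = sum(r * r for r in resid) + l1 * sum(h)
    return loss, pre, h, resid


def analytic_grads(w_enc: Matrix, w_dec: Matrix, x: List[float], l1: float) -> Tuple[Matrix, Matrix]:
    """Hand-derived gradients of the loss with respect to W_enc and W_dec."""
    _, pre, h, resid = forward(w_enc, w_dec, x, l1)
    g_dec = [[2.0 * r * hj for hj in h] for r in resid]
    g_h = [
        sum(2.0 * resid[i] * w_dec[i][j] for i in range(len(resid))) + l1
        for j in range(len(h))
    ]
    # ReLU passes gradient only where the pre-activation is positive.
    g_pre = [g if p > 0.0 else 0.0 for g, p in zip(g_h, pre)]
    g_enc = [[g * xk for xk in x] for g in g_pre]
    return g_enc, g_dec


def numeric_grad(matrix: Matrix, loss_fn: Callable[[], float], eps: float = 1e-5) -> Matrix:
    """Central finite differences, perturbing one entry of `matrix` at a time."""
    grad = [[0.0] * len(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + eps
            plus = loss_fn()
            row[j] = original - eps
            minus = loss_fn()
            row[j] = original  # restore the entry before moving on
            grad[i][j] = (plus - minus) / (2.0 * eps)
    return grad


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Fixed values chosen so that no pre-activation sits near the ReLU kink.
l1 = 0.01
x = [0.5, -1.0, 2.0]
w_enc = [[0.2, -0.4, 0.1], [-0.3, 0.1, -0.2], [0.5, 0.5, 0.5], [-0.1, 0.3, 0.05]]
w_dec = [[0.3, -0.2, 0.1, 0.4], [-0.5, 0.6, 0.2, -0.1], [0.7, 0.1, -0.3, 0.2]]


def loss_fn() -> float:
    return forward(w_enc, w_dec, x, l1)[0]


g_enc, g_dec = analytic_grads(w_enc, w_dec, x, l1)
err_enc = max_abs_diff(g_enc, numeric_grad(w_enc, loss_fn))
err_dec = max_abs_diff(g_dec, numeric_grad(w_dec, loss_fn))
print(f"max |analytic - numeric|: enc={err_enc:.2e} dec={err_dec:.2e}")
```

A check like this is only trustworthy away from the ReLU kink, where the loss is not differentiable, which is why the inputs here keep every pre-activation well away from zero.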
Results
Real numbers from real runs — not illustrative. Full detail, including two real bugs found (and fixed) by actually running things twice, and an honest report of where the self-serve SAE trainer currently falls short: docs/results.md.
contract 'ioi_name_mover_circuit': PASS
[PASS] necessity=0.390 (min 0.3)
[PASS] sufficiency=0.134 (min 0.08)
A real fine-tune-induced regression, caught:
contract 'ioi_circuit_drift_check': FAIL
[FAIL] drift=0.824 (max 0.5)
A real Prometheus scrape:
warden_necessity{contract="demo", ...} 0.39
Development
cargo test # Rust unit tests (pure ndarray math + gradient checking, no Python needed)
maturin develop --release # rebuild the extension after Rust changes
pytest -m "not integration" # fast, fully offline Python tests
pytest # + integration tests: real GPT-2 + SAE download/forward passes
pytest --warden # also runs *.warden.yaml / @warden.contract items directly
-m "not integration" controls which of this repo's own tests touch the
network or a real model; --warden controls whether contracts written by a
Warden user run when their project's pytest executes. These are two
independent gates for two different things. CI (.github/workflows/ci.yml)
runs only cargo test and pytest -m "not integration", so it needs no network.
See docs/README.md for the full documentation index.