MI-Chef: an audit suite for mechanistic interpretability - testing attribution graph faithfulness by serving corpora to cross-layer transcoders. v0.0.1 is an early-development release for an active research project; the audit suite ships with the paper.

Project description

MI-Chef

An audit suite for mechanistic interpretability: testing attribution graph faithfulness by serving corpora to cross-layer transcoders.

michef v0.0.1 is an early-development release reserving the package name for an active research project. The full audit suite ships alongside the accompanying paper. If you landed here early: the API below is the roadmap, not yet the product.

Why

Attribution graphs, the causal stories produced by circuit tracing, are never computed on a model directly. They are computed through a replacement model (a cross-layer transcoder, CLT) trained on a corpus the researcher chooses. Whether that choice of corpus changes the resulting story has never been measured. MI-Chef measures it, and packages the instruments so you can audit your own interpreters before trusting their testimony.

Roadmap (ships with the paper)

  • michef.audit — the product: circuit stability score, four-level agreement metrics (feature / subspace / graph / narrative), seed and paraphrase noise floors, Procrustes gauge controls, anti-phantom validation battery.
  • michef.pantry — loaders for the corpus-controlled CLT grid (HuggingFace).
  • michef.serve — corpus-to-CLT recipes (thin wrapper over CLT-Forge; consumes, never reimplements).
  • michef.taste — side-by-side attribution-graph comparison dashboards.

Integrates with circuit-tracer and CLT-Forge checkpoints.
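
Since the audit API has not shipped yet, the following is only a minimal sketch of one plausible reading of the "Procrustes gauge controls" item above: before comparing two feature dictionaries, remove any difference that is a pure rotation of the latent basis. It is not michef code. The function names (`orthogonal_procrustes`, `aligned_residual`) and the synthetic matrices are hypothetical, and the assumption that the relevant gauge freedom is an orthogonal transform is ours.

```python
import numpy as np


def orthogonal_procrustes(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix R that minimizes ||A @ R - B||_F."""
    # Classical solution: take the SVD of A^T B and drop the singular values.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def aligned_residual(A: np.ndarray, B: np.ndarray) -> float:
    """Relative Frobenius error between A and B after optimal rotation of A."""
    R = orthogonal_procrustes(A, B)
    return float(np.linalg.norm(A @ R - B) / np.linalg.norm(B))


# Synthetic stand-ins for two decoder matrices (hypothetical shapes):
# 64 model dimensions by 16 latent features.
rng = np.random.default_rng(0)
B = rng.normal(size=(64, 16))

# Build A as B expressed in a randomly rotated latent basis.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
A = B @ Q.T

# Without alignment the two matrices look very different.
raw = float(np.linalg.norm(A - B) / np.linalg.norm(B))
# After alignment the difference vanishes up to floating-point error.
aligned = aligned_residual(A, B)

print(f"raw relative error:     {raw:.3f}")
print(f"aligned relative error: {aligned:.2e}")
```

The point of such a control is that a large raw disagreement between two dictionaries is uninformative until the rotational freedom has been factored out; only the residual after alignment says anything about whether the two replacement models differ.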

Status

Active research project targeting v0.1 with the paper release. Watch this space.

License

MIT

Download files

Source Distribution

michef-0.0.1.tar.gz (2.6 kB)

Uploaded Source

Built Distribution

michef-0.0.1-py3-none-any.whl (2.7 kB)

Uploaded Python 3

File details

Details for the file michef-0.0.1.tar.gz.

File metadata

  • Download URL: michef-0.0.1.tar.gz
  • Upload date:
  • Size: 2.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for michef-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       09b526f7347cb413e73dcd14a51192e0b472ac6ceff8fb7f90c5a166d46a3e27
MD5          c6c0229d53cb78f66d054dc2937c35b9
BLAKE2b-256  3b3869ec63a7814f28d43a01f10199fe092e091bf1a7a0a29218d02ee531cb6b
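
To check a downloaded archive against the SHA256 digest listed above, something like the following sketch should work. It uses only the Python standard library. The helper name `sha256_of` is hypothetical, and the script assumes the source distribution has been saved in the current directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above for michef-0.0.1.tar.gz.
EXPECTED_SHA256 = "09b526f7347cb413e73dcd14a51192e0b472ac6ceff8fb7f90c5a166d46a3e27"

# Assumed local path; adjust to wherever the file was downloaded.
archive = Path("michef-0.0.1.tar.gz")
if archive.exists():
    if sha256_of(archive) == EXPECTED_SHA256:
        print("SHA256 matches the published digest")
    else:
        print("SHA256 MISMATCH: do not install this file")
else:
    print(f"{archive} not found; download it first")
```

pip can perform the same check at install time when a requirements file pins the digest with `--hash=sha256:...`.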

File details

Details for the file michef-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: michef-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 2.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for michef-0.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       09650dc4c9282e08093a796bc1302790e26e3232ea6c25cf4cef78443f742e2d
MD5          4868207dbd8fc63685c676c6a76de51f
BLAKE2b-256  b2bdebc1aa2c69de7934088ec0f1d548eb95b99f15ccc61a42f0af488ae4732d
