Diagnostic adversarial game for frontier LLMs — a policy-enforced kernel that mediates a Designer/Solver/Judge cycle, scores against a hidden oracle, and curates a Lab/Arena/Regression catalog.
A diagnostic adversarial game for frontier LLMs — a measurement instrument that happens to be fun.
One Claude session (Designer) crafts puzzles targeting real, currently-observed capability gaps. Another (Solver) attempts them. A policy-enforced kernel mediates, scores against a hidden oracle, and curates a catalog through a Lab → Arena → Regression lifecycle. Puzzles are grounded in empirical signal — real GitHub issues, academic literature, observed failures in the field — not synthetic.
What makes it different
- Capability, not "cheating." AI Crucible distinguishes elegance and novelty (rewarded) from answer-bypass (penalized). Lateral thinking is a capability to measure, not a vice to punish.
- The instrument measures itself. Prompt framing is a first-class measured arm: the kernel runs the same puzzle under neutral / self_referential / social_standings framings and reports its own prompt-effect as a diagnostic.
- A sealed measurement boundary. Motivation and measurement never share a context window; the hidden oracle is graded out-of-band by a different model family with the agent's reasoning hidden. The model can't game what it can't perceive.
- Reliability by consistency. pass^k (all k independent trials succeed), Wilson intervals, and cross-family judge panels, built to report distributions, not point estimates.
- A living catalog. Solved puzzles are demoted to Regression, never deleted, so the catalog becomes a capability-evolution timeline as the frontier moves.
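The reliability figures named in the list above can be sketched in a few lines. This is an illustration only, not AI Crucible's own code: the function names are hypothetical, the pass^k estimator shown is the common combinatorial one (the fraction of size-k subsets of the recorded trials that are all successes), and the Wilson interval uses the standard formula with z = 1.96 for roughly 95% coverage.

```python
import math
from typing import Sequence


def pass_hat_k(trials: Sequence[bool], k: int) -> float:
    """Estimate the probability that k independent trials all succeed.

    With n recorded trials and c successes, the estimate is C(c, k) / C(n, k).
    (Hypothetical helper; the project's estimator may differ.)
    """
    n = len(trials)
    c = sum(trials)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > c, so too few successes yields 0.0.
    return math.comb(c, k) / math.comb(n, k)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    trials = [True] * 8 + [False] * 2  # made-up outcomes for illustration
    print("pass^1:", pass_hat_k(trials, 1))
    print("pass^2:", pass_hat_k(trials, 2))
    print("Wilson 95%:", wilson_interval(sum(trials), len(trials)))
```

Reporting the interval alongside pass^k makes clear how little ten trials pin down: a model that passes 8 of 10 single trials still has a wide plausible range for its true pass rate.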
Threat model (summary)
AI Crucible's primary adversary is the model under measurement, which may try to reach the answer key or to detect and game the eval. It touches: puzzle prompts/setup, the Solver's own confined working directory, and the model-provider API calls the operator configures. It does not touch: the oracle/answer key (graded out-of-band on a separate host the Solver cannot reach) or motivation "chrome" (rank/standings — never injected into the scored context). Permissions: model-provider keys via environment variables at runtime; no bundled secrets, no telemetry, no outbound calls of its own. Full disclosure — including where a boundary is defense-in-depth rather than a hard guarantee — is in SECURITY.md.
Architecture
AI Crucible is a thin policy layer on Inspect AI (UK AISI), not a from-scratch harness. A single AttemptState object is threaded Designer → Solver → (Critic) → Judge through one generate choke point, so every model and tool call is observable.
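The single-choke-point idea can be illustrated without any framework. The sketch below is plain Python and does not use Inspect AI's API; AttemptState here is a stand-in with hypothetical fields, and make_generate and the backend callable are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

# A backend takes (role, prompt) and returns the model's reply text.
Backend = Callable[[str, str], str]


@dataclass
class AttemptState:
    """Stand-in for the state object threaded through one attempt."""
    puzzle_id: str
    events: list[dict] = field(default_factory=list)


def make_generate(state: AttemptState, backend: Backend) -> Backend:
    """Return the only function roles may use to reach a model.

    Because every call goes through this wrapper, every call is recorded.
    """
    def generate(role: str, prompt: str) -> str:
        reply = backend(role, prompt)
        state.events.append({"role": role, "prompt": prompt, "reply": reply})
        return reply

    return generate


if __name__ == "__main__":
    # Fake backend so the sketch runs without any provider key.
    def fake_backend(role: str, prompt: str) -> str:
        return f"[{role}] {prompt.upper()}"

    state = AttemptState(puzzle_id="demo-001")
    generate = make_generate(state, fake_backend)
    puzzle = generate("Designer", "craft a puzzle")
    answer = generate("Solver", puzzle)
    generate("Judge", answer)
    print([event["role"] for event in state.events])
```

Routing all roles through one wrapper trades a little indirection for a complete, ordered record of the attempt.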
| Module | Responsibility |
|---|---|
| puzzle_loader | Loads a puzzle directory (meta.json / prompt / setup_script) into Solver-visible state. Never touches the oracle. |
| sandbox | Narrow exec / read_file / write_file channel into a locked, network-less container. |
| roles | The five role slots (Designer / Solver / Critic / Judge / CohortSolver). Only Solver gets tools; Critic is interface-reserved, default-off. |
| budget_governor | Per-class tool-call + wall-clock budgets, displayed to the agent, enforced kernel-side; hard-kill on pathological loops. |
| oracle_scorer | Out-of-band grading: solved-and-no-regression against the hidden oracle (SWE-bench pattern). |
| judge_panel | Cross-family panel of model-scorers + reducer (PoLL) for novelty validation and bypass detection. |
| trace_writer | Per-attempt transcript in the Inspect EvalLog shape; large blobs stored by digest. |
| observability | Per-attempt → per-puzzle → per-model rollups; pass^k native. |
| attestation | Cryptographic provenance (cosign + event-store) behind a typed subprocess boundary. |
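As an illustration of the budget_governor responsibility (budgets shown to the agent but enforced outside it), here is a minimal sketch. The class, method, and exception names are hypothetical and are not taken from the package; per-class budgets and the hard-kill path are left out for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when an attempt runs out of tool calls or wall-clock time."""


@dataclass
class BudgetGovernor:
    max_tool_calls: int
    max_wall_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    calls_used: int = 0
    started_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def remaining(self) -> dict[str, float]:
        """Budget summary that could be displayed to the agent."""
        elapsed = self.clock() - self.started_at
        return {
            "tool_calls": self.max_tool_calls - self.calls_used,
            "wall_seconds": max(0.0, self.max_wall_seconds - elapsed),
        }

    def charge_tool_call(self) -> None:
        """Call before dispatching a tool call; raises once a budget is spent."""
        if self.clock() - self.started_at > self.max_wall_seconds:
            raise BudgetExceeded("wall-clock budget exhausted")
        if self.calls_used >= self.max_tool_calls:
            raise BudgetExceeded("tool-call budget exhausted")
        self.calls_used += 1


if __name__ == "__main__":
    governor = BudgetGovernor(max_tool_calls=3, max_wall_seconds=60.0)
    try:
        while True:
            governor.charge_tool_call()  # a real kernel would dispatch the tool here
            print("remaining:", governor.remaining())
    except BudgetExceeded as exc:
        print("stopped:", exc)
```

The check lives in the caller that dispatches tools, so an agent that ignores the displayed budget is still stopped.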
The sealed boundary runs in three tiers — Tier 1 scored context (deployment-shaped, framing-neutral), Tier 2 engagement framing (probed for contamination each release), Tier 3 chrome (rank/leaderboard — human-facing UI only, never in a context the model solves in). The full design rationale, with citations, is in docs/research-grounding.md.
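One way to picture the Tier 1 / Tier 3 separation is an allowlist at the point where the scored context is assembled. The sketch below is an assumption about how such a guard could look, not the project's implementation; the field names (prompt, setup_script, rank, leaderboard, standings) and the function name are hypothetical.

```python
from types import MappingProxyType
from typing import Mapping

# Hypothetical field names, chosen only for this illustration.
SCORED_FIELDS = frozenset({"prompt", "setup_script"})
CHROME_FIELDS = frozenset({"rank", "leaderboard", "standings"})


class BoundaryViolation(ValueError):
    """Raised when human-facing chrome is about to enter the scored context."""


def build_scored_context(record: Mapping[str, str]) -> Mapping[str, str]:
    """Return a read-only view holding only allowlisted Tier 1 fields.

    Chrome fields are treated as an error rather than silently dropped,
    so a leak is caught where it happens. Unknown fields are dropped.
    """
    leaked = CHROME_FIELDS.intersection(record)
    if leaked:
        raise BoundaryViolation(f"chrome fields in scored input: {sorted(leaked)}")
    return MappingProxyType({k: v for k, v in record.items() if k in SCORED_FIELDS})


if __name__ == "__main__":
    ctx = build_scored_context({"prompt": "Fix the failing test.", "notes": "internal"})
    print(dict(ctx))
    try:
        build_scored_context({"prompt": "Fix the failing test.", "rank": "3rd"})
    except BoundaryViolation as exc:
        print("rejected:", exc)
```

An allowlist fails closed: a new field stays out of the scored context until someone adds it deliberately.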
Install
# As a Python library + CLI (PyPI):
pip install ai-crucible # or: uv pip install ai-crucible
ai-crucible --help
# Or zero-prerequisite via npx — downloads a verified binary, no Python needed:
npx @dogfood-lab/ai-crucible --help
Research preview (v0.2.x). The judge panel's alt-test ω is still a circular model-jury bootstrap until a human-labeling round runs, so seated judges are provisional and the composed panel escalates to a Claude Designer below quorum. See the scorecard for the honest, non-cosmetic gate results.
Quick start (from source)
AI Crucible uses uv for environment and dependency management. Python 3.11+.
# Create the venv and install the dev + stats extras
uv sync --extra dev --extra stats
# Run the test suite (with the coverage gate)
uv run pytest --cov=ai_crucible --cov-report=term-missing
# Lint
uv run ruff check .
# One command: lint + tests + build + smoke
bash verify.sh
Documentation
- Handbook: guides, architecture, and reference.
- docs/research-grounding.md: design rationale, with citations.
- docs/gameplan.md: roadmap and open questions.
- SECURITY.md: threat model + honest residual-risk disclosure.
License
MIT. Public and pre-1.0 — see the CHANGELOG for version status.
Built by MCP Tool Shop · part of the dogfood-lab workshop for testing in the AI era.