
Diagnostic adversarial game for frontier LLMs — a policy-enforced kernel that mediates a Designer/Solver/Judge cycle, scores against a hidden oracle, and curates a Lab/Arena/Regression catalog.



ai-crucible

MIT License · Python 3.11–3.13 · Coverage 96% · Version 0.2.0

A diagnostic adversarial game for frontier LLMs — a measurement instrument that happens to be fun.

One Claude session (Designer) crafts puzzles targeting real, currently observed capability gaps. Another (Solver) attempts them. A policy-enforced kernel mediates, scores against a hidden oracle, and curates a catalog through a Lab → Arena → Regression lifecycle. Puzzles are grounded in empirical signal (real GitHub issues, academic literature, and observed failures in the field) rather than synthetic tasks.
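As a rough illustration of the Lab → Arena → Regression lifecycle, here is a minimal sketch. The `Stage`, `NEXT_STAGE`, and `Catalog` names are hypothetical and not taken from the package, and the forward-only, one-step-at-a-time transition rule is an assumption. The only properties drawn from the text are the stage order and the fact that puzzles are never deleted.

```python
from enum import Enum


class Stage(Enum):
    LAB = "lab"
    ARENA = "arena"
    REGRESSION = "regression"


# Assumption: a puzzle only ever moves forward, one stage at a time.
NEXT_STAGE = {Stage.LAB: Stage.ARENA, Stage.ARENA: Stage.REGRESSION}


class Catalog:
    """Hypothetical catalog: puzzles change stage but are never removed."""

    def __init__(self) -> None:
        self._stages: dict[str, Stage] = {}

    def add(self, puzzle_id: str) -> None:
        if puzzle_id in self._stages:
            raise ValueError(f"{puzzle_id} is already in the catalog")
        self._stages[puzzle_id] = Stage.LAB  # new puzzles start in the Lab

    def advance(self, puzzle_id: str) -> Stage:
        current = self._stages[puzzle_id]
        if current not in NEXT_STAGE:
            raise ValueError(f"{puzzle_id} is already in {current.value}")
        self._stages[puzzle_id] = NEXT_STAGE[current]
        return self._stages[puzzle_id]

    def stage(self, puzzle_id: str) -> Stage:
        return self._stages[puzzle_id]

    def __len__(self) -> int:
        # There is deliberately no delete method, so this never shrinks.
        return len(self._stages)
```

Because nothing is removed, the set of puzzles in `Stage.REGRESSION` grows over time, which is what would let the catalog act as a timeline.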

What makes it different

  • Capability, not "cheating." AI Crucible distinguishes elegance and novelty (rewarded) from answer-bypass (penalized). Lateral thinking is a capability to measure, not a vice to punish.
  • The instrument measures itself. Prompt framing is a first-class measured arm — the kernel runs the same puzzle under neutral / self_referential / social_standings framings and reports its own prompt-effect as a diagnostic.
  • A sealed measurement boundary. Motivation and measurement never share a context window; the hidden oracle is graded out-of-band by a different model family with the agent's reasoning hidden. The model can't game what it can't perceive.
  • Reliability by consistency. pass^k (all k independent trials succeed), Wilson intervals, and cross-family judge panels — built to report distributions, not point estimates.
  • A living catalog. Solved puzzles are demoted to Regression, never deleted, so the catalog becomes a capability-evolution timeline as the frontier moves.
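To make the reliability bullet concrete, the sketch below computes a pass^k estimate and a Wilson score interval from raw trial counts. The function names are hypothetical, and the combinatorial estimator C(c, k) / C(n, k) is one common way to estimate pass^k from n trials with c successes. Whether AI Crucible uses this exact estimator is an assumption.

```python
from math import comb, sqrt


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the probability that k independent trials all succeed.

    Uses C(successes, k) / C(trials, k): the fraction of size-k subsets
    of the observed trials that contain only successes.
    """
    if not 0 < k <= trials:
        raise ValueError("k must satisfy 0 < k <= trials")
    if not 0 <= successes <= trials:
        raise ValueError("successes must satisfy 0 <= successes <= trials")
    return comb(successes, k) / comb(trials, k)


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For 8 successes in 10 trials, the single-trial pass rate is 0.8, but `pass_hat_k(8, 10, 2)` is 28/45 (about 0.62) and the Wilson interval is roughly (0.49, 0.94). That gap is why a distribution says more than a point estimate.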

Threat model (summary)

AI Crucible's primary adversary is the model under measurement, which may try to reach the answer key or to detect and game the eval. It touches: puzzle prompts/setup, the Solver's own confined working directory, and the model-provider API calls the operator configures. It does not touch: the oracle/answer key (graded out-of-band on a separate host the Solver cannot reach) or motivation "chrome" (rank/standings — never injected into the scored context). Permissions: model-provider keys via environment variables at runtime; no bundled secrets, no telemetry, no outbound calls of its own. Full disclosure — including where a boundary is defense-in-depth rather than a hard guarantee — is in SECURITY.md.

Architecture

AI Crucible is a thin policy layer on Inspect AI (UK AISI), not a from-scratch harness. A single AttemptState object is threaded Designer → Solver → (Critic) → Judge through one generate choke point, so every model and tool call is observable.
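The single choke point idea can be sketched in plain Python. This is not the Inspect AI API and not the package's real `AttemptState`. The fields, the `generate` signature, and the `ModelFn` type are assumptions chosen only to show why routing every call through one function makes each call observable. The sketch does not model which messages each role is allowed to see.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: takes the message list, returns reply text.
ModelFn = Callable[[list[dict]], str]


@dataclass
class AttemptState:
    """Hypothetical per-attempt state threaded through every role."""
    puzzle_id: str
    messages: list[dict] = field(default_factory=list)
    call_log: list[dict] = field(default_factory=list)


def generate(state: AttemptState, role: str, model: ModelFn) -> AttemptState:
    """The only place a model is invoked, so every call gets logged."""
    reply = model(state.messages)
    state.call_log.append({"role": role, "input_messages": len(state.messages)})
    state.messages.append({"role": role, "content": reply})
    return state


def run_attempt(puzzle_id: str, prompt: str, solver: ModelFn, judge: ModelFn) -> AttemptState:
    # The Designer's output is represented here by the puzzle prompt itself.
    state = AttemptState(puzzle_id, messages=[{"role": "designer", "content": prompt}])
    state = generate(state, "solver", solver)
    state = generate(state, "judge", judge)
    return state
```

With stub models such as `lambda msgs: "42"`, `run_attempt` returns a state whose `call_log` lists the solver call followed by the judge call, with no way to reach a model without leaving a record.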

Module           Responsibility
puzzle_loader    Loads a puzzle directory (meta.json / prompt / setup_script) into Solver-visible state. Never touches the oracle.
sandbox          Narrow exec / read_file / write_file channel into a locked, network-less container.
roles            The five role slots (Designer / Solver / Critic / Judge / CohortSolver). Only Solver gets tools; Critic is interface-reserved, default-off.
budget_governor  Per-class tool-call + wall-clock budgets, displayed to the agent, enforced kernel-side; hard-kill on pathological loops.
oracle_scorer    Out-of-band grading: solved-and-no-regression against the hidden oracle (SWE-bench pattern).
judge_panel      Cross-family panel of model-scorers + reducer (PoLL) for novelty validation and bypass detection.
trace_writer     Per-attempt transcript in the Inspect EvalLog shape; large blobs stored by digest.
observability    Per-attempt → per-puzzle → per-model rollups; pass^k native.
attestation      Cryptographic provenance (cosign + event-store) behind a typed subprocess boundary.
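As one example of what a module in this table might look like, here is a minimal sketch of kernel-side budget enforcement in the spirit of budget_governor. The class, its fields, and the `BudgetExceeded` exception are hypothetical and do not reflect the package's actual interface. The injectable `clock` parameter is an assumption added so the wall-clock check can be tested.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a tool call would exceed its budget."""


@dataclass
class BudgetGovernor:
    max_calls: dict[str, int]          # per tool class, e.g. {"exec": 20}
    max_seconds: float                 # wall-clock budget for the attempt
    clock: Callable[[], float] = time.monotonic
    used: dict[str, int] = field(default_factory=dict)
    started: float = field(init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def charge(self, tool_class: str) -> None:
        """Call before running a tool; raises instead of letting it run."""
        if self.clock() - self.started > self.max_seconds:
            raise BudgetExceeded("wall-clock budget exhausted")
        limit = self.max_calls.get(tool_class, 0)  # unknown classes get no budget
        count = self.used.get(tool_class, 0)
        if count >= limit:
            raise BudgetExceeded(f"{tool_class} budget of {limit} calls exhausted")
        self.used[tool_class] = count + 1

    def remaining(self) -> dict[str, int]:
        """What could be displayed to the agent."""
        return {name: limit - self.used.get(name, 0) for name, limit in self.max_calls.items()}
```

The check lives in `charge`, on the kernel side of the tool boundary, so an agent that ignores the displayed `remaining()` numbers is still stopped.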

The sealed boundary runs in three tiers:

  • Tier 1, scored context: deployment-shaped and framing-neutral.
  • Tier 2, engagement framing: probed for contamination each release.
  • Tier 3, chrome: rank/leaderboard, human-facing UI only, never in a context the model solves in.

The full design rationale, with citations, is in docs/research-grounding.md.
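The Tier 3 rule (chrome never enters a context the model solves in) could be enforced with a guard like the one below. This is a sketch only: `CHROME_KEYS`, `build_solver_context`, and the dictionary shape are hypothetical, and the real kernel may enforce the boundary quite differently.

```python
# Hypothetical names for human-facing chrome fields.
CHROME_KEYS = frozenset({"rank", "leaderboard", "standings"})


def build_solver_context(prompt: str, metadata: dict[str, object]) -> dict[str, object]:
    """Assemble the context a Solver sees, refusing any chrome fields."""
    leaked = CHROME_KEYS.intersection(metadata)
    if leaked:
        raise ValueError(f"chrome fields must not reach the Solver: {sorted(leaked)}")
    # Copy the metadata so later mutation by the caller cannot add chrome.
    return {"prompt": prompt, "metadata": dict(metadata)}
```

Failing loudly, rather than silently dropping the offending keys, makes an accidental leak show up as an error instead of a contaminated measurement.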

Install

# As a Python library + CLI (PyPI):
pip install ai-crucible          # or: uv pip install ai-crucible
ai-crucible --help

# Or zero-prerequisite via npx — downloads a verified binary, no Python needed:
npx @dogfood-lab/ai-crucible --help

Research preview (v0.2.x). The judge panel's alt-test ω is still a circular model-jury bootstrap until a human-labeling round runs, so seated judges are provisional and the composed panel escalates to a Claude Designer below quorum. See the scorecard for the honest, non-cosmetic gate results.

Quick start (from source)

AI Crucible uses uv for environment and dependency management and requires Python 3.11 or later.

# Create the venv and install the dev + stats extras
uv sync --extra dev --extra stats

# Run the test suite (with the coverage gate)
uv run pytest --cov=ai_crucible --cov-report=term-missing

# Lint
uv run ruff check .

# One command: lint + tests + build + smoke
bash verify.sh

License

MIT. Public and pre-1.0 — see the CHANGELOG for version status.


Built by MCP Tool Shop · part of the dogfood-lab workshop for testing in the AI era.
