Structured automated feedback for code-generating agents — so they can work longer and more reliably without human intervention.
Project description
Crucis
Structured automated feedback for code-generating agents — so they can work longer and more reliably without human intervention.
Crucis is an autonomy scaffold. It replaces the human checkpoints that slow down AI-assisted coding — "does this work?", "did you handle edge cases?", "are you cheating?", "is the code clean?" — with automated, structured interventions that run in real time.
Each intervention maps to a specific human oversight role:
| Human checkpoint | Crucis intervention |
|---|---|
| "Does the code actually work?" | Test-driven generation — agent iterates against a runnable test suite |
| "Does it generalize?" | Holdout evals — hidden test cases verify beyond training examples |
| "Are the tests too easy to cheat?" | Adversarial review + cheating probe |
| "Is the code well-written?" | AST-based constraint checking (34 static analysis rules) |
| "Did you handle the edge cases I care about?" | Behaviors — natural-language specs injected into prompts |
The core idea: any test suite, even an imperfect one, gives an implementation agent a tighter feedback loop than no tests at all. Crucis automates the entire test-driven loop so the model can self-correct against something objective.
Quick start
uv pip install crucis
crucis init --name factorial --no-agent
# Edit objective.yaml: add examples, set description and signature
crucis run
That's it. Crucis generates tests, hardens them adversarially, writes an implementation, and verifies it against hidden holdout evals — all without human intervention.
How it works
objective.yaml ──► Generate tests ──► Adversarial review ──► Cheating probe ──► Implementation ──► Holdout verification
│ │ │ │ │
"write pytest" "find weaknesses" "try to cheat" "pass all tests" "pass hidden evals"
- Fit phase: An agent generates a pytest suite from your examples and constraints. A second agent attacks it, finding gaps. A cheating probe tries to exploit them. The cycle repeats until the tests are robust.
- Evaluate phase: An implementation agent writes code to pass the hardened tests. Hidden holdout evals verify it generalizes.
What you write
A single objective.yaml:
name: factorial
description: Return n! for non-negative n. Raise ValueError for negative input.
signature: factorial(n: int) -> int
examples:
- input: "(0,)"
output: "1"
- input: "(5,)"
output: "120"
- input: "(10,)"
output: "3628800"
behaviors:
- "Raises ValueError for negative input"
target_files:
- src/solution.py
Holdout evals are auto-split from your examples — no manual train/holdout separation needed. Constraint profiles are optional and loaded from built-in defaults if you don't provide them.
Install
uv pip install crucis # recommended
pip install crucis # also works
Requires Python 3.10+ (3.12+ recommended) and at least one agent CLI (claude or codex) on your PATH.
Documentation
Full docs — quickstarts, reference, configuration, troubleshooting.
- Start Here — prerequisites and orientation
- New Project Quickstart — build a verified function from scratch
- Existing Codebase Quickstart — add verification to your current project
- CLI Reference — all commands and options
- MCP Server — use Crucis as MCP tools from any AI agent
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file crucis-0.1.0.tar.gz.
File metadata
- Download URL: crucis-0.1.0.tar.gz
- Upload date:
- Size: 392.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ad3839eb439341bfdc6a9ed9f578d8f4bb29a02e6a5eeca9369a188d0b102371
|
|
| MD5 |
fe73cf245ce526ab1672880aa78b0595
|
|
| BLAKE2b-256 |
92b6df49c196480931c76a8f9a62bfcad47545e277b58810322a637f9c80bc74
|
Provenance
The following attestation bundles were made for crucis-0.1.0.tar.gz:
Publisher:
publish.yml on gilad12-coder/crucis
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
crucis-0.1.0.tar.gz -
Subject digest:
ad3839eb439341bfdc6a9ed9f578d8f4bb29a02e6a5eeca9369a188d0b102371 - Sigstore transparency entry: 991695598
- Sigstore integration time:
-
Permalink:
gilad12-coder/crucis@7f8ed6ca04f4fd3a5f48c4c3d2883fea2989260e -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/gilad12-coder
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@7f8ed6ca04f4fd3a5f48c4c3d2883fea2989260e -
Trigger Event:
push
-
Statement type:
File details
Details for the file crucis-0.1.0-py3-none-any.whl.
File metadata
- Download URL: crucis-0.1.0-py3-none-any.whl
- Upload date:
- Size: 196.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e10666f765952e961548bbf6cfd336073b0a5147754c8ca2ef39228221f1b533
|
|
| MD5 |
dd607548d63ded30ab152a362925b5f3
|
|
| BLAKE2b-256 |
fac52119705d1099d76729c2a9427f53cb189c723b6c33b359927964aa4697e2
|
Provenance
The following attestation bundles were made for crucis-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on gilad12-coder/crucis
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
crucis-0.1.0-py3-none-any.whl -
Subject digest:
e10666f765952e961548bbf6cfd336073b0a5147754c8ca2ef39228221f1b533 - Sigstore transparency entry: 991695604
- Sigstore integration time:
-
Permalink:
gilad12-coder/crucis@7f8ed6ca04f4fd3a5f48c4c3d2883fea2989260e -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/gilad12-coder
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@7f8ed6ca04f4fd3a5f48c4c3d2883fea2989260e -
Trigger Event:
push
-
Statement type: