Skip to main content

Structured automated feedback for code-generating agents — so they can work longer and more reliably without human intervention.

Project description

Crucis

Structured automated feedback for code-generating agents — so they can work longer and more reliably without human intervention.

Crucis is an autonomy scaffold. It replaces the human checkpoints that slow down AI-assisted coding — "does this work?", "did you handle edge cases?", "are you cheating?", "is the code clean?" — with automated, structured interventions that run in real time.

Each intervention maps to a specific human oversight role:

Human checkpoint Crucis intervention
"Does the code actually work?" Test-driven generation — agent iterates against a runnable test suite
"Does it generalize?" Holdout evals — hidden test cases verify beyond training examples
"Are the tests too easy to cheat?" Adversarial review + cheating probe
"Is the code well-written?" AST-based constraint checking (34 static analysis rules)
"Did you handle the edge cases I care about?" Behaviors — natural-language specs injected into prompts

The core idea: any test suite, even an imperfect one, gives an implementation agent a tighter feedback loop than no tests at all. Crucis automates the entire test-driven loop so the model can self-correct against something objective.

Quick start

uv pip install crucis
crucis init --name factorial --no-agent
# Edit objective.yaml: add examples, set description and signature
crucis run

That's it. Crucis generates tests, hardens them adversarially, writes an implementation, and verifies it against hidden holdout evals — all without human intervention.

How it works

objective.yaml ──► Generate tests ──► Adversarial review ──► Cheating probe ──► Implementation ──► Holdout verification
                        │                    │                     │                  │                     │
                   "write pytest"     "find weaknesses"     "try to cheat"    "pass all tests"     "pass hidden evals"
  1. Fit phase: An agent generates a pytest suite from your examples and constraints. A second agent attacks it, finding gaps. A cheating probe tries to exploit them. The cycle repeats until the tests are robust.
  2. Evaluate phase: An implementation agent writes code to pass the hardened tests. Hidden holdout evals verify it generalizes.

What you write

A single objective.yaml:

name: factorial
description: Return n! for non-negative n. Raise ValueError for negative input.
signature: factorial(n: int) -> int
examples:
  - input: "(0,)"
    output: "1"
  - input: "(5,)"
    output: "120"
  - input: "(10,)"
    output: "3628800"
behaviors:
  - "Raises ValueError for negative input"
target_files:
  - src/solution.py

Holdout evals are auto-split from your examples — no manual train/holdout separation needed. Constraint profiles are optional and loaded from built-in defaults if you don't provide them.

Install

uv pip install crucis        # recommended
pip install crucis            # also works

Requires Python 3.10+ (3.12+ recommended) and at least one agent CLI (claude or codex) on your PATH.

Documentation

Full docs — quickstarts, reference, configuration, troubleshooting.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crucis-0.1.0.tar.gz (392.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crucis-0.1.0-py3-none-any.whl (196.0 kB view details)

Uploaded Python 3

File details

Details for the file crucis-0.1.0.tar.gz.

File metadata

  • Download URL: crucis-0.1.0.tar.gz
  • Upload date:
  • Size: 392.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crucis-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ad3839eb439341bfdc6a9ed9f578d8f4bb29a02e6a5eeca9369a188d0b102371
MD5 fe73cf245ce526ab1672880aa78b0595
BLAKE2b-256 92b6df49c196480931c76a8f9a62bfcad47545e277b58810322a637f9c80bc74

See more details on using hashes here.

Provenance

The following attestation bundles were made for crucis-0.1.0.tar.gz:

Publisher: publish.yml on gilad12-coder/crucis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file crucis-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: crucis-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 196.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crucis-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e10666f765952e961548bbf6cfd336073b0a5147754c8ca2ef39228221f1b533
MD5 dd607548d63ded30ab152a362925b5f3
BLAKE2b-256 fac52119705d1099d76729c2a9427f53cb189c723b6c33b359927964aa4697e2

See more details on using hashes here.

Provenance

The following attestation bundles were made for crucis-0.1.0-py3-none-any.whl:

Publisher: publish.yml on gilad12-coder/crucis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page