Spec-driven development framework for AI coding agents
scafld
scafld builds long-running AI coding work under adversarial review, so your agent stays coherent across the whole job.
Canonical repo: https://github.com/nilstate/scafld. Default branch: main.
Most AI coding tools optimize for speed inside one turn. scafld optimizes for correctness across the whole run:
- the work starts from a reviewed spec
- execution stays phase-bounded and measurable
- completion is gated by an independent challenger at review
The differentiator is simple: the agent does not get to grade its own homework.
Identity
scafld is a scaffold in the literal sense: temporary structure that shapes the build while the work is in progress.
The lifecycle stays familiar:
draft -> harden -> approve -> build -> review -> complete
What changed is the core model underneath it.
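The lifecycle above reads naturally as a linear sequence of stages. A minimal Python sketch of that ordering, assuming each stage has exactly one successor; the `next_stage` helper is hypothetical and is not part of the scafld CLI:

```python
from typing import Optional

# Illustrative only: the documented lifecycle as an ordered list of stages.
LIFECYCLE = ["draft", "harden", "approve", "build", "review", "complete"]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None after the last stage."""
    index = LIFECYCLE.index(current)  # raises ValueError for an unknown stage
    if index + 1 < len(LIFECYCLE):
        return LIFECYCLE[index + 1]
    return None
```
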
Primitives
Two nouns carry the system:
- spec: what must be true. The reviewed contract and lifecycle source of truth.
- session: what happened. The durable run ledger under .ai/runs/{task-id}/session.json.
handoff is transport, not a primitive. It is the structured brief for the next voice.
Every generated handoff is:
- immutable
- sibling *.md + *.json
- tagged by role × gate
Current runtime pairings:
- executor × phase
- executor × recovery
- challenger × review
Review Gate
Challenge fires at one gate only in v1: review.
That keeps the system sharp without turning every phase into ceremony.
- review emits a challenger handoff
- the challenger writes a verdict into .ai/reviews/{task-id}.md
- complete closes only if the review gate passes, or a human applies the audited override path after a completed challenger round
The override path is explicit:
scafld complete <task-id> --human-reviewed --reason "manual audit"
That override is available only after a completed challenger review round exists.
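The completion rule described above can be sketched as a small predicate. This is an illustration under stated assumptions, not scafld's implementation: `ReviewState` and `can_complete` are hypothetical names, and the requirement that the override carries a non-empty reason is an assumption drawn from the `--reason` flag shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewState:
    """Hypothetical summary of the review gate for one task."""
    challenger_round_completed: bool  # a challenger review round has finished
    review_passed: bool               # the challenger verdict passed the gate


def can_complete(state: ReviewState, human_reviewed: bool = False,
                 reason: Optional[str] = None) -> bool:
    """Allow completion if the gate passed, or via the audited override."""
    if not state.challenger_round_completed:
        # Without a completed challenger round, neither path is available.
        return False
    if state.review_passed:
        return True
    # Override path: assumed to need both the flag and a non-empty reason.
    return human_reviewed and bool(reason and reason.strip())
```

The ordering of the checks is the point: the override is evaluated only after a completed challenger round exists, so it can never replace the review itself.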
Runtime Layout
.ai/
  specs/{drafts,approved,active,archive}/
  runs/
    {task-id}/
      handoffs/
        executor-phase-phase1.md
        executor-phase-phase1.json
        executor-recovery-ac1_1-1.md
        executor-recovery-ac1_1-1.json
        challenger-review.md
        challenger-review.json
      diagnostics/
        ac1_1-attempt1.txt
      session.json
    archive/{YYYY-MM}/{task-id}/
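The handoff file names in the layout above appear to follow a `{role}-{gate}[-{selector}]` pattern with sibling `.md` and `.json` files. A minimal sketch of that naming, inferred only from the example file names; `handoff_paths` and its `selector` parameter are hypothetical:

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def handoff_paths(task_id: str, role: str, gate: str,
                  selector: Optional[str] = None
                  ) -> Tuple[PurePosixPath, PurePosixPath]:
    """Build the sibling .md/.json paths for one handoff (inferred pattern)."""
    # The selector (e.g. "phase1" or "ac1_1-1") is omitted for the review gate.
    stem = f"{role}-{gate}" if selector is None else f"{role}-{gate}-{selector}"
    base = PurePosixPath(".ai") / "runs" / task_id / "handoffs"
    return base / f"{stem}.md", base / f"{stem}.json"
```

For example, `handoff_paths("my-task", "challenger", "review")` yields the `challenger-review.md` and `challenger-review.json` pair under `.ai/runs/my-task/handoffs/`.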
Hard rules:
- spec never carries runtime state
- handoff is never read back to compute state
- session is the only durable run-state source
- recovery is a handoff gate plus counters in session, not a subsystem
- telemetry is a view of session, not a separate artifact
- v1 makes zero spec schema changes
Agent Surface
Default help teaches the slim workflow surface, including repo seeding:
scafld init
scafld plan <task-id>
scafld approve <task-id>
scafld build <task-id>
scafld review <task-id>
scafld complete <task-id>
scafld status <task-id>
scafld list
scafld report
scafld handoff <task-id>
scafld update
Use scafld --help --advanced to show the remaining operator tools such as
harden, validate, branch, sync, audit, diff, summary, checks,
and pr-body.
Wrapper intent:
- plan: create a draft spec or reopen harden on an existing draft
- build: start approved work and immediately drive validation to the next handoff or block
- review: run the adversarial review gate and emit the challenger handoff
- status: expose the canonical next_action and current_handoff
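For scripting, the slim workflow can be laid out as one argument list per step. This is a sketch only: `workflow_commands` is a hypothetical helper, it uses just the subcommands listed above, and it assumes each step takes the task id as its single argument. In practice the agent does its work between steps, so this shows the order rather than a hands-off pipeline.

```python
from typing import List

# Order follows the documented lifecycle; hardening is reached through `plan`.
WORKFLOW = ["plan", "approve", "build", "review", "complete"]


def workflow_commands(task_id: str) -> List[List[str]]:
    """Return one argv list per workflow step, in lifecycle order."""
    return [["scafld", step, task_id] for step in WORKFLOW]
```

Each list can be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.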
When the workspace includes them, the wrapper scripts make handoff consumption the default path for Codex and Claude Code instead of a manual convention.
scripts/scafld-codex-build.sh <task-id>
scripts/scafld-codex-review.sh <task-id>
scripts/scafld-claude-build.sh <task-id>
scripts/scafld-claude-review.sh <task-id>
Success Metrics
scafld claims quality lift only where it can measure it.
The canonical metrics are:
- first_attempt_pass_rate
- recovery_convergence_rate
- challenge_override_rate
report surfaces all three per task and in aggregate.
Use scafld report --runtime-only to focus on tasks with runtime session data.
The report also surfaces review-signal counts such as completed challenger rounds, grounded findings, and clean reviews that still record what was attacked.
There is also one honest boundary: scafld can emit a better handoff, but an external harness may still ignore it. That is why the metrics are framed as session outcomes, not proof of prompt consumption.
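The three metrics are not defined formally here, so the following is one plausible reading rather than scafld's own computation. It assumes first-attempt pass rate is counted per acceptance criterion, recovery convergence is the share of recoveries that end in a pass, and override rate is the share of completed tasks closed through the human override. `TaskOutcome` and all of its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskOutcome:
    """Hypothetical per-task summary derived from a session ledger."""
    criteria_total: int        # acceptance criteria checked
    criteria_first_pass: int   # criteria that passed on the first attempt
    recoveries_started: int    # recovery handoffs issued
    recoveries_converged: int  # recoveries that ended in a pass
    completed: bool            # task reached complete
    overridden: bool           # completed through the human override


def _rate(numerator: int, denominator: int) -> float:
    """Return a ratio, or 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def aggregate(outcomes: List[TaskOutcome]) -> Dict[str, float]:
    """Aggregate the three metrics across tasks (assumed definitions)."""
    completed = [o for o in outcomes if o.completed]
    return {
        "first_attempt_pass_rate": _rate(
            sum(o.criteria_first_pass for o in outcomes),
            sum(o.criteria_total for o in outcomes),
        ),
        "recovery_convergence_rate": _rate(
            sum(o.recoveries_converged for o in outcomes),
            sum(o.recoveries_started for o in outcomes),
        ),
        "challenge_override_rate": _rate(
            sum(1 for o in completed if o.overridden),
            len(completed),
        ),
    }
```

All three values are computed from recorded outcomes only, which matches the framing above: they describe what happened in the session, not whether a harness read the handoff.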
Install
pip install scafld
npm install -g scafld
git clone https://github.com/nilstate/scafld.git ~/.scafld && ~/.scafld/install.sh
curl -fsSL https://raw.githubusercontent.com/nilstate/scafld/main/install.sh | sh
pip install scafld installs the console entry point plus the managed runtime bundle used by scafld init and scafld update.
npm install -g scafld installs the same CLI package for environments that ship tooling through npm. The CLI still requires python3 at runtime. Commands that edit YAML specs, such as scafld harden, also require PyYAML in that Python runtime:
python3 -m pip install PyYAML
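A quick preflight check for these requirements might look like the sketch below. The `preflight` function is hypothetical. It inspects the interpreter that runs it, which may differ from the `python3` that scafld resolves on your PATH, so treat the PyYAML result as a hint.

```python
import importlib.util
import shutil
from typing import Dict


def preflight() -> Dict[str, bool]:
    """Report whether the documented runtime requirements look present."""
    return {
        "python3": shutil.which("python3") is not None,
        "scafld": shutil.which("scafld") is not None,
        # PyYAML installs the importable module named `yaml`.
        "PyYAML": importlib.util.find_spec("yaml") is not None,
    }
```
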
Setup
cd your-project
scafld init
This creates the managed runtime bundle, prompts, schemas, run directories, and project-owned overlays:
your-project/
  .ai/
    scafld/             # Managed reset copy refreshed by `scafld update`
    config.yaml         # Project config overlay
    config.local.yaml   # Local machine overrides
    prompts/            # Active project-owned template sources
    runs/               # Generated handoffs, diagnostics, session state
    reviews/            # Adversarial review artifacts
    specs/              # Specs by lifecycle state
  AGENTS.md
  CLAUDE.md
  CONVENTIONS.md
Start by customizing:
- AGENTS.md
- CLAUDE.md
- CONVENTIONS.md
- .ai/config.local.yaml
When the workspace includes them, the handoff-first wrappers are:
scripts/scafld-codex-build.sh <task-id>
scripts/scafld-codex-review.sh <task-id>
scripts/scafld-claude-build.sh <task-id>
scripts/scafld-claude-review.sh <task-id>
Repo-aware planning also works for mixed Python+Node repos. When both stacks are present, scafld merges the signals and prefers concrete detected commands over placeholder defaults.
Prompt ownership is deliberate:
- .ai/prompts/* is the active template layer the runtime reads first
- .ai/scafld/prompts/* is the managed reset copy refreshed by scafld update
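The two prompt layers suggest a simple lookup order. The sketch below assumes the runtime falls back to the managed copy when the project-owned file is missing; "reads first" implies that fallback, but the source does not state it outright. `resolve_prompt` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def resolve_prompt(project_root: Path, name: str) -> Optional[Path]:
    """Return the first existing prompt file, project-owned layer first."""
    candidates = [
        project_root / ".ai" / "prompts" / name,             # active layer
        project_root / ".ai" / "scafld" / "prompts" / name,  # managed copy
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Under this reading, edits in `.ai/prompts/` always win, and deleting a project-owned file restores the managed version.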
Minimal Runtime Config
llm:
  model_profile: "default"
  context:
    budget_tokens: 12000
  recovery:
    max_attempts: 1
Anything beyond that waits until it earns its place through measured wins.
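Because the project keeps both `config.yaml` and `config.local.yaml`, the effective config is presumably a layered merge. The sketch below shows one way such an overlay could work on already-parsed mappings. It assumes local values win and nested mappings merge key by key; `merge_overlay` is hypothetical and scafld's actual precedence rules may differ.

```python
from typing import Any, Dict


def merge_overlay(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where overlay values win, merging nested mappings."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of replacing.
            merged[key] = merge_overlay(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For instance, merging `{"llm": {"model_profile": "default"}}` with a local overlay of `{"llm": {"model_profile": "fast"}}` (a made-up profile name) changes only that one key and leaves the base mapping untouched.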
Next Docs
- docs/execution.md
- docs/integrations.md
- docs/review.md
- docs/run-artifacts.md
- docs/cli-reference.md
- AGENTS.md
File details
Details for the file scafld-1.6.0.tar.gz.
File metadata
- Download URL: scafld-1.6.0.tar.gz
- Upload date:
- Size: 116.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bfabee370bffb8f1895953959fb59a5d4eb0cb11bd858255a6b68f04c8f849e3 |
| MD5 | a9bf25d1ceee0c622b918b4a0b184b00 |
| BLAKE2b-256 | b9b558d2b1ad9944d831ffdf921e19ecee8c35136c2793104a7a9b2dd67e368b |
File details
Details for the file scafld-1.6.0-py3-none-any.whl.
File metadata
- Download URL: scafld-1.6.0-py3-none-any.whl
- Upload date:
- Size: 140.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd48d0d431284b2ac5d5fe402b1ec848df264374e238b8988b17daaaf116be1a |
| MD5 | a795196de36593981067412c4723c0f6 |
| BLAKE2b-256 | ac3ebd6d1ae292c43b36e6ce1ab59d27ebe6729207a90f5285cbf9632ff5e326 |