Spec-driven development framework for AI coding agents

scafld

An opinionated orchestration layer for AI coding agents.

Canonical repo: https://github.com/nilstate/scafld. Default branch: main.

Most AI coding tools let agents jump straight into your codebase and start writing. No plan. No review. No audit trail. Just vibes and a prayer.

The result is predictable: code that looks right, passes the tests, and slowly rots from the inside. Duplicated blocks. Architectural drift. Changes nobody asked for buried in changes somebody did. The agent ships fast and you spend the next week figuring out what it actually did.

scafld enforces a simple constraint: think before you type.

Every non-trivial task becomes a YAML specification before a single line of code changes. The spec defines what will change, in what order, with what acceptance criteria, and how to roll it back if it breaks. A human reviews and approves the spec. Only then does the agent execute - phase by phase, validated at every checkpoint, auditable after the fact.

This isn't a wrapper around a prompt. It's a development methodology - the same separation of planning from execution that every serious engineering discipline has always required, applied to the one context where people have decided to skip it entirely.

The goal is for scafld to feel like the engineering system itself, not an extra system bolted on beside Git, pull requests, issues, and CI. That is why the workflow object stays in the spec while commands such as summary, checks, and pr-body project the same truth onto normal engineering surfaces.

If an outer system keeps its own receipts, journals, or pushed outputs, that state should stay outside scafld. scafld owns the repo-local workflow object and emits projections; wrappers can publish or record those projections without mirroring wrapper-managed state back into the spec.

User Request
    |
    v
 PLAN MODE          AI explores codebase, generates spec
 (read-only)        .ai/specs/drafts/{task-id}.yaml
    |
    v
 HARDEN             Grounded interrogation of the draft
 (optional)         scafld harden <task-id>
    |
    v
 Human Review       Developer reviews and approves
    |
    v
 EXEC MODE          AI executes spec phase-by-phase
 (autonomous)       with validation at every checkpoint
    |
    v
 REVIEW             Adversarial self-review finds what
                    execution missed (ideally fresh session)
    |
    v
 Archive            Completed spec + audit trail

Why This Exists

We built scafld because every AI coding workflow we used was broken in the same way. The agent would receive a task, immediately start modifying files, and produce something that was technically functional but architecturally thoughtless. Ask it to add a feature and it might refactor three other things along the way. Ask it to fix a bug and it might introduce a dependency you didn't want. There was no contract between what was requested and what was delivered, and no way to verify the difference after the fact.

The spec is the contract. It forces the planning to happen explicitly, in a format that a human can review and a machine can validate. It creates an audit trail that answers "what changed, why, and did it match what was agreed." It makes AI-assisted development reproducible instead of hopeful.

Install

pip install scafld
npm install -g scafld
git clone https://github.com/nilstate/scafld.git ~/.scafld && ~/.scafld/install.sh
curl -fsSL https://raw.githubusercontent.com/nilstate/scafld/main/install.sh | sh

pip install scafld installs the console entry point plus the runtime bundle used by scafld init and scafld update.

npm install -g scafld installs the same CLI package for environments that distribute tooling through npm. The CLI still requires python3 at runtime because the executable itself is Python. Commands that edit YAML specs, such as scafld harden, also need PyYAML available in that Python runtime:

python3 -m pip install PyYAML

The git install clones scafld to ~/.scafld and symlinks the scafld command to ~/.local/bin/.

To update the installed checkout: scafld update --self

To refresh the managed bundle in the current workspace: scafld update

To refresh every scafld workspace under a development tree: scafld update --scan-root ~/dev

Release

The canonical release version lives in scafld/_version.py. Everything else derives from that.

python3 scripts/bump_version.py X.Y.Z
python3 scripts/sync_version.py --check
git commit -am "Release vX.Y.Z"
git tag vX.Y.Z
git push origin main vX.Y.Z

Pushing the tag triggers the release workflow, which validates the tree, builds both packages, publishes to PyPI and npm, and creates the GitHub release.

Setup

cd your-project
scafld init

This scaffolds the full structure into your project:

your-project/
  .ai/
    scafld/                # Managed runtime bundle refreshed by `scafld update`
    config.yaml            # Validation rules, rubric, safety controls
    config.local.yaml      # Your overrides (build/test/lint commands)
    prompts/               # Plan + exec mode instructions
    schemas/               # Spec validation schema
    specs/
      drafts/              # Planning in progress
      approved/            # Ready for execution
      active/              # Currently executing
      archive/             # Completed work
    logs/                  # Execution logs (gitignored)
  AGENTS.md                # Your project's invariants and policies
  CLAUDE.md                # Project overview, essential commands
  CONVENTIONS.md           # Tech stack, patterns, coding standards

Make It Yours

  1. AGENTS.md - Your architectural invariants, domain rules, forbidden actions
  2. CONVENTIONS.md - Your tech stack, naming conventions, testing patterns
  3. CLAUDE.md - Project overview, essential commands, agent-specific tips
  4. .ai/config.local.yaml - Your build/test/lint commands (merged on top of config.yaml). scafld init pre-populates this file from common Node and Python repo markers when it can, and falls back to safe placeholders when it cannot.
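
The layering in item 4 can be pictured as a recursive dictionary merge. This is a minimal sketch, assuming nested mappings merge key by key and any other value in the local file replaces the base value. The merge rules scafld actually applies are not stated here, and `deep_merge` and the sample keys are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical config fragments; real key names may differ.
base_config = {
    "validation": {"test": "echo 'no test command'", "lint": "echo 'no lint command'"}
}
local_config = {"validation": {"test": "pytest -q"}}

effective = deep_merge(base_config, local_config)
```

Here `effective` keeps the base lint command and takes the test command from the local file, while `base_config` itself is left unmodified.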

Project Structure

scafld is opinionated about how your project should be organised, because the structure is what gives the AI agent visibility over your entire codebase.

Single Repo

For a single codebase, just run scafld init at the root. The agent sees everything.

Multi-Repo Workspace

For projects with multiple codebases - an API, a frontend, an SDK, an MCP server - the workspace pattern gives the agent visibility across all of them from a single root.

Create a root repo that acts as the orchestration layer. Add your codebases as git submodules underneath. Run scafld init at the root. Now the agent can see your specs, your conventions, your architectural invariants, AND all your code - in one context.

mkdir my-project && cd my-project
git init
git submodule add git@github.com:org/api.git api
git submodule add git@github.com:org/app.git app
git submodule add git@github.com:org/sdk.git sdk
scafld init

my-project/                # Root workspace repo
  .ai/                     # scafld config and specs
  AGENTS.md                # Cross-project invariants
  CLAUDE.md                # Agent overview of the whole system
  CONVENTIONS.md           # Shared coding standards
  api/                     # Submodule: your API
  app/                     # Submodule: your frontend
  sdk/                     # Submodule: your SDK

The root repo is lightweight - it holds the orchestration layer (scafld files, agent docs) and pointers to the real code. Each submodule is still its own repo with its own history. But the agent sees the whole picture from the root, which means it can plan changes that span multiple codebases and understand how they connect.

This is how we work. It's not the only way, but if you're running AI agents across multiple repos without a unified root, you're asking the agent to plan with half the context.

CLI

scafld new <task-id> [-t title] [-s size] [-r risk]     # Scaffold a new spec
scafld list [filter]                                    # List all specs
scafld status <task-id> [--json]                        # Show spec details
scafld validate <task-id> [--json]                      # Validate against schema
scafld branch <task-id> [--name branch]                 # Bind the task to a working branch
scafld sync <task-id> [--json]                          # Compare the bound branch to live git state
scafld summary <task-id> [--json]                       # Render a concise engineering summary
scafld checks <task-id> [--json]                        # Render CI-friendly check status/details
scafld pr-body <task-id> [--json]                       # Render a deterministic PR body
scafld harden <task-id> [--mark-passed]                 # Optional: interrogate draft with grounded questions
scafld approve <task-id>                                # Validate + move to approved
scafld start <task-id>                                  # Move to active
scafld exec <task-id> [-p phase] [-r]                   # Run acceptance criteria (-r = resume)
scafld audit <task-id> [-b base-ref]                    # Spec vs current working tree (or a base ref with -b)
scafld diff <task-id>                                   # Git history for a spec
scafld review <task-id> [--json]                        # Run configured automated passes + scaffold Review Artifact v3
scafld complete <task-id> [--json]                      # Read review, record verdict, archive (requires passing review)
scafld complete <task-id> --human-reviewed --reason "manual audit"
                                                        # Exceptional audited override when the gate is blocked
scafld fail <task-id>                                   # Archive as failed
scafld cancel <task-id>                                 # Archive as cancelled
scafld report                                           # Aggregate stats
scafld update [--scan-root PATH] [--self]               # Refresh the managed framework bundle
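
The lifecycle commands above move a spec between the directories under .ai/specs/, which can be modeled as a small state machine. This is a sketch, not scafld's implementation. The directory names come from the project layout, but the state each command is allowed to start from is an assumption, and `TRANSITIONS`, `next_state`, and `walk` are hypothetical names.

```python
# Directory names come from the .ai/specs/ layout. The source state allowed
# for each command is an assumption made for this sketch.
TRANSITIONS = {
    ("drafts", "approve"): "approved",
    ("approved", "start"): "active",
    ("active", "complete"): "archive",
    ("active", "fail"): "archive",
    ("active", "cancel"): "archive",
}


def next_state(state: str, command: str) -> str:
    """Return the directory a spec lands in, or raise on an illegal move."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot run '{command}' on a spec in '{state}'") from None


def walk(commands, state: str = "drafts") -> str:
    """Apply a sequence of lifecycle commands starting from `state`."""
    for command in commands:
        state = next_state(state, command)
    return state
```

With this table, `walk(["approve", "start", "complete"])` ends in `archive`, while `next_state("drafts", "start")` raises because a draft has not been approved yet.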

Managed Bundle

Each workspace now carries a framework-managed runtime bundle under .ai/scafld/.

  • .ai/scafld/config.yaml provides the current scafld defaults
  • .ai/config.yaml remains the repo's project-level config/overlay
  • .ai/config.local.yaml remains the local machine override layer
  • .ai/scafld/manifest.json records the scafld version, source commit, and bundle file hashes

scafld update refreshes .ai/scafld/ without overwriting repo-owned docs or project-specific config.
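
Because the manifest records bundle file hashes, a workspace can in principle detect local edits to the managed bundle. The sketch below assumes SHA-256 digests and a `files` mapping of relative path to hex digest inside manifest.json. Both are assumptions for illustration, since the real manifest layout is not documented here, and `find_drift` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(bundle_dir: Path) -> list:
    """Return bundle files that are missing or no longer match the manifest."""
    manifest = json.loads((bundle_dir / "manifest.json").read_text(encoding="utf-8"))
    drifted = []
    # Hypothetical layout: {"files": {"relative/path": "<sha256 hex>"}}
    for rel_path, expected in manifest["files"].items():
        target = bundle_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            drifted.append(rel_path)
    return sorted(drifted)
```

Calling `find_drift(Path(".ai/scafld"))` would return an empty list for an untouched bundle under these assumptions.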

CLI Integrity

The CLI now routes workspace discovery, spec lifecycle moves, and output rendering through shared internal modules instead of per-command copies of the same logic.

  • workspace discovery and scan-root traversal use one runtime surface
  • spec lookup, lifecycle transitions, archive moves, and planning-log appends use one spec-store surface
  • human-facing errors and machine-facing output use one output surface

That split is not internal ceremony. It is what keeps commands like status, approve, start, complete, and update aligned as scafld grows more machine-facing.

Per-Criterion Working Directory

In monorepo/workspace setups, different acceptance criteria may target different submodules. Use the optional cwd field to set the working directory for a command, relative to the workspace root:

acceptance_criteria:
  - id: ac1
    type: test
    cwd: api
    command: "bundle exec rspec spec/services/"
    expected: "0 failures"
  - id: ac2
    type: test
    cwd: app
    command: "yarn test"
    expected: "0 failures"

Commands without cwd run from the workspace root. The path must be relative and must resolve within the workspace; paths that escape the root are rejected.
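
The containment rule above can be sketched with pathlib. This is an illustration of the rule as stated, not scafld's own code, and `resolve_cwd` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_cwd(workspace_root: Path, cwd: Optional[str] = None) -> Path:
    """Resolve a criterion's cwd, rejecting absolute or escaping paths."""
    root = workspace_root.resolve()
    if cwd is None:
        # No cwd given: the command runs from the workspace root.
        return root
    candidate = Path(cwd)
    if candidate.is_absolute():
        raise ValueError(f"cwd must be relative: {cwd}")
    # resolve() collapses ".." segments and follows symlinks before the check.
    resolved = (root / candidate).resolve()
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"cwd escapes the workspace: {cwd}")
    return resolved
```

Under this sketch, `api` resolves to a directory inside the root, while `../outside` and `api/../../x` are rejected.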

You can also set a spec-level default under task.context.cwd so you don't repeat it on every criterion:

task:
  context:
    cwd: api
    packages:
      - app/services

Individual criteria can still override with their own cwd.
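
The precedence between the two levels can be written as a short lookup. This is a sketch of the rule described above, assuming the spec and criterion have already been parsed into dictionaries; `effective_cwd` is a hypothetical helper.

```python
from typing import Optional


def effective_cwd(spec: dict, criterion: dict) -> Optional[str]:
    """Criterion-level cwd wins; otherwise fall back to task.context.cwd."""
    if criterion.get("cwd"):
        return criterion["cwd"]
    # None means "run from the workspace root".
    return spec.get("task", {}).get("context", {}).get("cwd")


spec = {"task": {"context": {"cwd": "api", "packages": ["app/services"]}}}

inherited = effective_cwd(spec, {"id": "ac1", "command": "bundle exec rspec"})
overridden = effective_cwd(spec, {"id": "ac2", "cwd": "app", "command": "yarn test"})
```

Here `inherited` is `api` and `overridden` is `app`.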

Per-Criterion Timeout

Acceptance criteria default to a 600 second timeout. Long-running checks can override that with timeout_seconds:

acceptance_criteria:
  - id: ac3
    type: test
    cwd: api
    command: "bundle exec rspec"
    expected: "0 failures"
    timeout_seconds: 900

Use specific expectations like 0 failures or exit code 0 when possible. Generic phrases like All pass are accepted, but the explicit forms are easier for scafld to verify and for humans to audit.
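
To make the timeout and expectation rules concrete, here is a minimal runner sketch. It assumes commands can be split with shlex and run without a shell, that `exit code 0` checks only the return code, and that other expectations are matched as text in the combined output. How scafld actually evaluates `expected` is not specified here, and `run_criterion` is a hypothetical name.

```python
import re
import shlex
import subprocess

DEFAULT_TIMEOUT_SECONDS = 600  # default stated in the text above


def run_criterion(criterion: dict) -> bool:
    """Run one acceptance criterion and report whether it passed."""
    timeout = criterion.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS)
    try:
        result = subprocess.run(
            shlex.split(criterion["command"]),  # assumption: no shell features needed
            cwd=criterion.get("cwd"),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A criterion that exceeds its timeout counts as a failure.
        return False
    if result.returncode != 0:
        return False
    expected = criterion.get("expected", "exit code 0")
    if expected == "exit code 0":
        return True
    # Assumption: match the expectation as text, but do not let "0 failures"
    # match inside "10 failures".
    output = result.stdout + result.stderr
    return bool(re.search(r"(?<!\d)" + re.escape(expected), output))
```

The digit guard in the final match is one reason explicit expectations are easier to audit: a naive substring test would accept `10 failures` as `0 failures`.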

Usage

Tell your AI agent: "Let's plan [feature]. Create a task spec."

The agent enters read-only planning mode, explores your codebase, and produces a YAML spec with objectives, phases, acceptance criteria, and rollback commands. You review it, approve it, and the agent executes autonomously within those bounds.

What It Actually Does

  • Spec-driven - Every task is a versioned, schema-validated YAML artifact. Not a prompt. Not a conversation. A machine-readable contract.
  • Harden (optional) - Stress-test a draft before approval. Ask one question at a time, inspect code before asking when the repo already holds the answer, record why each question exists via grounded_in, and stop instead of padding the loop.
  • Approval gate - No code changes until a human reviews the plan. The agent thinks; you decide.
  • Phase-by-phase execution - Acceptance criteria at every checkpoint, not just at the end.
  • Scope audit - scafld audit compares what the spec declared against the current workspace change set. Undeclared changes get flagged while scafld's own execution artifacts stay out of the way.
  • Adversarial review - Before archiving, scafld review runs the configured spec_compliance and scope_drift passes, scaffolds Review Artifact v3, and prepares the adversarial regression_hunt, convention_check, and dark_patterns sections. scafld complete requires a structurally valid latest review or an exceptional human-reviewed override with an audited reason.
  • Self-evaluation - Agents score their own work against a configurable rubric. Below 7/10 triggers a second pass.
  • Rollback commands - Per-phase rollback for safe failure recovery. Every phase declares how to undo itself.
  • Resume protocol - Interrupted executions pick up where they left off.
  • Validation profiles - Light, standard, or strict, configured per-task or derived from risk level.
  • Reporting - scafld report aggregates pass rates, self-eval scores, and scope drift across your entire spec history.
  • Internally coherent - Shared runtime, spec-store, and output layers keep lifecycle commands on the same workspace-discovery, transition, and error rules instead of drifting command-by-command.
  • Agent-agnostic - Works with Claude, Cursor, Copilot, Windsurf, or any AI coding agent.
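
The scope audit idea from the list above reduces to a set comparison between declared and changed paths. The sketch below assumes scafld's own artifacts live under .ai/ and are excluded by pattern. The real exclusion rules are not documented here, and `undeclared_changes` and the sample paths are hypothetical.

```python
from fnmatch import fnmatch

# Assumption: scafld's own execution artifacts live under .ai/ and are ignored.
IGNORED_PATTERNS = (".ai/*",)


def undeclared_changes(declared, changed):
    """Return changed paths that the spec did not declare, sorted."""
    declared_set = set(declared)
    return sorted(
        path
        for path in changed
        if path not in declared_set
        and not any(fnmatch(path, pattern) for pattern in IGNORED_PATTERNS)
    )


flagged = undeclared_changes(
    declared=["api/app/services/billing.rb"],
    changed=[
        "api/app/services/billing.rb",
        "api/Gemfile",
        ".ai/specs/active/add-billing.yaml",
    ],
)
```

In this example only `api/Gemfile` is flagged: the declared file passes and the spec file under .ai/ is ignored.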

Review Pipeline

The default review model is a five-pass pipeline declared in .ai/config.yaml:

  • spec_compliance
  • scope_drift
  • regression_hunt
  • convention_check
  • dark_patterns

Pass ordering is explicit through per-pass order fields, so the review pipeline does not depend on YAML mapping order. scafld review scaffolds Review Artifact v3 with per-pass pass_results and round_status: "in_progress". The reviewer fills the configured adversarial sections, updates the metadata to round_status: "completed", and sets final pass results before scafld complete archives the spec.
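
The ordering and completion rules can be sketched as two small functions. The pass names, `pass_results`, and `round_status` come from the text above. The config shape, the order values, and the `"pass"` result value are assumptions for illustration, and `ordered_passes` and `can_complete` are hypothetical names.

```python
def ordered_passes(review_config: dict) -> list:
    """Sort pass names by their explicit `order` field, not by mapping order."""
    return [
        name
        for name, _ in sorted(review_config.items(), key=lambda item: item[1]["order"])
    ]


def can_complete(artifact: dict, pass_names: list) -> bool:
    """Allow completion only for a finished round where every pass succeeded."""
    if artifact.get("round_status") != "completed":
        return False
    results = artifact.get("pass_results", {})
    # Assumption: a successful pass is recorded as the string "pass".
    return all(results.get(name) == "pass" for name in pass_names)


# Deliberately shuffled mapping; the order values are hypothetical.
review_config = {
    "dark_patterns": {"order": 5},
    "scope_drift": {"order": 2},
    "spec_compliance": {"order": 1},
    "convention_check": {"order": 4},
    "regression_hunt": {"order": 3},
}

passes = ordered_passes(review_config)
```

A freshly scaffolded artifact with `round_status: "in_progress"` fails `can_complete` regardless of its pass results, which mirrors the gate described above.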

Trust Boundary

scafld enforces its review gate locally, but local CLI checks are not the whole trust boundary, because they run in the same session as the agent they constrain.

For stronger review governance, add the next layer outside the agent session:

  • CI or merge gate validates the latest review artifact before code lands
  • Diff or commit binding ties the review artifact to the exact reviewed diff or commit
  • External reviewer driver runs the adversarial review from a configurable tool or service instead of trusting the executor path alone
  • Out-of-band approval moves human override out of the terminal session and into a separate approval surface
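
Diff binding, for example, could be approximated in a CI step by hashing the reviewed diff and comparing it at merge time. This is a sketch of the idea only. scafld does not document such a field, so `reviewed_diff_sha256`, `diff_digest`, and `review_matches_diff` are hypothetical.

```python
import hashlib


def diff_digest(diff_text: str) -> str:
    """Hex SHA-256 digest of a unified diff."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()


def review_matches_diff(artifact: dict, current_diff: str) -> bool:
    """True only if the artifact was recorded against exactly this diff."""
    # "reviewed_diff_sha256" is a hypothetical field, not part of scafld.
    return artifact.get("reviewed_diff_sha256") == diff_digest(current_diff)


reviewed = "--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n-x = 1\n+x = 2\n"
artifact = {"reviewed_diff_sha256": diff_digest(reviewed)}
```

Any change pushed after the review alters the digest, so a merge gate built on this check would require a fresh review.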

Documentation

File                Audience      Purpose
AGENTS.md           AI agents     Invariants, modes, validation, conventions
CLAUDE.md           Claude Code   Claude-specific tool tips
CONVENTIONS.md      AI agents     Coding standards template
.ai/config.yaml     Both          All configuration in one place
.ai/OPERATORS.md    Developers    Human cheat sheet for working with specs

License

MIT

Contributing

Contributions welcome. Follow the spec-driven approach - practice what we preach.


Built by Sourcey. We build AI infrastructure that works in production, not in pitch decks.
