Spec-driven workflow system for autonomous AI-assisted software development

SpecFlow

A platform-agnostic, spec-driven workflow system for autonomous AI-assisted software development.

SpecFlow separates the work that requires human judgment (specs, acceptance criteria, interface contracts) from the work an agent can do autonomously (implementation, validation, documentation). The result: agents that stay focused, don't gold-plate, and stop cleanly when they hit something the spec doesn't cover.


Core Idea

AI agent failures are workflow failures, not model failures. Agents jump to implementation, infer unstated requirements, scope-creep, and lose context between sessions because nothing stops them.

SpecFlow fixes this by encoding workflow discipline in files:

  • Human input is concentrated upfront — specs, acceptance criteria, interface contracts
  • Agents operate in isolated Build Mode sessions where the spec is frozen
  • The test suite decides when a session is done, not the agent
  • When the spec is insufficient, agents stop and log a Spec Gap instead of guessing

Two Modes

Design Mode — human-intensive, project-scoped

Phases: Plan → Specify → Scaffold

The goal is a fully specified repo. Every unit needs four things before Build Mode can start:

  1. Acceptance Criteria — binary pass/fail (Given/When/Then or input→output)
  2. Interface Contracts — typed public interface, frozen during Build Mode
  3. Test Infrastructure — framework, test location, and how to run
  4. Explicit Out-of-Scope — what the unit does NOT do

At each phase boundary, the agent stops and self-assesses before you approve.

Build Mode — agent-autonomous, unit-scoped

Phases: Implement → Validate → Document

The spec is frozen. The agent reads the unit spec, runs the tests, and implements until all acceptance criteria pass. No spec changes, no test changes, no scope creep. Session ends when tests pass.
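
As a rough illustration of "the test suite decides", the completion check can be reduced to the exit status of the unit's test command. This is a minimal sketch, not SpecFlow's implementation: `session_done` is a hypothetical helper, and it assumes the tests run as a single command that exits with status 0 on success.

```python
import subprocess
import sys


def session_done(test_command: list[str]) -> bool:
    """Return True when the test command exits with status 0.

    Hypothetical helper: the real command would come from the unit's
    Test Infrastructure description, not from this sketch.
    """
    result = subprocess.run(test_command, check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for a real test runner: a child process that exits cleanly.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    print("session done:", session_done(passing))
```

The agent's own opinion of its progress never enters the check; only the exit status does.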

Spec Gap — when a Build Mode agent finds the spec insufficient:

  1. Stop — don't guess
  2. Log a [SPEC GAP] entry in the unit log
  3. Set the unit status to gap, mode to design
  4. Return to Design Mode to resolve it
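
Step 2 of the procedure above could be sketched as follows. The heading layout copies the [SPEC GAP] example in the unit log section of this README. The function names are hypothetical and are not part of the specflow CLI. Updating the unit status and mode (step 3) is left out because it depends on the control plane format.

```python
from datetime import date
from pathlib import Path


def format_spec_gap_entry(title: str, detail: str, on: date) -> str:
    """Build a log entry shaped like the README's [SPEC GAP] example."""
    # "\u2014" is the dash used between the date and the tag in log headings.
    return f"## {on.isoformat()} \u2014 [SPEC GAP] {title}\n{detail}\n"


def append_spec_gap(log_path: Path, title: str, detail: str, on: date) -> None:
    """Append the entry to the unit log without touching earlier entries."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + format_spec_gap_entry(title, detail, on))


if __name__ == "__main__":
    print(format_spec_gap_entry(
        "Refresh token TTL undefined",
        "Spec doesn't specify TTL for refresh tokens.",
        date(2026, 4, 10),
    ))
```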

Quick Start

1. Install

pip install specflow-agent

2. Initialize a project

# From your project directory
specflow init "My Project"

This creates:

.specflow/
├── specflow.md          # Control plane (mode: design, unit registry)
├── todo.md              # Root task list
├── interfaces/          # Cross-unit interface contracts
└── units/
    └── docs/
        ├── spec.md      # Project-level spec
        └── todo.md
.claude/
└── commands/
    ├── sf-start.md      # /sf-start slash command
    └── sf-end.md        # /sf-end slash command
CLAUDE.md                # Auto-generated mode-specific agent instructions

3. Start a session

/sf-start

This reads the control plane, identifies the current mode and active unit, and orients the agent. Run it at the beginning of every Claude Code session.

4. End a session

/sf-end

This updates the registry, appends to the unit log, commits, and closes out the session cleanly.


Key Files in a SpecFlow Project

Control plane: .specflow/specflow.md

---
version: 2
project: my-project
mode: design
active_design_unit: auth-service
---

## Unit Registry

units:
  - name: docs
    status: spec-complete
  - name: auth-service
    status: pending
    depends_on: [docs]
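
A tool that needs the current mode or active unit only has to read the front matter at the top of this file. The sketch below assumes the flat `key: value` layout shown above and avoids a YAML dependency. `parse_front_matter` is a hypothetical helper, not an API of the specflow package.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Return the key/value pairs between the first two '---' lines.

    Assumes flat 'key: value' pairs as in the example above; nested YAML
    is out of scope for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the front matter as missing


if __name__ == "__main__":
    sample = "---\nversion: 2\nproject: my-project\nmode: design\n---\n"
    print(parse_front_matter(sample)["mode"])
```

All values come back as strings, so a caller comparing `version` would need to convert it.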

Unit spec: .specflow/units/<name>/spec.md

Must contain three mandatory sections before the unit can enter Build Mode:

## Acceptance Criteria
- Given valid credentials, login() returns AuthTokens with access and refresh tokens
- Given wrong password, login() returns 401 (not a 500 or panic)

## Interface Contracts (Public Interface)
- login(email: string, password: string): AuthTokens | AuthError
- validateToken(token: string): TokenPayload | AuthError

## Explicit Out-of-Scope
- User registration (belongs to user-service)
- OAuth/social login (future work)
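
A pre-flight check for these sections might look like the sketch below. It only verifies that the three headings from the example above exist, not that their content is good. `missing_sections` is a hypothetical helper, and matching on heading prefixes is an assumption made so that "Interface Contracts (Public Interface)" counts.

```python
REQUIRED_SECTIONS = (
    "Acceptance Criteria",
    "Interface Contracts",
    "Explicit Out-of-Scope",
)


def missing_sections(spec_text: str) -> list[str]:
    """List the required sections that have no matching '## ' heading."""
    headings = [
        line[3:].strip()
        for line in spec_text.splitlines()
        if line.startswith("## ")
    ]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(heading.startswith(name) for heading in headings)
    ]


if __name__ == "__main__":
    draft = "## Acceptance Criteria\n- Given valid credentials, ...\n"
    print(missing_sections(draft))
```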

Unit log: .specflow/units/<name>/log.md

Append-only session log. Four entry types:

## 2026-04-12 — [MILESTONE] Implement phase complete
All acceptance criteria pass.

## 2026-04-11 — [DEAD-END] Async bcrypt abandoned
Jest timer interference. Switched to synchronous bcrypt.

## 2026-04-10 — [SPEC GAP] Refresh token TTL undefined
Spec doesn't specify TTL for refresh tokens. Cannot implement without this.

## 2026-04-10 — [DESIGN NOTE] Spec template missing test framework field
Had to infer test runner from context. The unit spec template should require
a Test Infrastructure section so build agents don't have to guess.
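
Because every entry starts with the same heading shape (date, tag in square brackets, title), the log is easy to read back programmatically, for example to list open Spec Gaps at the start of a session. The parser below is a sketch that assumes exactly the layout shown above. `LogEntry` and `parse_log` are hypothetical names.

```python
import re
from typing import NamedTuple

# "## 2026-04-10 <dash> [SPEC GAP] Title"; \u2014 is the dash used in log headings.
ENTRY_HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2}) \u2014 \[([A-Z][A-Z -]*)\] (.+)$")


class LogEntry(NamedTuple):
    date: str
    kind: str
    title: str
    body: str


def parse_log(text: str) -> list[LogEntry]:
    """Split a unit log into entries, one per matching '## ' heading."""
    entries: list[LogEntry] = []
    heading = None
    body: list[str] = []

    def flush() -> None:
        # Store the entry collected so far, if there is one.
        if heading is not None:
            entries.append(LogEntry(*heading.groups(), "\n".join(body).strip()))

    for line in text.splitlines():
        match = ENTRY_HEADING.match(line)
        if match:
            flush()
            heading, body = match, []
        elif heading is not None:
            body.append(line)
    flush()
    return entries


def spec_gaps(text: str) -> list[LogEntry]:
    """Return only the [SPEC GAP] entries."""
    return [entry for entry in parse_log(text) if entry.kind == "SPEC GAP"]
```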

CLI

specflow init [PROJECT_NAME]          # Bootstrap a new project
specflow compile [--output PATH]      # Regenerate CLAUDE.md for current mode
specflow status                       # Show current mode, active unit, NEXT task
specflow --version                    # Show version and supported control plane

All commands except init search upward from the current directory for .specflow/specflow.md. init always operates on the current directory.
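
The upward search can be pictured as a walk from the starting directory through each parent until the control plane file turns up. This is a sketch of that lookup under the path stated above, not the CLI's actual code. `find_control_plane` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def find_control_plane(start: Path) -> Optional[Path]:
    """Return the nearest .specflow/specflow.md at or above start, else None."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / ".specflow" / "specflow.md"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_control_plane(Path.cwd())
    print(found if found else "no SpecFlow project found")
```

Running commands from a nested source directory then behaves the same as running them from the project root.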

The original bash scripts are preserved in reference/scripts/.


Rules

Rules live in rules/ as individual Markdown files with YAML frontmatter. specflow compile (specflow-compile.sh in the original bash scripts) generates a mode-specific CLAUDE.md that points to the rules directories rather than inlining them. Agents read rules on demand.

rules/
├── core/          # Always-active rules
├── phase/         # Phase-specific rules (plan, specify, scaffold, validate, ...)
└── optional/      # Opt-in rules (patch-protocol, bug-protocol)

How It Works with Claude Code

Claude Code reads CLAUDE.md automatically at the start of every session. The mode-specific compiled output tells the agent:

  • Design Mode (~60 lines): spec quality requirements, phase gate protocol, control plane authority, pointers to rule directories
  • Build Mode (~30 lines): active unit and spec path, five hard rules (including spec frozen, tests frozen, and interfaces frozen), Spec Gap procedure

No plugins or integrations required. It's just files.


Platform-Agnostic

SpecFlow works with any AI coding agent that reads files:

  • Claude Code — reads CLAUDE.md automatically; use /sf-start and /sf-end
  • GitHub Copilot / Cursor — point it at .specflow/ files as context
  • Aider — pass spec files as context with --read
  • Any LLM — paste the relevant files into the conversation

The discipline is in the files, not the tool.


Documentation

  • docs/VISION.md — strategic vision and research background
  • docs/DECISIONS.md — architectural decisions with reasoning (10 confirmed)
  • docs/file-format-spec.md — complete file format reference (v2)
  • docs/claude-code-tips.md — Claude Code configuration tips
  • docs/cli-wrapper-plan.md — CLI design decisions (D1–D8)
