AI-powered Specification-Driven Development framework: turns a vague idea into needs, spec, plan, and code via a 4-stage agent pipeline

specpilot — AI-Powered Specification-Driven Development

An AI agent pipeline that turns a vague idea into a needs document, a formal specification, an implementation plan, and working code — automatically, using Claude.

Built from scratch as a fully readable, minimal Python implementation so every concept is explicit and understandable. Useful as a standalone tool and as a reference for understanding how agentic frameworks (BMAD, SPECKIT) work under the hood.


Table of Contents

  1. What Is SDD?
  2. Architecture Overview
  3. Core Concepts Explained
  4. Step-by-Step: What Happens During a Run
  5. File Reference
  6. Setup and Installation
  7. Running the Tests
  8. What Is Missing: Gap Analysis vs BMAD and SPECKIT

1. What Is SDD?

Specification-Driven Development is a discipline where a formal specification document is produced before any implementation begins, and the entire project (planning, coding, testing) is traceable back to that spec.

An SDD AI framework automates that process using AI agents:

User's vague idea
       |
       v
  [DISCOVERY]  <-- AI asks clarifying questions until the need is precise
       |
       v
 [SPECIFICATION] <-- AI formalises the need into a structured spec document
       |
       v
   [PLANNING]  <-- AI breaks the spec into an ordered implementation plan
       |
       v
[IMPLEMENTATION] <-- AI guides the developer task-by-task through the build
       |
       v
  Documented, working software

Each stage produces a markdown artifact saved to disk (needs.md, spec.md, plan.md, impl_notes.md). These artifacts are the single source of truth — later stages always read from them rather than relying on conversation memory.


2. Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐
│         main.py  /  tests/simple_test.py  /  tests/test_run.py           │
│                          (CLI entry points)                              │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │ creates & wires
                               v
┌──────────────────────────────────────────────────────────────────────────┐
│                           Orchestrator                                   │
│                                                                          │
│   Stage:  DISCOVERY → SPECIFICATION → PLANNING → IMPLEMENTATION → DONE  │
│                                                                          │
│   Routing rule: send user input to agents[current_stage]                │
│   Transition:   when agent sets context.stage_advance_requested = True  │
└────────┬──────────────────────────────────────────────────────┬──────────┘
         │ reads/writes                                          │ reads/writes
         v                                                       v
┌─────────────────────┐                             ┌───────────────────────┐
│   ProjectContext    │                             │      Agent (x4)       │
│   (shared state)    │◄────────────────────────────│                       │
│                     │                             │  name, role           │
│  raw_need           │  injected into every        │  system_prompt        │
│  clarified_need     │  LLM call as context        │  skills  (list)       │
│  spec_document      │  summary                    │  conversation_history │
│  plan_document      │                             │                       │
│  impl_notes         │                             │  run(user_msg) ──┐    │
│  stage              │                             │                  │    │
│  stage_advance_*    │                             │  _tool_use_loop()│    │
│  workspace_dir      │                             └──────────────────┼────┘
└─────────────────────┘                                                │
                                                                       │ calls
                                                     ┌─────────────────v────┐
                                                     │   Anthropic API      │
                                                     │   (Claude)           │
                                                     │                      │
                                                     │  stop_reason:        │
                                                     │  "end_turn" → text   │
                                                     │  "tool_use" → loop   │
                                                     └──────────────────────┘
                                                          │          │
                                               tool_use   │          │ result
                                               block      v          │
                                              ┌───────────────────┐  │
                                              │   Skill.execute() │──┘
                                              │                   │
                                              │  write_document()   │ → workspace/*.md
                                              │  write_code_file() │ → workspace/*.py etc.
                                              │  advance_stage()   │ → context flag
                                              └───────────────────┘

Key design principle

Agents do not talk to each other. They communicate through the shared ProjectContext.

The Elicitation agent writes needs.md and sets context.clarified_need. The Specification agent reads that field (injected into its system prompt) — it never "calls" the Elicitation agent. This loose coupling means any agent can be replaced independently.


3. Core Concepts Explained

3.1 ProjectContext — the shared blackboard

File: framework/core/context.py

ProjectContext
├── raw_need           str   — user's first, unpolished sentence
├── clarified_need     str   — refined summary after elicitation
├── spec_document      str   — formal spec written by SpecificationAgent
├── plan_document      str   — task breakdown written by PlanningAgent
├── impl_notes         str   — progress log written by ImplementationAgent
├── stage              Stage — current stage (enum)
├── stage_advance_requested  bool — flipped by advance_stage skill
├── stage_advance_summary    str  — what the agent accomplished
└── workspace_dir      str   — where *.md files are saved ("workspace/")

Every agent receives a text summary of the context injected into its system prompt on every LLM call:

# agent.py
def _system_prompt_with_context(self) -> str:
    return (
        self._base_system_prompt
        + "\n\n## Current Project Context\n"
        + self.context.summary_for_agents()
    )

This means even if the same agent is called many turns later, it always has an up-to-date view of what previous stages produced — without any explicit handoff message.
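As a rough illustration of the blackboard, the sketch below models the fields listed above as a dataclass and renders a summary string. The body of summary_for_agents, the max_chars truncation, and the field labels are assumptions made for this example; the real context.py may format things differently.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DISCOVERY = "discovery"
    SPECIFICATION = "specification"
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    DONE = "done"


@dataclass
class ProjectContext:
    raw_need: str = ""
    clarified_need: str = ""
    spec_document: str = ""
    plan_document: str = ""
    impl_notes: str = ""
    stage: Stage = Stage.DISCOVERY
    stage_advance_requested: bool = False
    stage_advance_summary: str = ""
    workspace_dir: str = "workspace"

    def summary_for_agents(self, max_chars: int = 500) -> str:
        """Render the non-empty fields as text for the system prompt.

        max_chars is a hypothetical cap so long documents do not
        dominate the prompt.
        """
        lines = [f"Stage: {self.stage.value}"]
        labelled = [
            ("Raw need", self.raw_need),
            ("Clarified need", self.clarified_need),
            ("Spec", self.spec_document),
            ("Plan", self.plan_document),
            ("Implementation notes", self.impl_notes),
        ]
        for label, value in labelled:
            if value:  # skip fields that no stage has filled in yet
                lines.append(f"{label}: {value[:max_chars]}")
        return "\n".join(lines)


ctx = ProjectContext(raw_need="I want to build a todo list app")
ctx.clarified_need = "Personal todo app with add/complete/delete tasks"
print(ctx.summary_for_agents())
```

Because the summary is rebuilt on every call, a field written by one agent shows up in the next agent's prompt without any extra wiring.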

Why a shared blackboard instead of message passing?

Message passing (agent A sends a message to agent B) creates tight coupling and requires a shared message bus. A blackboard is simpler: every agent reads and writes the same object. This is the pattern used by BMAD's document dependency chain and SPECKIT's SPEC.md / PLAN.md / TASKS.md artifacts.


3.2 Skill — a Python function exposed as an LLM tool

File: framework/core/skill.py

A Skill wraps a plain Python function and exposes it to Claude as a tool definition (JSON Schema).

@dataclass
class Skill:
    name: str          # "write_document"
    description: str   # shown to the LLM to help it decide when to call it
    parameters: dict   # JSON Schema of the function's arguments
    execute: Callable  # the actual Python function

Converting a skill to the Anthropic API format is one method:

def to_tool_schema(self) -> dict:
    return {
        "name": self.name,
        "description": self.description,
        "input_schema": self.parameters,   # Anthropic's required field name
    }

The three skills in this framework:

| Skill | Who has it | What it does | Side effect |
|---|---|---|---|
| write_document | All agents | Saves a markdown file to workspace/ | Updates the matching context field (e.g. context.spec_document) |
| write_code_file | ImplementationAgent | Writes any source file (.py, .toml, …) to workspace/ so code can be run | Appends path to context.code_files |
| advance_stage | All agents | Signals that the stage is complete | Sets context.stage_advance_requested = True |

Agents call skills by requesting them in the LLM response — they never call skill.execute() directly. The agent's tool-use loop dispatches the call.
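To make the Skill idea concrete, here is a minimal sketch of a skill factory in the spirit of write_document. The factory name make_write_document_skill, the parameter schema, and the return message are assumptions for illustration; this simplified version omits the doc_type argument and does not update the context.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str
    parameters: dict
    execute: Callable[..., str]

    def to_tool_schema(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.parameters,
        }


def make_write_document_skill(workspace_dir: str) -> Skill:
    """Hypothetical factory: binds the workspace path into the function."""

    def write_document(filename: str, content: str) -> str:
        path = Path(workspace_dir) / filename
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        return f"Saved to {path.as_posix()}"

    return Skill(
        name="write_document",
        description="Save a markdown document to the workspace.",
        parameters={
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["filename", "content"],
        },
        execute=write_document,
    )


workspace = tempfile.mkdtemp()
skill = make_write_document_skill(workspace)
# The tool-use loop would pass the LLM-supplied input dict as keyword arguments.
result = skill.execute(filename="needs.md", content="# Project Needs\n")
print(result)
```

The closure keeps the skill's JSON Schema free of framework details: the LLM only sees filename and content, while workspace_dir stays on the Python side.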


3.3 Agent — role + system prompt + tool-use loop

File: framework/core/agent.py

Each agent is an instance of the Agent class with:

  • A name and role (e.g. "Elicitor / Product Analyst")
  • A system prompt defining its expertise and instructions
  • A list of skills it can invoke
  • Its own conversation history (messages within this stage only)

The tool-use loop

This is the heart of the framework. When an agent calls the LLM, Claude may respond with text (done) or with a tool_use block (it wants to run a skill).

Agent.run(user_message)
  │
  ├─ append message to conversation_history
  │
  └─ _tool_use_loop()
       │
       ├─ call Anthropic API with:
       │    - system prompt (base + context summary)
       │    - full conversation history
       │    - tools = [skill.to_tool_schema() for skill in self.skills]
       │
       ├─ if stop_reason == "tool_use":
       │    for each tool_use block in response:
       │      result = _dispatch_skill(block.name, block.input)
       │    append assistant turn (with tool_use blocks) to messages
       │    append user turn (with tool_result blocks) to messages
       │    └─ LOOP AGAIN (Claude gets the tool results and continues)
       │
       └─ if stop_reason == "end_turn":
            extract text from content blocks
            return text

In a single turn, Claude may call multiple skills before returning text. For example, the Specification agent calls write_document and then advance_stage in the same response, and the loop handles both before returning.
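The loop described above can be sketched without network access by swapping the API call for a scripted stand-in. Block, Response, tool_use_loop, and the max_rounds guard are names invented for this example; they only mimic the attributes the loop reads (stop_reason, content, type, name, input, id) and are not the SDK's classes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Block:
    type: str                      # "text" or "tool_use"
    text: str = ""
    id: str = ""
    name: str = ""
    input: dict = field(default_factory=dict)


@dataclass
class Response:
    stop_reason: str               # "end_turn" or "tool_use"
    content: list


def tool_use_loop(call_llm: Callable, skills: dict, messages: list,
                  max_rounds: int = 10) -> str:
    """Call the model until it stops asking for tools, then return its text."""
    for _ in range(max_rounds):    # guard against endless loops (an assumption)
        response = call_llm(messages)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = skills[block.name](**block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": output})
        # The assistant turn and the tool results are both appended to history.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool-use loop did not finish")


# Scripted replies stand in for Claude: one tool call, then plain text.
scripted = iter([
    Response("tool_use", [Block("tool_use", id="toolu_01",
                                name="advance_stage",
                                input={"summary": "done"})]),
    Response("end_turn", [Block("text", text="Moving on.")]),
])
flags = {"advance": False}


def advance_stage(summary: str) -> str:
    flags["advance"] = True
    return "Stage advance requested"


messages = [{"role": "user", "content": "Go ahead"}]
reply = tool_use_loop(lambda msgs: next(scripted),
                      {"advance_stage": advance_stage}, messages)
print(reply)
```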

After _tool_use_loop returns, run() checks:

if self.context.stage_advance_requested:
    self._stage_complete = True
    self.context.stage_advance_requested = False   # consumed

This is how stage transitions work: the skill writes to the context, the agent reads from it, the orchestrator reads from the agent.


3.4 Orchestrator — the stage-machine router

File: framework/core/orchestrator.py

The orchestrator holds a dict[Stage, Agent] and a reference to the shared ProjectContext. Its job is purely routing:

def process(self, user_input: str) -> tuple[str, bool]:
    agent = self.agents[self.context.stage]   # pick agent for current stage
    response = agent.run(user_input)           # delegate

    stage_changed = False
    if agent.stage_complete:
        agent.reset_stage_complete()
        stage_changed = self._advance_stage()  # move context.stage forward

    return response, stage_changed

_advance_stage walks a fixed ordered list:

DISCOVERY → SPECIFICATION → PLANNING → IMPLEMENTATION → DONE

When DONE is reached, orchestrator.is_done() returns True and the REPL exits.
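A minimal sketch of the "fixed ordered list" idea follows. STAGE_ORDER and next_stage are hypothetical names; the real _advance_stage is a method that mutates context.stage, whereas this version is a pure function so it is easy to test.

```python
from enum import Enum


class Stage(Enum):
    DISCOVERY = "discovery"
    SPECIFICATION = "specification"
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    DONE = "done"


STAGE_ORDER = [
    Stage.DISCOVERY,
    Stage.SPECIFICATION,
    Stage.PLANNING,
    Stage.IMPLEMENTATION,
    Stage.DONE,
]


def next_stage(current: Stage) -> Stage:
    """Return the stage after `current`; DONE is terminal and maps to itself."""
    index = STAGE_ORDER.index(current)
    return STAGE_ORDER[min(index + 1, len(STAGE_ORDER) - 1)]


stage = Stage.DISCOVERY
visited = [stage]
while stage is not Stage.DONE:
    stage = next_stage(stage)
    visited.append(stage)
print(" -> ".join(s.name for s in visited))
```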

Why a linear state machine?

It maps directly onto SDD's sequential workflow. Each stage has a single clear purpose and must complete before the next begins. This is intentional for a learning framework — real frameworks like LangGraph use directed graphs that allow loops and parallel branches (see Section 8).


4. Step-by-Step: What Happens During a Run

This traces every event for a single "I want to build a todo list app" input.

Step 1 — Bootstrapping (main.py or test_run.py)

client = anthropic.Anthropic(api_key=...)
context = ProjectContext(workspace_dir="workspace")

agents = {
    Stage.DISCOVERY:       make_elicitation_agent(context, client),
    Stage.SPECIFICATION:   make_specification_agent(context, client),
    Stage.PLANNING:        make_planning_agent(context, client),
    Stage.IMPLEMENTATION:  make_implementation_agent(context, client),
}

orchestrator = Orchestrator(context=context, agents=agents)

All four agents share the same context object and the same client. Nothing is called yet.

Step 2 — First user message enters

User: "I want to build a todo list app"

orchestrator.process("I want to build a todo list app") is called. context.stage == Stage.DISCOVERY, so the ElicitationAgent receives the message.

Step 3 — ElicitationAgent calls the LLM

agent.run() → _tool_use_loop() → Anthropic API call with:

  • System prompt: "You are a senior product analyst..." + context summary
  • Messages: [{"role": "user", "content": "I want to build..."}]
  • Tools: [write_document schema, advance_stage schema]

Claude responds with stop_reason == "end_turn" and a text question:

"Great idea! Who are the main users and what are the 3 core features?"

No skill was called. The loop exits immediately. The text is returned to the REPL.

Step 4 — Several more turns

The user answers the clarifying questions. Each turn:

  1. User input → orchestrator.process() → agent.run()
  2. LLM responds with another question (no tool call yet)
  3. Text printed to user

Step 5 — Elicitation agent decides it has enough information

After the user confirms the scope, Claude's response includes tool_use blocks:

[
  {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "write_document",
    "input": {
      "filename": "needs.md",
      "content": "# Project Needs\n...",
      "doc_type": "needs"
    }
  },
  {
    "type": "tool_use",
    "id": "toolu_02",
    "name": "advance_stage",
    "input": { "summary": "Clarified a personal todo app with 3 features" }
  }
]

stop_reason == "tool_use". The loop:

  1. Calls write_document(filename="needs.md", content="...", doc_type="needs") → saves file to workspace/needs.md → sets context.clarified_need = content → returns "Saved to workspace/needs.md"

  2. Calls advance_stage(summary="...") → sets context.stage_advance_requested = True → returns "Stage advance requested"

  3. Appends both tool results to messages and calls the LLM again.

  4. LLM returns a text confirmation ("needs.md saved, moving to Specification...") with stop_reason == "end_turn", so the loop exits and the text is returned.
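For Step 5, the user turn that carries the tool results back to the model might look like the sketch below. The helper name tool_result_turn is an assumption; the message shape (a user message whose content is a list of tool_result blocks, each pointing at a tool_use id) follows the Anthropic Messages API.

```python
# The two tool_use blocks from the response above, trimmed to the fields used here.
tool_use_blocks = [
    {"type": "tool_use", "id": "toolu_01", "name": "write_document",
     "input": {"filename": "needs.md", "content": "# Project Needs\n...",
               "doc_type": "needs"}},
    {"type": "tool_use", "id": "toolu_02", "name": "advance_stage",
     "input": {"summary": "Clarified a personal todo app with 3 features"}},
]

# What each skill returned, keyed by the id of the block that requested it.
outputs = {
    "toolu_01": "Saved to workspace/needs.md",
    "toolu_02": "Stage advance requested",
}


def tool_result_turn(blocks: list, outputs: dict) -> dict:
    """Build the single user message that answers every tool_use block."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result",
             "tool_use_id": block["id"],
             "content": outputs[block["id"]]}
            for block in blocks
            if block["type"] == "tool_use"
        ],
    }


turn = tool_result_turn(tool_use_blocks, outputs)
for item in turn["content"]:
    print(item["tool_use_id"], "->", item["content"])
```

Each result is matched to its request by id, which is why both results can travel in one message even though two different skills ran.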

Step 6 — Orchestrator advances stage

Back in orchestrator.process():

if agent.stage_complete:          # True, because advance_stage was called
    agent.reset_stage_complete()
    stage_changed = self._advance_stage()   # context.stage = SPECIFICATION

stage_changed = True is returned to the REPL which prints the new banner.

Step 7 — SpecificationAgent takes over

Next user message → orchestrator.process() → now routes to SpecificationAgent.

The agent's system prompt says "read the clarified need from Project Context". The context summary injected into the system prompt includes:

Clarified need: Personal todo app with add/complete/delete tasks, local JSON storage, Python CLI

The SpecificationAgent writes spec.md with FR-01…FR-N sections, calls advance_stage → orchestrator moves to PLANNING.

Step 8 — Planning and Implementation follow the same pattern

Each agent reads previous artifacts from the context summary, produces its own document, and calls advance_stage to hand off.

Step 9 — DONE

When ImplementationAgent calls advance_stage, the orchestrator sets context.stage = Stage.DONE. orchestrator.is_done() returns True. The REPL prints the completion summary and exits.

workspace/ now contains:

needs.md       — clarified requirements
spec.md        — formal functional + non-functional requirements
plan.md        — phased implementation plan with tasks
impl_notes.md  — what was built, design decisions, remaining work

5. File Reference

mysdd/
│
├── main.py                    Entry point for interactive CLI
├── config.py                  API key + model + workspace dir settings
├── requirements.txt           anthropic>=0.40.0
├── tests/
│   ├── simple_test.py         Fast smoke test (~2 min, word-count tool)
│   ├── test_run.py            Automated end-to-end test (~5 min)
│   └── persist_test.py        Session save/load round-trip test
├── docs/
│   ├── ROADMAP.md             14 missing features with design sketches
│   ├── INSTALL.md             Installation guide
│   ├── USAGE.md               Usage guide
│   ├── DISTRIBUTION.md        Distribution procedure
│   ├── QUICK_REFERENCE.md     30-minute publishing checklist
│   └── GETTING_OTHERS_TO_USE.md
│
├── framework/
│   ├── core/
│   │   ├── context.py         ProjectContext dataclass + Stage enum
│   │   ├── skill.py           Skill dataclass (wraps Python fn as LLM tool)
│   │   ├── agent.py           Agent base class with tool-use loop
│   │   └── orchestrator.py    Stage-machine router
│   │
│   ├── skills/
│   │   ├── document_writer.py  write_document skill factory
│   │   └── advance_stage.py    advance_stage skill factory
│   │
│   └── agents/
│       ├── elicitation.py      Stage 1: ElicitationAgent
│       ├── specification.py    Stage 2: SpecificationAgent
│       ├── planning.py         Stage 3: PlanningAgent
│       └── implementation.py   Stage 4: ImplementationAgent
│
└── workspace/                 Generated documents land here
    ├── needs.md
    ├── spec.md
    ├── plan.md
    └── impl_notes.md

6. Setup and Installation

Prerequisites

  • Python 3.10 or later
  • An Anthropic API key (sk-ant-...)

Install

pip install specpilot-ai

Or install from source:

git clone https://github.com/malif78/specpilot.git
cd specpilot
pip install -e .

API key — one-time setup

Create a .env file in the project root (it is git-ignored and never committed):

ANTHROPIC_API_KEY=sk-ant-...

config.py loads this file automatically on every run — no need to set an environment variable in each terminal session. A real environment variable always takes precedence over .env if both are set.
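The precedence rule can be sketched with a small loader. This is not the code in config.py; load_dotenv_file and the EXAMPLE_SETTING key are hypothetical, and the example writes into a plain dict rather than os.environ so it has no side effects.

```python
import os
import tempfile
from pathlib import Path
from typing import MutableMapping, Optional


def load_dotenv_file(path, environ: Optional[MutableMapping] = None) -> MutableMapping:
    """Copy KEY=VALUE pairs from a .env file into `environ`.

    Keys that are already set are left alone, so a real environment
    variable wins over the file.
    """
    environ = os.environ if environ is None else environ
    path = Path(path)
    if not path.exists():
        return environ
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return environ


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text(
        "# local secrets\nANTHROPIC_API_KEY=from-dotenv\nEXAMPLE_SETTING=1\n",
        encoding="utf-8",
    )
    # Simulate a shell that already exports the API key.
    env = load_dotenv_file(env_file, {"ANTHROPIC_API_KEY": "from-shell"})

print(env)
```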


7. Running the Tests

7.1 Quick smoke test

simple_test.py runs a focused demo on a word-count CLI tool — all four stages in roughly 2 minutes (~8 API calls). Good for a fast sanity check.

python tests/simple_test.py

Expected workspace output:

workspace_simple/
  needs.md          spec.md
  plan.md           impl_notes.md
  wc_tool.py        pyproject.toml   ← actual runnable code

Run the generated tool immediately after:

python workspace_simple\wc_tool.py README.md

7.2 Full automated end-to-end test

test_run.py drives all four stages using a longer scripted conversation about a "personal expense tracker CLI" (~5 minutes, ~20 API calls).

python tests/test_run.py

What it does:

| Phase | Scripted messages sent | Expected agent behaviour |
|---|---|---|
| Discovery (3 turns) | Describes app, answers clarifying questions | Asks 1-2 questions per turn, writes needs.md, advances |
| Specification (1 turn) | "Go ahead and write the spec" | Writes spec.md with FR-01…, advances |
| Planning (1 turn) | Confirms stdlib + argparse stack | Writes plan.md with phases + tasks, advances |
| Implementation (3 turns) | "Start Phase 1", "next phase", "done" | Writes source files + impl_notes.md, advances |

Expected output (final lines):

Final stage   : done
Has spec doc  : yes
Has plan doc  : yes
Has impl notes: yes

Workspace files:
  impl_notes.md  (≈3 KB)
  needs.md       (≈1 KB)
  plan.md        (≈6 KB)
  spec.md        (≈4 KB)

To re-run cleanly (fresh workspace):

Remove-Item workspace\*.md
python tests/test_run.py

7.3 Interactive CLI

main.py runs the real conversational REPL. Type your own application idea.

python main.py

Example session:

------------------------------------------------------------
  SDD Framework  —  Specification-Driven Development
------------------------------------------------------------

Type your idea below.  'quit' or Ctrl-C to exit.

------------------------------------------------------------
  DISCOVERY      — Understanding your need
------------------------------------------------------------

[Elicitor · Product Analyst] Welcome! Tell me about your idea...

You > I want to build a recipe manager web app
...

Type quit to exit at any point.

7.4 Reading the workspace output

After a run, inspect the generated documents:

# List all generated files with sizes
Get-ChildItem workspace\

# Read the spec
Get-Content workspace\spec.md

# Read the plan
Get-Content workspace\plan.md

What good output looks like:

  • needs.md — 300-600 words, mentions: problem, users, 3-5 MVP features, constraints
  • spec.md: structured with FR-01…FR-N numbered requirements, NFRs, out-of-scope
  • plan.md — 3-6 phases, each phase has checkboxed tasks naming specific files/modules
  • impl_notes.md — records what was built, design decisions, remaining phases

8. What Is Missing: Gap Analysis vs BMAD and SPECKIT

This framework is a learning skeleton. Below is an honest comparison with two production-grade SDD frameworks and a full list of gaps.

8.1 BMAD-METHOD

BMAD (Breakthrough Method for Agile AI-Driven Development) is an open-source framework that orchestrates 12+ specialized AI agents through a full agile workflow, with IDE integration for Claude Code, Cursor, and VSCode.

| BMAD feature | Our framework | Gap |
|---|---|---|
| 12+ specialized agents (Analyst, PM, Architect, Scrum Master, QA, Dev, PO…) | 4 hardcoded agents | Only 4 agents with fixed roles; no configurable personas |
| Adaptive complexity: same workflow scales from a bug fix to an enterprise platform | Single fixed 4-stage pipeline | No way to skip stages, add stages, or loop back |
| Cross-agent delegation: agents can hand off sub-tasks to other agents | None | Agents are isolated; no inter-agent messaging |
| Quality gates and checklists between stages | None | No formal accept/reject between stages; an agent can advance prematurely |
| BMad Builder: users build and share custom agents | Agents are Python classes only | No plugin system; adding an agent requires code changes |
| Agile artifacts: user stories, sprint backlog, acceptance criteria | Only 4 markdown docs | No user story format, no backlog, no sprint concept |
| Session persistence: resume a project across sessions | Implemented: context and agent histories are saved to workspace/.session.json (see Section 8.5) | Closed for resuming a single project; before this, every run started from scratch |
| Multiple LLM support | Claude only | No model routing or fallback |

8.2 SPECKIT (GitHub Spec Kit)

SPECKIT treats specifications as executable, first-class artifacts. Its key innovations are context discovery hooks (probing the codebase before planning) and validation hooks (checking artifacts after each stage).

| SPECKIT feature | Our framework | Gap |
|---|---|---|
| 7-phase workflow (Constitution → Specification → Clarification → Planning → Task Breakdown → Implementation → Validation) | 4 phases | Missing: Constitution (project governance), Task Breakdown (granular task list), Validation (post-implementation checks) |
| Context discovery hooks: agents read existing code/APIs/conventions before planning | None | Agents have no awareness of an existing codebase; they hallucinate file names and APIs |
| Validation hooks: post-phase checks verify artifacts (do the files exist? do the tests pass?) | None | No verification that what was planned actually got built |
| SPEC.md → PLAN.md → TASKS.md pipeline: each artifact is a typed, structured document | Freeform markdown | Documents have no enforced schema; a misbehaving agent could produce garbage |
| Agent-agnostic: works with any AI assistant (Claude, Copilot, Gemini, Cursor…) | Claude only | Hard dependency on Anthropic SDK |
| Customization presets and extensions | None | No configuration file; all customization requires Python code changes |
| Task tracking: TASKS.md with explicit done/not-done state | impl_notes.md is prose | No machine-readable task state; cannot resume mid-plan |

8.3 Full Gap List

The following features exist in production frameworks but are absent here. They are roughly ordered from highest to lowest impact.

Persistence and Memory

| Gap | Description | How to add it |
|---|---|---|
| No session resumption | Killing the process used to lose all context | Implemented (see Section 8.5): ProjectContext is serialized to JSON after every agent turn and loaded on startup if the file exists |
| No cross-session memory | Agents forget previous projects | Add a vector store (ChromaDB, FAISS) indexed by project; inject relevant past decisions into system prompts |
| No long-term agent memory | Each agent's conversation history used to reset per run | Partly addressed (see Section 8.5): agent.conversation_history is persisted alongside the context until the project reaches DONE |

Orchestration

| Gap | Description | How to add it |
|---|---|---|
| Linear only | Stages go forward only; no loops, no branches | Replace the list-based state machine with a directed graph (LangGraph pattern); add loop-back edges for "needs more clarification" |
| No parallel agents | Agents run sequentially | Use asyncio + asyncio.gather to run independent agents concurrently (e.g., Architect and QA reviewing the spec simultaneously) |
| No agent delegation | An agent cannot spawn a sub-agent | Add a delegate_to(agent_name, task) skill that calls another agent as a sub-task |
| No human-in-the-loop gates | Stages advance automatically when an agent says so | Add a formal approval step: pause, show the user the artifact, require explicit "approve" or "request changes" |
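The "directed graph with loop-back edges" idea from the first row can be sketched in plain Python, without any graph library. The TRANSITIONS table, the signal names "advance" and "needs_clarification", and the transition function are all assumptions for illustration; this is not LangGraph's API.

```python
# Each stage maps a signal to the stage it leads to.
# "needs_clarification" edges point backwards, which the linear list cannot express.
TRANSITIONS = {
    "discovery": {"advance": "specification"},
    "specification": {"advance": "planning",
                      "needs_clarification": "discovery"},
    "planning": {"advance": "implementation",
                 "needs_clarification": "specification"},
    "implementation": {"advance": "done"},
    "done": {},
}


def transition(stage: str, signal: str) -> str:
    """Follow the edge labelled `signal` out of `stage`."""
    try:
        return TRANSITIONS[stage][signal]
    except KeyError:
        raise ValueError(f"no edge {signal!r} from stage {stage!r}") from None


# A run where the spec agent sends the project back to discovery once.
signals = ["advance", "needs_clarification", "advance",
           "advance", "advance", "advance"]
stage = "discovery"
path = [stage]
for signal in signals:
    stage = transition(stage, signal)
    path.append(stage)
print(" -> ".join(path))
```

An agent would need a second skill next to advance_stage to emit the backwards signal; the orchestrator change itself is small because routing is still a single lookup.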

Context and Grounding

| Gap | Description | How to add it |
|---|---|---|
| No codebase discovery | Agents don't know the existing project structure | Add a discover_context skill that runs git ls-files, reads key files, and injects findings into the planning stage |
| No web/doc search | Agents can't look up libraries, APIs, or standards | Add a web_search skill backed by a search API |
| No RAG | No retrieval of relevant past decisions or docs | Add vector-search over the workspace documents so later agents can query earlier artifacts semantically |

Output Quality

| Gap | Description | How to add it |
|---|---|---|
| No artifact schema validation | Agents can produce malformed documents | Define JSON schemas for each document type; parse the LLM output and retry if validation fails |
| No retry / fallback logic | Any API error or bad output crashes the run | Wrap _tool_use_loop in exponential backoff; add an output validator that triggers a re-prompt on failure |
| No output evaluation | No way to score whether the spec is complete | Add an Evaluator agent that scores each artifact against a rubric and returns a pass/fail with feedback |
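Two of the rows above, validation and retry, are small enough to sketch. validate_spec and with_retries are hypothetical helpers; the FR-NN check is only a stand-in for a real schema, and a production version would catch the SDK's transient error types rather than every Exception.

```python
import re
import time
from typing import Callable


def validate_spec(text: str) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    if not re.search(r"\bFR-\d{2}\b", text):
        problems.append("no FR-NN requirements found")
    if "out of scope" not in text.lower() and "out-of-scope" not in text.lower():
        problems.append("no out-of-scope section found")
    return problems


def with_retries(operation: Callable, attempts: int = 3,
                 base_delay: float = 1.0, sleep: Callable = time.sleep):
    """Run `operation`, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:  # assumption: narrow this to transient errors in real code
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo: an operation that fails twice, with sleep replaced by a recorder.
calls = {"count": 0}
delays = []


def flaky_call() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "FR-01 Add a task\nOut of scope: sync"


spec_text = with_retries(flaky_call, sleep=delays.append)
print(delays, validate_spec(spec_text))
```

Injecting the sleep function keeps the demo instant and makes the backoff schedule easy to assert in a test.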

Developer Experience

| Gap | Description | How to add it |
|---|---|---|
| No streaming | Responses appear all at once (blocking) | Use client.messages.stream() and print tokens as they arrive |
| No async | Everything is synchronous; UI freezes during LLM calls | Rewrite _tool_use_loop with asyncio; use the SDK's AsyncAnthropic client and await client.messages.create() |
| No observability | No tracing, token counts, or cost tracking | Log every LLM call with timestamp, tokens in/out, cost; integrate with LangSmith or a custom logger |
| No prompt versioning | System prompts are hardcoded strings | Move prompts to YAML/TOML files; version them in git; A/B test variants |
| Hardcoded agents | Adding a new agent requires Python code | Define agents in a config file (YAML); the framework loads them dynamically |
| No tool library | Only 3 skills available | Add: run_code, read_file, search_web, run_tests, create_github_issue, send_email, … |
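As one possible shape for the "Hardcoded agents" row, the sketch below loads agent definitions from a config document. It uses JSON instead of the YAML suggested above so that it runs on the standard library alone; AgentSpec, load_agent_specs, and the config keys are hypothetical, and the prompts are placeholders.

```python
import json
from dataclasses import dataclass


@dataclass
class AgentSpec:
    name: str
    role: str
    system_prompt: str
    skills: list[str]


# Hypothetical config, keyed by stage name.
CONFIG_TEXT = """
{
  "discovery": {
    "name": "Elicitor",
    "role": "Product Analyst",
    "system_prompt": "You are a senior product analyst...",
    "skills": ["write_document", "advance_stage"]
  },
  "implementation": {
    "name": "Implementer",
    "role": "Developer",
    "system_prompt": "You guide the build task by task...",
    "skills": ["write_document", "write_code_file", "advance_stage"]
  }
}
"""


def load_agent_specs(text: str, known_skills: set[str]) -> dict[str, AgentSpec]:
    """Parse the config and reject agents that reference unknown skills."""
    specs = {}
    for stage, fields in json.loads(text).items():
        spec = AgentSpec(**fields)
        unknown = set(spec.skills) - known_skills
        if unknown:
            raise ValueError(f"{stage}: unknown skills {sorted(unknown)}")
        specs[stage] = spec
    return specs


specs = load_agent_specs(
    CONFIG_TEXT, {"write_document", "write_code_file", "advance_stage"}
)
for stage, spec in specs.items():
    print(stage, spec.name, spec.skills)
```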

8.4 Summary Table

Feature                        Our Framework   SPECKIT   BMAD
--------------------------------------------------------------
Core SDD workflow                   Y            Y        Y
Multi-stage artifacts               Y            Y        Y
Tool use (skills)                   Y            Y        Y
Session persistence                 Y            Y        Y
Codebase discovery hooks            N            Y        N
Post-stage validation               N            Y        N
Non-linear orchestration            N            N        Y
12+ specialized agents              N            N        Y
Human-in-the-loop gates             N            Y        Y
Long-term memory / RAG              N            N        Y
Parallel agent execution            N            N        N
Streaming responses                 N            Y        Y
Artifact schema validation          N            Y        N
Retry / fallback logic              N            Y        N
Observability / tracing             N            N        Y
Configurable agents (no code)       N            Y        Y
Multi-LLM support                   N            Y        N

8.5 Session Persistence (Implemented)

Session persistence has been implemented. It is the foundation for all other advanced features — you cannot build evaluation pipelines or long-term memory without it.

How it works

workspace/
  .session.json        ← written atomically after every agent turn
  needs.md
  spec.md
  plan.md
  impl_notes.md

The session file stores two things:

  1. Context snapshot — all ProjectContext fields serialized as JSON. The stage enum is stored as its string value ("planning"). Transient flags (stage_advance_requested) are always reset to False.

  2. Agent conversation histories — each agent's per-stage message list, keyed by stage name. This is what allows an agent to resume mid-conversation without re-asking questions it already answered.

{
  "version": 1,
  "saved_at": "2026-05-27T13:47:55",
  "context": {
    "raw_need": "a note-taking CLI app",
    "clarified_need": "...",
    "spec_document": "...",
    "stage": "planning",
    "workspace_dir": "workspace"
  },
  "agent_histories": {
    "discovery":       [{"role": "user", "content": "..."}, ...],
    "specification":   [...],
    "planning":        [...],
    "implementation":  [...]
  }
}

Key design decisions

| Decision | Reason |
|---|---|
| In-place context restore (restore_from_dict) | All agents hold a reference to the same context object. Replacing it with a new one would leave agents pointing at stale data. |
| Atomic write (temp file → os.replace) | A crash mid-save never produces a corrupt session file: the old file remains intact until the new one is fully written. |
| Transient flags not persisted | stage_advance_requested is an in-flight signal, not state. Persisting it could cause the stage to advance twice on resume. |
| Session deleted on DONE | A completed project should start fresh next time. The workspace documents (spec.md, etc.) are the durable artifacts; the session file is scaffolding. |
| session_metadata() fast-read | The resume prompt reads only the small metadata header, not the full document content, so the prompt appears instantly even for large sessions. |
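The first three decisions can be illustrated together. This is a sketch under stated assumptions, not the framework's code: save_session_atomically and SessionContext are hypothetical names, and the context here holds only three fields.

```python
import json
import os
import tempfile
from pathlib import Path


def save_session_atomically(path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it in."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent,
                                    prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
        os.replace(tmp_name, path)   # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)      # do not leave half-written temp files
        raise


class SessionContext:
    """Tiny stand-in for ProjectContext."""

    def __init__(self) -> None:
        self.raw_need = ""
        self.stage = "discovery"
        self.stage_advance_requested = False

    def restore_from_dict(self, data: dict) -> None:
        # Mutate this object in place so existing references stay valid.
        for key, value in data.items():
            if hasattr(self, key):
                setattr(self, key, value)
        self.stage_advance_requested = False   # transient flag is never restored


with tempfile.TemporaryDirectory() as tmp:
    session_path = Path(tmp) / ".session.json"
    save_session_atomically(session_path, {"context": {
        "raw_need": "a note-taking CLI app",
        "stage": "planning",
        "stage_advance_requested": True,
    }})
    saved = json.loads(session_path.read_text(encoding="utf-8"))
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != ".session.json"]

context = SessionContext()
agent_view = context                 # what an agent would be holding
context.restore_from_dict(saved["context"])
print(agent_view.stage, agent_view.stage_advance_requested, leftovers)
```

The temp file is created in the same directory as the target because os.replace is only atomic when both paths are on the same filesystem.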

Resume flow in main.py

python main.py
  │
  ├─ build_orchestrator()          — fresh context + agents (all blank)
  │
  ├─ _maybe_resume()
  │     ├─ session_metadata()      — fast-read: stage, saved_at, raw_need preview
  │     ├─ print resume prompt
  │     └─ if Y: orchestrator.load_session()
  │               ├─ context.restore_from_dict()  — fills all context fields
  │               └─ agent.conversation_history = saved_history  (per stage)
  │
  └─ run_repl(resumed=True/False)
        ├─ if fresh: send opening message → ElicitationAgent greets user
        └─ if resumed: skip opening message → user types next message directly

Running the persistence test

python tests/persist_test.py

This test verifies the full round-trip without running the complete pipeline:

  • Sends 2 turns to the elicitation agent (makes 2 real API calls)
  • Asserts the session file was written correctly
  • Builds a brand-new orchestrator (simulating a restart)
  • Loads the session and asserts every field and history message matches
  • Verifies session_metadata() fast-read
  • Verifies delete_session() removes the file

Every other gap (memory, RAG, validation) builds on top of persistence.
