Evidence-first rails for agentic software delivery: Markdown roadmaps, paired evidence, and a machine-verified commit gate.

Delivery Workbench

[Image: pixel-art delivery workbench. A desk with a retro computer showing a green checkmark, stamped contract papers, a rubber stamp, and a cargo cart carrying a sealed package.]

Delivery Workbench is a planning and commit gate system for Git repositories where AI agents do much of the work. It addresses two problems: agents claim work is done when it is not, and months later nobody can tell what a commit shipped or what tested it.

Plans are Markdown files in the repo, organized as phases and stories. A story cannot be marked done until a command run is recorded in its evidence file. A commit cannot land until a pre-commit hook checks a contract whose facts (branch, HEAD, staged tree) are stamped and re-verified. Each commit carries trailers naming the story it shipped and the contract that certified it. State is Markdown files and git data; there is no database or server.
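The evidence rule can be pictured with a small sketch. It assumes, purely for illustration, that a recorded run shows up in the evidence file under a heading that begins with "### Captured run" (the form used in the evidence excerpt later in this README). `can_mark_done` and `set_status` are hypothetical helpers, not part of dw.

```python
import re

# Assumption: each recorded run starts with a "### Captured run" heading,
# as in the evidence excerpt in this README. This is not dw's actual parser.
CAPTURED_RUN = re.compile(r"^### Captured run\b", re.MULTILINE)


def can_mark_done(evidence_text: str) -> bool:
    """Return True only if the evidence text holds at least one captured run."""
    return bool(CAPTURED_RUN.search(evidence_text))


def set_status(current: str, requested: str, evidence_text: str) -> str:
    """Hypothetical status transition that refuses 'done' without evidence."""
    if requested == "done" and not can_mark_done(evidence_text):
        raise ValueError("refusing 'done': no captured run in evidence file")
    return requested
```

The point of the design is that the refusal depends on a recorded artifact, not on what the caller claims.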

Humans and agents use the same commands. Agents can also use the included MCP server.

Install

pipx install delivery-workbench
# or
brew install karolswdev/tap/delivery-workbench

Then set up any Git repository:

dw install /path/to/repo --skip-bootstrap

This copies the hooks, the CLI, and the MCP server into the repo's .githooks/ directory and points core.hooksPath at it. Commits are gated by the copy inside the repo, not by the global install. To refresh that copy, run dw update /path/to/repo; to report whether it is stale without changing anything, run dw update /path/to/repo --check.
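One way to picture a staleness check is a digest comparison between the installed files and the vendored copy. The sketch below is an assumption about the general technique, not a description of how dw update --check works. `file_digests` and `stale_files` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def stale_files(source: Path, vendored: Path) -> list:
    """List source files that are missing from, or differ in, the vendored copy."""
    src, dst = file_digests(source), file_digests(vendored)
    return sorted(name for name, digest in src.items() if dst.get(name) != digest)
```

An empty list would mean the vendored copy matches the source for every file the source ships.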

For a project with existing history, there is an adoption flow that inspects the repo and drafts a roadmap for you. See the framework README.

The daily loop

.githooks/dw next                       # what should I work on?
.githooks/dw story status myapp 2 3 in-progress
# ... do the work ...
.githooks/dw evidence capture myapp 2 3 -- npm test
.githooks/dw story status myapp 2 3 done    # refuses if no evidence exists
git add -A
.githooks/dw contract new               # stamps verified facts into .tmp/CONTRACT.md
# read the contract, verify each rule actually holds, check its boxes
git commit                              # the hook re-verifies everything

Checking the contract's boxes is deliberately manual: it is the attestation that each rule was verified. No command or tool does it.
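A reviewer or script can still list which boxes remain open before committing. The sketch below assumes the contract uses Markdown task-list items ("- [ ]" and "- [x]"); the actual layout of .tmp/CONTRACT.md is not shown here, so treat the pattern as a guess. `unchecked_rules` is a hypothetical helper and only reads the text; it never checks a box.

```python
import re

# Assumption: rules appear as Markdown task-list items, "- [ ] ..." or "- [x] ...".
BOX = re.compile(r"^[ \t]*[-*] \[( |x|X)\] (.+)$", re.MULTILINE)


def unchecked_rules(contract_text: str) -> list:
    """Return the text of every rule whose box is still unchecked."""
    return [text.strip() for mark, text in BOX.findall(contract_text) if mark == " "]
```

Listing open boxes is read-only on purpose: ticking them remains the manual attestation described above.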

sequenceDiagram
  participant Dev as Human or agent
  participant DW as dw CLI
  participant Git as git commit
  participant Gate as pre-commit gate

  Dev->>DW: dw story status ... in-progress
  Dev->>Dev: do the work
  Dev->>DW: dw evidence capture ... -- <verify command>
  Dev->>DW: dw story status ... done (refuses without evidence)
  Dev->>DW: dw contract new (stamps verified facts)
  Dev->>Dev: verify each rule, check its boxes
  Dev->>Git: git commit
  Git->>Gate: re-derive every stamped fact
  Gate-->>Git: pass, or block naming the failed rule
  Git->>Git: stamp PMO trailers, archive the contract
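
The "re-derive every stamped fact" step can be sketched as a comparison between recorded values and freshly computed ones. This is a minimal illustration of the idea, not dw's gate. The fact names and the `failed_facts` helper are hypothetical.

```python
from typing import Callable, Dict, List


def failed_facts(stamped: Dict[str, str],
                 derive: Dict[str, Callable[[], str]]) -> List[str]:
    """Re-derive each stamped fact and return the names that no longer match."""
    failures = []
    for name, recorded in stamped.items():
        current = derive[name]()  # in real use this would wrap a git query
        if current != recorded:
            failures.append(name)
    return failures
```

In a real hook the callables might wrap `git rev-parse --abbrev-ref HEAD`, `git rev-parse HEAD`, and `git write-tree`. Whether dw uses exactly these commands is an assumption.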

Tracing a commit

The artifact chain:

flowchart LR
  C[commit + PMO trailers] --> S[story file]
  S --> E[evidence file with captured runs]
  C --> A[archived contract in .git]
  P[current-phase-status] --> S
  E -.proves.-> S

This repository uses its own gate, so the chain can be inspected here. One commit:

$ git log -1 --format='%h %s%n%(trailers:key=PMO-Story)%(trailers:key=PMO-Contract-Digest)' ec1fb4a
ec1fb4a Complete WLA-10-03: guarded mutation tools on the MCP surface
PMO-Story: WLA-10-03
PMO-Contract-Digest: sha256:2700dd6a9c8e8ee8ec6053e7a741ace4123ba6750b8946bf2331af9ecadc3777
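
The trailer lines in that output are plain "Key: value" pairs, so a script can read them back with a few lines of code. The sketch below parses the output shown above. `parse_trailers` is a hypothetical helper, and the "PMO-" prefix filter is taken from the two trailer names in the example.

```python
def parse_trailers(text: str, prefix: str = "PMO-") -> dict:
    """Collect 'Key: value' lines whose key starts with the given prefix."""
    trailers = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        # Trailer keys contain no spaces, which excludes the subject line.
        if sep and key.startswith(prefix) and " " not in key:
            trailers[key] = value.strip()
    return trailers


output = """ec1fb4a Complete WLA-10-03: guarded mutation tools on the MCP surface
PMO-Story: WLA-10-03
PMO-Contract-Digest: sha256:2700dd6a9c8e8ee8ec6053e7a741ace4123ba6750b8946bf2331af9ecadc3777
"""
trailers = parse_trailers(output)
print(trailers["PMO-Story"])
```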

The trailer names the story. The story file states the acceptance criteria. Its paired evidence file contains the recorded run that justified marking it done, including the exact command, exit code, and staged-tree hash at capture time:

### Captured run — 2026-07-03T19:59:44Z

- **Command:** `bash -c ... bash pmo-roadmap/tests/mcp-server.sh; python3 pmo-roadmap/tests/dw-core-tests.py ...`
- **Exit code:** 0
- **Index-tree:** b1c5aaa6e7845d8143d9f3cf24c039d491e7e1fd
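
Because the captured run is ordinary Markdown, its fields can be read back mechanically. The sketch below assumes every field follows the bolded-bullet form of the excerpt above (a bullet whose bold label ends in a colon). `parse_captured_run` is a hypothetical helper, not a dw API.

```python
import re

# Matches bullets of the form "- **Name:** value", as in the excerpt above.
FIELD = re.compile(r"^- \*\*(?P<name>[^:*]+):\*\* (?P<value>.+)$", re.MULTILINE)


def parse_captured_run(block: str) -> dict:
    """Extract the bolded 'Name: value' bullets from one captured-run block."""
    fields = {m["name"]: m["value"].strip("`") for m in FIELD.finditer(block)}
    if "Exit code" in fields:
        fields["Exit code"] = int(fields["Exit code"])
    return fields
```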

The certified contract is archived under .git/pmo-contract-archive/<sha>. Hooks run only where they are installed, so a commit made in a clone without the gate is never checked locally. dw verify closes that gap by re-checking the structural rules against pushed history, which lets CI catch commits that bypassed a local gate:

$ .githooks/dw verify --all
dw verify: ok (45 commits verified, 17 pre-epoch skipped)
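
A CI job that wants the counts from that summary line could parse it as sketched below. Only the "ok" form shown above is known from this README; the wording of a failing run is not documented here, so the pattern is an assumption drawn from that single sample. `parse_verify_summary` is a hypothetical helper.

```python
import re

# Pattern derived from the one sample line above; other forms are unknown.
SUMMARY = re.compile(
    r"^dw verify: (?P<status>\w+) "
    r"\((?P<verified>\d+) commits verified, (?P<skipped>\d+) pre-epoch skipped\)$"
)


def parse_verify_summary(line: str) -> dict:
    """Split the summary line into status and the two commit counts."""
    m = SUMMARY.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return {
        "status": m["status"],
        "verified": int(m["verified"]),
        "skipped": int(m["skipped"]),
    }
```

For a pass/fail decision, the command's exit code is likely the sturdier signal; parsing is only useful for reporting the counts.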

The CLI

| Command | What it does |
| --- | --- |
| `dw next` | The next actionable story. Exit 0 found, 2 nothing to do. |
| `dw context --compact` | JSON snapshot of the roadmap: issues, warnings, next story, trace paths. |
| `dw check` | Lints roadmap structure and evidence content. Greppable errors, exit 1 on issues. |
| `dw story status <p> <ph> <st> <status>` | Updates a story's status transactionally. Refuses done without evidence. |
| `dw evidence capture <p> <ph> <st> -- <cmd>` | Runs the command and records it into the story's evidence file. |
| `dw contract new` | Writes .tmp/CONTRACT.md with stamped, machine-verified facts. |
| `dw gate` | Dry-runs the commit gate against the current stage. |
| `dw verify [--all]` | Re-checks the gate's structural rules over pushed history. |
| `dw phase create`, `dw story create` | Scaffolding for new roadmap work. |
| `dw doctor` | Checks the wiring in this clone. |

All commands have stable exit codes. The orientation commands support --json or --porcelain output.
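
Stable exit codes make the commands easy to drive from a script. The sketch below handles only the two codes this README lists for dw next (0 and 2) and treats anything else as unexpected. `interpret_next_exit` and `run_next` are hypothetical wrappers, and the default path assumes the vendored copy at .githooks/dw.

```python
import subprocess

# Meanings for `dw next` as listed in this README; other codes are not documented here.
NEXT_EXIT_MEANINGS = {0: "story found", 2: "nothing to do"}


def interpret_next_exit(code: int) -> str:
    """Translate a `dw next` exit code into a short description."""
    return NEXT_EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_next(dw_path: str = ".githooks/dw") -> str:
    """Run `dw next` and describe the outcome (needs a repo with dw installed)."""
    result = subprocess.run([dw_path, "next"], capture_output=True, text=True)
    return interpret_next_exit(result.returncode)
```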

The MCP server

dw install also vendors .githooks/dw-mcp and writes an entry into the repo's .mcp.json, which Claude Code and other MCP clients pick up automatically. The server exposes nine tools backed by the same code as the CLI: dw_context, dw_next, dw_check, dw_doctor, dw_verify, dw_gate, dw_story_status, dw_evidence_capture, and dw_contract_new.

An agent can take a story from backlog to done through tool calls alone, with the same refusals the CLI gives. Two operations are deliberately absent: certifying a contract and creating a commit. Schemas and design are in docs/mcp.md.
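
For readers unfamiliar with MCP, a tool invocation is a JSON-RPC 2.0 message with the method "tools/call". The sketch below builds such a message for one of the nine tools named above. The empty arguments for dw_next are an assumption (the real schemas are in docs/mcp.md), and `tool_call_request` is a hypothetical helper.

```python
import json

# Tool names as listed in this README.
EXPOSED_TOOLS = (
    "dw_context", "dw_next", "dw_check", "dw_doctor", "dw_verify",
    "dw_gate", "dw_story_status", "dw_evidence_capture", "dw_contract_new",
)


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' message for an exposed tool."""
    if tool not in EXPOSED_TOOLS:
        raise ValueError(f"{tool!r} is not exposed by the server")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Assumption: dw_next takes no required arguments.
print(tool_call_request(1, "dw_next", {}))
```

Since the list has no tool for certifying a contract or creating a commit, a request for either is rejected before it is ever sent.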

The web view

dw-workbench --root /path/to/repo serves a localhost-only page for browsing the roadmap: phase tables, story and evidence pairs, a health console, and the trace from a story to the commits that shipped it. It can edit roadmap files through a guarded preview-then-apply flow. It never stages or commits.

[Screenshot: Workbench project overview, showing the phase table with status badges, evidence counts, the next actionable story, and a validation warning.]

More screenshots and two terminal recordings are in demos/.

Other components

  • Local work logs: consent-gated daily notes of what each commit delivered.
  • A Claude Code plugin with slash commands and a skill covering the operating loop.
  • A managed CLAUDE.md block installed into adopted repos.
  • A copyable verify-history CI job that re-checks pushed history on every pull request.

This repo runs on it

Every phase and story of the framework was shipped through its own gate: ten phases, each story with evidence, every commit with trailers and an archived contract, the full history passing dw verify --all. The trail is in pmo-roadmap/pm/roadmap/work-log-automation/.

Documentation

Tests

The suites live in pmo-roadmap/tests/ and run standalone. CI runs all of them on Ubuntu and macOS, runs the unit suite on Python 3.9 (the minimum supported version), and runs history verification on every push.

License

MIT. Current version: 1.9.0.
