Deterministic validation for AI agent task completion.

DoneSpec

Done means deterministically verified.

DoneSpec is a tiny CLI for validating whether an AI coding agent actually completed a task.

It reads a machine-readable done.json, executes deterministic checks, and exits with code 0 only when the task is verifiably complete.

donespec validate done.json
DoneSpec validation: fix-auth-bug

✓ npm tests passed  (1242.7ms)
✓ auth.ts exists  (0.2ms)
✓ returnTo is implemented  (0.5ms)
✗ shared types untouched  (12.1ms)
  Forbidden path modified: src/types.ts

Validation failed.
1 check failed.
Exit code: 1

1. Problem

AI coding agents are increasingly good at producing code, but they still frequently claim work is complete when it is not.

Common failures:

  • tests were not run,
  • tests fail,
  • files were modified incorrectly,
  • requirements were partially implemented,
  • forbidden paths were touched,
  • expected outputs were hallucinated,
  • runtime invariants were broken.

Humans are left reading optimistic summaries instead of deterministic evidence.

DoneSpec gives every task a local, reproducible completion contract.

2. Why AI agents fail

AI agents optimize for plausible task completion. Software delivery requires verified task completion.

An agent can say:

I fixed the auth bug and all tests pass.

DoneSpec asks:

  • Did the configured command exit successfully?
  • Does the expected file exist?
  • Does the required implementation marker exist?
  • Was a forbidden file modified?
  • Does the HTTP endpoint return the expected status?

No vibes. No screenshots. No hidden judgement. Just deterministic checks.

3. What DoneSpec solves

DoneSpec introduces a small validation layer between AI coding agents and human trust.

It provides:

  • a machine-readable done.json,
  • deterministic local checks,
  • CI/CD integration,
  • structured JSON output,
  • a checker registry for future extension,
  • zero LLM dependency in the validation path.

DoneSpec is not an AI wrapper, chatbot, dashboard, or orchestration system.

It is developer infrastructure.
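
The checker registry mentioned above could follow a pattern like the one sketched here. This is only an illustration of the general idea, not DoneSpec's actual code: CheckResult, CHECKERS, register, run_check, and the always_pass check type are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CheckResult:
    """Outcome of a single check (hypothetical shape)."""
    passed: bool
    message: str = ""


# Hypothetical registry: maps a check "type" string to a checker function.
CHECKERS: Dict[str, Callable[[dict], CheckResult]] = {}


def register(check_type: str):
    """Decorator that adds a checker function to the registry."""
    def decorator(func: Callable[[dict], CheckResult]):
        CHECKERS[check_type] = func
        return func
    return decorator


@register("always_pass")  # illustrative check type, not a real DoneSpec check
def always_pass(check: dict) -> CheckResult:
    return CheckResult(passed=True, message="ok")


def run_check(check: dict) -> CheckResult:
    """Look up the checker for check["type"] and run it."""
    checker = CHECKERS.get(check.get("type"))
    if checker is None:
        return CheckResult(False, f"Unknown check type: {check.get('type')!r}")
    return checker(check)
```

With a layout like this, adding a new deterministic check means writing one function and decorating it; the engine loop does not need to change.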

4. Installation

pipx

pipx install donespec

uv

uv tool install donespec

from source

git clone https://github.com/donespec/donespec.git
cd donespec
python -m pip install -e ".[dev]"

Verify:

donespec --version

5. Quick example

Create done.json:

{
  "version": "1.0",
  "task_id": "fix-auth-bug",
  "must_pass": [
    {
      "type": "command",
      "name": "npm tests passed",
      "run": "npm test"
    },
    {
      "type": "file_exists",
      "name": "auth.ts exists",
      "path": "src/auth.ts"
    },
    {
      "type": "regex_in_file",
      "name": "returnTo is implemented",
      "path": "src/auth.ts",
      "pattern": "returnTo"
    }
  ],
  "must_not": [
    {
      "type": "file_not_modified",
      "name": "shared types untouched",
      "path": "src/types.ts"
    }
  ]
}

Run:

donespec validate done.json

Machine-readable output:

donespec validate done.json --json
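
Before handing a spec to the CLI, it can be useful to sanity-check its shape. The sketch below loads a trimmed version of the done.json from this section and reports obvious structural problems. The function name find_spec_problems and the choice of required keys are assumptions made for illustration; the authoritative rules live in the project's own schema.

```python
import json
from typing import List

# Assumption: these top-level keys are treated as required in this sketch.
REQUIRED_TOP_LEVEL = ("version", "task_id")


def find_spec_problems(spec: dict) -> List[str]:
    """Return a list of human-readable structural problems (empty if none)."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in spec:
            problems.append(f"missing top-level key: {key}")
    for section in ("must_pass", "must_not"):
        for index, check in enumerate(spec.get(section, [])):
            if not isinstance(check, dict) or "type" not in check:
                problems.append(f"{section}[{index}] has no 'type'")
    return problems


example = json.loads("""
{
  "version": "1.0",
  "task_id": "fix-auth-bug",
  "must_pass": [
    {"type": "file_exists", "name": "auth.ts exists", "path": "src/auth.ts"}
  ],
  "must_not": [
    {"type": "file_not_modified", "name": "shared types untouched", "path": "src/types.ts"}
  ]
}
""")

print(find_spec_problems(example))
```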

6. CLI usage

donespec validate done.json

Options:

--json       Emit machine-readable JSON output.
--root PATH  Project root. Defaults to the directory containing the spec file.
--fail-fast  Stop after first failed check.

Exit codes:

Code  Meaning
0     Validation passed
1     Validation failed
2     Invalid spec or runtime error
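
To make the exit codes and --fail-fast concrete, here is a minimal sketch of how a runner might map check results onto them. It is not DoneSpec's engine: run_checks and the EXIT_* constants are hypothetical, and each check is modelled as a plain callable that returns True or False.

```python
from typing import Callable, Iterable, List, Tuple

EXIT_PASSED = 0  # every check passed
EXIT_FAILED = 1  # at least one check failed
EXIT_ERROR = 2   # invalid spec or runtime error

Check = Tuple[str, Callable[[], bool]]


def run_checks(checks: Iterable[Check], fail_fast: bool = False) -> Tuple[int, List[str]]:
    """Run checks in order; return (exit_code, names_of_failed_checks)."""
    failed: List[str] = []
    try:
        for name, check in checks:
            if not check():
                failed.append(name)
                if fail_fast:
                    break  # stop after the first failed check
    except Exception:
        # A checker that crashes is reported as a runtime error, not a failure.
        return EXIT_ERROR, failed
    return (EXIT_FAILED if failed else EXIT_PASSED), failed
```

Keeping "failed" (1) separate from "could not run" (2) lets a CI job tell a genuinely incomplete task from a broken spec.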

Supported checks

command

Runs a shell command and validates the exit code.

{
  "type": "command",
  "name": "tests pass",
  "run": "pytest",
  "expected_exit_code": 0,
  "timeout_seconds": 120
}
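
A command check of this kind can be approximated with the standard library. The sketch below is an assumption about the general technique, not the project's implementation; run_command_check is a hypothetical name, and the default values simply mirror the example above.

```python
import subprocess
from typing import Optional


def run_command_check(
    run: str,
    expected_exit_code: int = 0,
    timeout_seconds: float = 120,
    cwd: Optional[str] = None,
) -> bool:
    """Run a shell command and compare its exit code with the expected one."""
    try:
        completed = subprocess.run(
            run,
            shell=True,           # the spec stores the command as one shell string
            cwd=cwd,
            timeout=timeout_seconds,
            capture_output=True,  # keep command output out of the report
        )
    except subprocess.TimeoutExpired:
        # In this sketch a timeout counts as a failed check.
        return False
    return completed.returncode == expected_exit_code
```

Because the command runs through a shell, a done.json should be treated like any other script in the repository: review it before running it.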

file_exists

Checks that a file or directory exists.

{
  "type": "file_exists",
  "path": "src/auth.ts"
}

regex_in_file

Checks that a regex matches somewhere in a file.

{
  "type": "regex_in_file",
  "path": "src/auth.ts",
  "pattern": "returnTo"
}

Optional flags:

{
  "flags": ["IGNORECASE", "MULTILINE", "DOTALL"]
}

regex_absent

Checks that a regex does not match anywhere in a file.

{
  "type": "regex_absent",
  "path": "src/auth.ts",
  "pattern": "console\\.log"
}
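
The two regex checks and the flag names above map naturally onto Python's re module. The following sketch assumes the pattern is searched anywhere in the file (re.search) and that files are read as UTF-8; both are assumptions, and the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable

# Flag names as they appear in the spec, mapped to the re module's constants.
FLAG_NAMES = {
    "IGNORECASE": re.IGNORECASE,
    "MULTILINE": re.MULTILINE,
    "DOTALL": re.DOTALL,
}


def combine_flags(names: Iterable[str]) -> int:
    """OR together the re flags named in the spec's "flags" list."""
    flags = 0
    for name in names:
        flags |= FLAG_NAMES[name]
    return flags


def regex_in_text(text: str, pattern: str, flags: Iterable[str] = ()) -> bool:
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, combine_flags(flags)) is not None


def regex_in_file(path: str, pattern: str, flags: Iterable[str] = ()) -> bool:
    return regex_in_text(Path(path).read_text(encoding="utf-8"), pattern, flags)


def regex_absent(path: str, pattern: str, flags: Iterable[str] = ()) -> bool:
    return not regex_in_file(path, pattern, flags)
```

In JSON the backslash must be doubled, which is why the spec above writes "console\\.log" for the regex console\.log.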

file_not_modified

Checks Git status to ensure a file or path was not modified, staged, deleted, or newly added.

{
  "type": "file_not_modified",
  "path": "src/types.ts"
}
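
One way to implement this kind of check is to ask Git for its porcelain status limited to the path: any output line means the path was modified, staged, deleted, or is untracked. This is a sketch of that technique under the assumption that the project root is a Git working tree; the function names are hypothetical and DoneSpec's own checker may differ.

```python
import subprocess


def porcelain_is_clean(porcelain_output: str) -> bool:
    """git status --porcelain prints one line per changed path; empty means clean."""
    return porcelain_output.strip() == ""


def file_not_modified(path: str, root: str = ".") -> bool:
    """True if Git reports no changes for the given path under root."""
    completed = subprocess.run(
        ["git", "status", "--porcelain", "--", path],
        cwd=root,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails, e.g. root is not a repository
    )
    return porcelain_is_clean(completed.stdout)
```

Note that this compares the working tree against the current commit, so changes an agent has already committed would not show up in this sketch.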

http_check

Makes an HTTP request and validates the response status.

{
  "type": "http_check",
  "url": "http://127.0.0.1:8000/health",
  "method": "GET",
  "expected_status": 200,
  "timeout_seconds": 3
}
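
A status-only HTTP check can be sketched with urllib from the standard library. The function name http_check and its defaults are assumptions that mirror the example above; this is not the project's implementation.

```python
import urllib.error
import urllib.request


def http_check(
    url: str,
    method: str = "GET",
    expected_status: int = 200,
    timeout_seconds: float = 3.0,
) -> bool:
    """Send one request and compare the response status with the expected one."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status is still available.
        status = error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the check fails.
        return False
    return status == expected_status
```

urlopen follows redirects by default, so this sketch sees the final status rather than an intermediate 3xx.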

7. CI integration

GitHub Actions

name: DoneSpec

on:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: donespec/action@v1
        with:
          spec: done.json
          root: .

The action installs DoneSpec, runs validation, and fails CI if any check fails.

Direct CI command

- run: python -m pip install donespec
- run: donespec validate done.json

8. Philosophy

DoneSpec is intentionally boring.

It should feel closer to ESLint, Prettier, pytest, or package.json scripts than to an AI platform.

Principles:

  • local first,
  • deterministic only,
  • minimal dependencies,
  • no accounts,
  • no dashboard in the MVP,
  • no LLM calls in validation,
  • composable with any agent,
  • simple enough for solo developers,
  • strict enough for CI.

AI agents may generate work. DoneSpec verifies completion.

9. Roadmap

DoneSpec is designed for future extension, but the MVP stays small.

Planned directions:

  • more deterministic checkers,
  • generated done.json templates,
  • MCP server integration,
  • AI agent integrations,
  • VS Code extension,
  • cloud dashboard,
  • analytics,
  • multi-agent validation.

Not in the MVP:

  • authentication,
  • database,
  • SaaS dashboard,
  • orchestration system,
  • LLM dependency.

Repository layout

.
├── action.yml
├── done.schema.json
├── docs/
├── examples/
├── pyproject.toml
├── src/
│   └── donespec/
│       ├── cli.py
│       ├── engine.py
│       ├── loader.py
│       ├── models.py
│       ├── output.py
│       ├── schema.py
│       └── checkers/
└── tests/

Development

python -m pip install -e ".[dev]"
ruff check .
ruff format .
pytest

License

MIT
