Deterministic governance controls for AI agent-driven software delivery
Controlled Execution System (CES)
Deterministic governance for AI agent-driven software delivery
What is CES?
AI agents can write code. But should you let them ship it without guardrails?
CES is a CLI tool that gives engineering teams (2-50 people) structured oversight of AI agents building their software. Instead of hoping agents produce correct code, CES provides deterministic controls that verify it — with trust that scales based on measured evidence, not faith.
Think of it as a governance layer between "an AI wrote this code" and "this code is in production." Every change is classified by risk, reviewed by independent agents, and tracked in a tamper-evident audit ledger. Low-risk changes flow through automatically. High-risk changes get human review. Trust expands as agents prove themselves and contracts when they don't.
The core principle: No autonomy expansion may rely solely on advisory controls. Every escalation in agent freedom must be backed by hard-enforced verification.
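One way to read the core principle is as a gate that ignores advisory signals when deciding whether agent freedom may grow. The sketch below is illustrative only: `Enforcement`, `Control`, and `may_expand_autonomy` are hypothetical names, not part of the CES API.

```python
from dataclasses import dataclass
from enum import Enum


class Enforcement(Enum):
    ADVISORY = "advisory"  # reports a finding but cannot block
    HARD = "hard"          # blocks the action when the check fails


@dataclass(frozen=True)
class Control:
    """Hypothetical record of one control that ran against a change."""
    name: str
    enforcement: Enforcement
    passed: bool


def may_expand_autonomy(controls: list[Control]) -> bool:
    """Allow expansion only when at least one hard-enforced control exists
    and every hard-enforced control passed. Advisory results are ignored."""
    hard = [c for c in controls if c.enforcement is Enforcement.HARD]
    return bool(hard) and all(c.passed for c in hard)


advisory_only = [Control("style-hint", Enforcement.ADVISORY, True)]
with_hard_gate = advisory_only + [Control("test-suite", Enforcement.HARD, True)]
print(may_expand_autonomy(advisory_only))   # False
print(may_expand_autonomy(with_hard_gate))  # True
```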
Default posture: CES is builder-first. Start with ces build, let CES set up local project state if needed, and only drop into expert commands when you need direct governance control.
How It Works
CES operates across three planes, each with a distinct role:
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  CONTROL PLANE (deterministic, no LLM calls)                        │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐    │
│   │ Manifest │  │ Policy   │  │ Workflow │  │ Audit Ledger     │    │
│   │ Manager  │  │ Engine   │  │ State    │  │ (append-only,    │    │
│   │          │  │          │  │ Machine  │  │  hash-chained)   │    │
│   └────┬─────┘  └────┬─────┘  └────┬─────┘  └──────────────────┘    │
│        │             │             │                                │
│────────┼─────────────┼─────────────┼────────────────────────────────│
│        │             │             │                                │
│  HARNESS PLANE (orchestration + quality assurance)                  │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐    │
│   │ Trust    │  │ Evidence │  │ Review   │  │ Classification   │    │
│   │ Manager  │  │Synthesiz.│  │ Router   │  │ Engine+Oracle    │    │
│   └────┬─────┘  └────┬─────┘  └────┬─────┘  └──────────────────┘    │
│        │             │             │                                │
│────────┼─────────────┼─────────────┼────────────────────────────────│
│        │             │             │                                │
│  EXECUTION PLANE (bounded agent work)                               │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐    │
│   │ Agent    │  │ Guide    │  │ Self-    │  │ Sensor           │    │
│   │ Runner   │  │Pack Build│  │Correction│  │ Orchestrator     │    │
│   └──────────┘  └──────────┘  └──────────┘  └──────────────────┘    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
Task Lifecycle
Every change follows this path:
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Create  │────▶│ Classify │────▶│ Execute  │────▶│  Review  │
│ Manifest │     │ (risk +  │     │ (bounded │     │(independ.│
│          │     │  class)  │     │  agent)  │     │  agents) │
└──────────┘     └──────────┘     └──────────┘     └────┬─────┘
                                                        │
┌──────────┐     ┌──────────┐     ┌──────────┐          │
│ Deployed │◀────│  Merged  │◀────│ Approved │◀─────────┘
│          │     │          │     │(human or │
│          │     │          │     │  auto)   │
└──────────┘     └──────────┘     └──────────┘
- Tier C tasks (low risk): Can flow through fully autonomously
- Tier B tasks (medium risk): Require hybrid human+agent review
- Tier A tasks (high risk): Require full human approval, plus review by three different LLMs for review diversity
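The lifecycle and tier rules above can be sketched as a fixed stage order plus a lookup from risk tier to approval mode. This is a minimal illustration, assuming hypothetical names (`LIFECYCLE`, `APPROVAL_MODE`, `MIN_REVIEW_MODELS`, `next_stage`, `needs_human`); CES's own workflow state machine may be structured differently.

```python
from enum import Enum
from typing import Optional

# Stage names mirror the boxes in the lifecycle diagram.
LIFECYCLE = (
    "create_manifest", "classify", "execute", "review",
    "approved", "merged", "deployed",
)


class RiskTier(Enum):
    A = "high"
    B = "medium"
    C = "low"


# Approval mode per tier, as described in the list above.
APPROVAL_MODE = {
    RiskTier.C: "autonomous",
    RiskTier.B: "hybrid",
    RiskTier.A: "human",
}

# Only Tier A has a stated review-model count; other tiers are left unspecified.
MIN_REVIEW_MODELS = {RiskTier.A: 3}


def next_stage(stage: str) -> Optional[str]:
    """Return the stage that follows `stage`, or None after the last one."""
    index = LIFECYCLE.index(stage)
    return LIFECYCLE[index + 1] if index + 1 < len(LIFECYCLE) else None


def needs_human(tier: RiskTier) -> bool:
    """Tier B (hybrid) and Tier A (human) both involve a human reviewer."""
    return APPROVAL_MODE[tier] in {"hybrid", "human"}
```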
Key Concepts
Manifests
A manifest is the governance contract for a task. It defines what an agent is allowed to do: which files it can touch, which tools it can use, how many tokens it can spend, and when it expires. No manifest = no work.
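To make the idea of a manifest as a contract concrete, here is a rough sketch of the four limits it names (files, tools, tokens, expiry) and a single permission check. The `Manifest` class, its field names, and the glob-based path matching are assumptions for illustration, not the CES manifest schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Manifest:
    """Hypothetical manifest shape; field names are illustrative."""
    manifest_id: str
    allowed_paths: tuple[str, ...]   # glob patterns the agent may touch
    allowed_tools: frozenset[str]    # tool names the agent may invoke
    token_budget: int                # maximum tokens the agent may spend
    expires_at: datetime             # the manifest is void after this time

    def permits(self, path: str, tool: str, tokens_used: int, now: datetime) -> bool:
        """Return True only if every limit in the manifest is respected."""
        if now >= self.expires_at:
            return False
        if tokens_used > self.token_budget:
            return False
        if tool not in self.allowed_tools:
            return False
        return any(fnmatchcase(path, pattern) for pattern in self.allowed_paths)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
manifest = Manifest(
    manifest_id="M-example",
    allowed_paths=("src/api/*.py",),
    allowed_tools=frozenset({"edit"}),
    token_budget=1000,
    expires_at=now + timedelta(hours=1),
)
print(manifest.permits("src/api/users.py", "edit", 10, now))  # True
print(manifest.permits("infra/main.tf", "edit", 10, now))     # False
```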
Classification
Every manifest gets classified along three dimensions:
| Dimension | What It Measures | Values |
|---|---|---|
| Risk Tier | Blast radius if something goes wrong | A (highest) → B → C (lowest) |
| Behavior Confidence | How predictable the output is | BC1 (deterministic) → BC2 → BC3 (subjective) |
| Change Class | Type of modification | Class 1 (new) → Class 5 (removal) |
The aggregate classification determines the review workflow. The classifier must always be a different agent than the implementer.
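The three dimensions and the independence rule can be captured in a small record that refuses to be built when the classifier and implementer are the same agent. This is a sketch under assumed names (`Classification`, `classifier_agent`, `implementer_agent`); it does not reproduce the CES classification engine.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    A = "A"  # highest blast radius
    B = "B"
    C = "C"  # lowest blast radius


class BehaviorConfidence(Enum):
    BC1 = "BC1"  # deterministic
    BC2 = "BC2"
    BC3 = "BC3"  # subjective


@dataclass(frozen=True)
class Classification:
    """Hypothetical classification record for one manifest."""
    risk_tier: RiskTier
    behavior_confidence: BehaviorConfidence
    change_class: int          # 1 (new) through 5 (removal)
    classifier_agent: str
    implementer_agent: str

    def __post_init__(self) -> None:
        if not 1 <= self.change_class <= 5:
            raise ValueError("change_class must be between 1 and 5")
        if self.classifier_agent == self.implementer_agent:
            raise ValueError("classifier must differ from implementer")


ok = Classification(RiskTier.B, BehaviorConfidence.BC2, 2, "agent-x", "agent-y")
print(ok.risk_tier.value, ok.behavior_confidence.value, ok.change_class)  # B BC2 2
```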
Trust Tiers
Agent profiles earn trust over time:
candidate ──────▶ trusted ◀──────▶ watch ◀──────▶ constrained
(new agent)       (proven)         (slipping)     (restricted)
Trust moves based on measured performance: defect rates, escape rates, calibration probe results. Hidden checks (agents don't know they're being tested) prevent gaming.
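The arrows in the trust diagram amount to a small transition table: promotion out of candidate is one-way, while the other moves go in both directions. The sketch below encodes only those arrows; `ALLOWED_TRANSITIONS` and `move_trust` are hypothetical names, and the metric thresholds that trigger a move are deliberately left out because the text does not state them.

```python
# Transitions read directly from the trust-tier diagram.
ALLOWED_TRANSITIONS = {
    "candidate": {"trusted"},
    "trusted": {"watch"},
    "watch": {"trusted", "constrained"},
    "constrained": {"watch"},
}


def move_trust(current: str, target: str) -> str:
    """Return the new tier, or raise if the diagram has no such arrow."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"transition {current} -> {target} is not allowed")
    return target


tier = move_trust("candidate", "trusted")
tier = move_trust(tier, "watch")
print(tier)  # watch
```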
Evidence Packets
Every task produces an evidence packet — a structured bundle of test results, sensor outputs, review findings, and the decision view. This is the proof trail. Evidence packets are immutable once assembled.
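Immutability after assembly can be approximated in Python with a frozen dataclass and tuple fields, plus a content digest that changes if any field differs. The `EvidencePacket` shape and its `digest` method below are assumptions for illustration and are not taken from the CES codebase.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class EvidencePacket:
    """Hypothetical packet; tuples keep the collections immutable too."""
    manifest_id: str
    test_results: tuple[str, ...]
    sensor_outputs: tuple[str, ...]
    review_findings: tuple[str, ...]
    decision_view: str

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form of the packet contents."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


packet = EvidencePacket(
    manifest_id="M-example",
    test_results=("unit: passed",),
    sensor_outputs=("lint: clean",),
    review_findings=(),
    decision_view="approve",
)

try:
    packet.decision_view = "reject"  # type: ignore[misc]
except FrozenInstanceError:
    print("packet is immutable")
```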
Audit Ledger
Every significant action is recorded in an append-only, hash-chained ledger. Database triggers block updates and deletes. The ledger covers 14 event types, ranging from approvals to kill-switch activations, and an HMAC chain makes tampering detectable.
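The general technique behind an HMAC-chained ledger is that each entry's MAC covers both its own event and the previous entry's MAC, so changing any earlier entry invalidates everything after it. The sketch below shows that idea with an in-memory list; the entry layout, `GENESIS` value, and function names are assumptions, and the real ledger lives in a database with its own schema.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def entry_mac(secret: bytes, prev_mac: str, event: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus the canonical event JSON."""
    payload = prev_mac.encode("utf-8") + json.dumps(event, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def append(ledger: list, secret: bytes, event: dict) -> None:
    """Append an event, linking it to the MAC of the last entry."""
    prev = ledger[-1]["mac"] if ledger else GENESIS
    ledger.append({"event": event, "prev": prev, "mac": entry_mac(secret, prev, event)})


def verify(ledger: list, secret: bytes) -> bool:
    """Recompute the chain from the start; any edit breaks verification."""
    prev = GENESIS
    for entry in ledger:
        expected = entry_mac(secret, prev, entry["event"])
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev = entry["mac"]
    return True


secret = b"example-secret"  # in practice, a real secret such as CES_AUDIT_HMAC_SECRET
ledger: list = []
append(ledger, secret, {"type": "approval", "manifest": "M-example"})
append(ledger, secret, {"type": "kill_switch", "reason": "drill"})
print(verify(ledger, secret))  # True
```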
Quick Start
Prerequisites
- Python 3.12+
- uv (package manager)
- A supported local runtime: codex or claude
Just want to try it? See the 5-Minute Quickstart — no Docker, no Postgres, no API keys.
1. Clone and install
git clone https://github.com/chrisduvillard/controlled-execution-system.git
cd controlled-execution-system
uv sync
2. Verify a local runtime
ces doctor
3. Configure environment
cp .env.example .env
# Optional: set CES_AUDIT_HMAC_SECRET and CES_DEMO_MODE=1.
# Real local execution uses the installed `codex` or `claude` CLI.
4. Start with ces build
# Fresh repo or existing repo: `ces build` auto-creates `.ces/` on first run
# Builder-first path: describe what you want and let CES draft the contract
ces build "Add input validation to the user registration endpoint" --yes
# Resume the latest builder session without re-entering context
ces continue --yes
# Explain the latest builder state in plain language
ces explain
ces explain --view decisioning
ces explain --view brownfield
# Check the latest request, activity, and next step
ces status
# Export a concise builder run report for audit or handoff
ces report builder
If you prefer manual setup before the first build, CES still supports:
ces init my-project
For existing repos, CES auto-detects brownfield mode and asks what must be preserved before it runs the change. ces continue resumes the saved session stage instead of replaying the whole flow, and once a run is finished it points you back to ces build for the next request. You can force either path with --greenfield or --brownfield.
For day-to-day brownfield delivery, stay builder-first with ces build, ces continue, and ces explain --view brownfield. Use the expert brownfield commands only when you need an explicit legacy-behavior decision such as ces brownfield review OLB-<entry-id> --disposition preserve. The Brownfield Guide covers that handoff in more detail.
Use the builder-first flow when you want CES to carry the current request context for you. Switch to the expert workflow when you need explicit review, triage, approval, or audit/handoff artifacts. The Operator Playbook shows the boundary and recommended command sequences.
When you leave the single-request builder loop and need system-wide visibility or incident response, use the expert operations surfaces instead of relying on the builder-first ces status view:
ces status --expert
ces status --expert --watch
ces audit --limit 20
ces emergency declare "Security incident detected"
The Operations Runbook covers incident drills and recovery expectations for those commands.
If you want the explicit expert workflow instead, CES still supports:
ces manifest "Add input validation to the user registration endpoint" --yes
ces classify M-<manifest-id>
5. Use local expert commands directly
ces execute M-<manifest-id> --runtime auto
ces review M-<manifest-id>
ces triage M-<manifest-id>
ces approve M-<manifest-id> --yes
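If you script the expert sequence, it can help to build the four commands as argument lists first and inspect them before running anything. The helper below is a hypothetical convenience, not part of CES; it only assembles the commands shown above and prints them.

```python
import shlex


def expert_sequence(manifest_id: str, runtime: str = "auto") -> list[list[str]]:
    """Return the execute, review, triage, approve commands for one manifest."""
    return [
        ["ces", "execute", manifest_id, "--runtime", runtime],
        ["ces", "review", manifest_id],
        ["ces", "triage", manifest_id],
        ["ces", "approve", manifest_id, "--yes"],
    ]


# "M-example" is a placeholder; substitute a real manifest ID.
for argv in expert_sequence("M-example"):
    print(shlex.join(argv))
```

To actually run the steps, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.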
Try the FreshCart example
These demo commands are intended for a source checkout of CES after git clone and uv sync.
# Seed sample data
uv run python -m examples.freshcart.seed_data
# Run the end-to-end workflow
uv run python -m examples.freshcart.run_e2e
Use CES on CES
If you want CES to review changes to this repository itself, initialize repo-local state first:
ces init controlled-execution-system
ces dogfood --base origin/master
This creates a local .ces/ directory for repo-specific state. Keep that directory untracked; it is operational state, not project source.
CLI Commands
ces <command> [options]
Start Here
| Command | Description |
|---|---|
| build | Builder-first local workflow: describe the change, gather only the missing context, run, review, and approve |
| continue | Resume the latest saved builder session without re-entering the same setup context |
| explain | Summarize the latest builder brief, evidence, blockers, and next step in plain language |
| status | Show builder-first project status; add --expert for the full expert view |
| report builder | Export the latest builder run report for audit or reviewer handoff |
| init | Optional manual setup before the first builder-first run |
Advanced Governance
| Command | Description |
|---|---|
| manifest | Generate a task manifest from a natural-language description |
| classify | Classify a manifest (risk tier, behavior confidence, change class) |
| execute | Execute an agent task within manifest boundaries |
| review | Run the review pipeline on completed work |
| triage | Pre-screen evidence with triage color (green/amber/red) |
| approve | Approve or reject an evidence packet |
| gate | Evaluate a phase gate (computational + agent checks) |
| intake | Run intake interview for a project phase |
| calibrate | Run hidden calibration probes against an agent |
| audit | Expert operations audit inspection; for example, ces audit --limit 20 |
| emergency declare | Expert operations emergency declaration; for example, ces emergency declare "Security incident detected" |
| Command Groups | |
| vault ... | Knowledge vault operations (Zettelkasten-style notes) |
| brownfield ... | Expert legacy behavior capture, review, and promotion |
Configuration
Copy .env.example to .env and configure:
| Variable | Description | Default |
|---|---|---|
| CES_AUDIT_HMAC_SECRET | HMAC secret for audit chain integrity | (change in managed environments) |
| CES_LOG_LEVEL | Logging level | INFO |
| CES_LOG_FORMAT | Log format (json or text) | json |
| CES_DEFAULT_RUNTIME | Default local runtime | codex |
| CES_DEMO_MODE | Use demo helper responses when no CLI-backed provider is available | 0 |
For local runtime execution, CES relies on an installed codex or claude
CLI. Any provider-specific credentials are handled by that runtime rather than
through CES package extras.
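For tooling that needs to read the same variables, the defaults in the table can be mirrored with a plain environment lookup. This is a sketch of that pattern only; `DEFAULTS` and `load_settings` are hypothetical names, and CES's own configuration loader may behave differently.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the configuration table above.
DEFAULTS = {
    "CES_LOG_LEVEL": "INFO",
    "CES_LOG_FORMAT": "json",
    "CES_DEFAULT_RUNTIME": "codex",
    "CES_DEMO_MODE": "0",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read CES_* variables from `env` (or os.environ), applying defaults."""
    source = os.environ if env is None else env
    settings = {name: source.get(name, default) for name, default in DEFAULTS.items()}
    if settings["CES_LOG_FORMAT"] not in {"json", "text"}:
        raise ValueError("CES_LOG_FORMAT must be 'json' or 'text'")
    # The HMAC secret has no usable default, so it stays None when unset.
    settings["CES_AUDIT_HMAC_SECRET"] = source.get("CES_AUDIT_HMAC_SECRET")
    return settings


print(load_settings({"CES_LOG_FORMAT": "text"})["CES_LOG_FORMAT"])  # text
```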
Project Structure
controlled-execution-system/
├── src/ces/
│ ├── cli/ # Typer CLI
│ │ ├── __init__.py # App entry point
│ │ ├── run_cmd.py # Builder-first local workflow
│ │ ├── status_cmd.py # Status and explanation surfaces
│ │ └── *_cmd.py # Expert command modules
│ ├── control/ # Governance engine
│ │ ├── db/ # SQLAlchemy tables and repositories
│ │ ├── models/ # Governance domain models
│ │ └── services/ # Manifest, policy, workflow, merge
│ ├── harness/ # Quality assurance
│ │ ├── models/ # Harness-facing models
│ │ ├── sensors/ # Computational sensors
│ │ └── services/ # Trust, evidence, review, guide packs
│ ├── execution/ # Agent orchestration
│ │ ├── agent_runner.py # Agent runner
│ │ ├── providers/ # LLM provider adapters
│ │ ├── runtimes/ # Runtime registry + adapters
│ │ └── sandbox.py # Execution sandboxing
│ ├── intake/ # Intake interview flow
│ ├── knowledge/ # Vault services and ranking
│ ├── emergency/ # Kill switch
│ ├── brownfield/ # Legacy integration
│ ├── observability/ # Internal metrics and telemetry helpers
│ └── shared/ # Enums, crypto, config, logging
├── tests/
│ ├── unit/ # 157 test files
│ └── integration/ # End-to-end and regression integration tests
├── examples/ # FreshCart demo project
├── docs/ # PRD, guides, reference cards
├── pyproject.toml # Project config + dependencies
└── .env.example # Environment template
Testing
CES maintains a 90%+ branch coverage gate enforced by CI.
# Run all unit tests
uv run pytest
# Run with coverage report
uv run pytest --cov=ces --cov-report=term-missing
# Skip integration tests (no Docker required)
uv run pytest -m "not integration"
# Run integration tests only (requires Docker)
uv run pytest -m integration
Current local suite: 3,000+ tests with a 90%+ branch coverage gate enforced in CI.
Tech Stack
| Technology | Version | Role |
|---|---|---|
| Python | 3.12+ | Runtime |
| uv | 0.11+ | Package management |
| Typer + Rich | 0.24+ | Command-line interface |
| SQLAlchemy | 2.0+ | ORM |
| Pydantic | 2.12+ | Schema validation + domain models |
| Codex CLI | current | Local GPT-backed execution runtime |
| Claude Code CLI | current | Local Claude-backed execution runtime |
| cryptography | 46+ | Manifest signing, audit chain integrity |
| structlog | 25+ | Structured JSON logging |
| python-statemachine | 3.0+ | Workflow state transitions |
| pytest | 9.0+ | Testing (90%+ coverage) |
| ruff | 0.15+ | Linting + formatting |
| mypy | 1.20+ | Static type checking (strict mode with targeted relaxations — see [tool.mypy] in pyproject.toml) |
Documentation
| Document | Description |
|---|---|
| Product Requirements (PRD) | Complete specification (5,600+ lines) — the authoritative reference |
| Implementation Guide | Architectural guidance and build order |
| FreshCart Worked Example | End-to-end walkthrough using a sample project |
| Quick Reference Card | Classification tables, merge checklists, TTL rules |
| Security Audit | Security model and threat mitigations |
| Getting Started | Setup guide with step-by-step instructions |
| Operator Playbook | Builder-first vs expert workflow guidance and evidence/handoff patterns |
| Brownfield Guide | Applying CES to existing codebases |
| GNHF Trial Guide | Guardrails for using gnhf externally to develop CES |
| Troubleshooting | Common issues and solutions |
| Production Deployment | Production configuration and operational guidance |
Contributing
See CONTRIBUTING.md for development setup, testing, and workflow details.
If you want to evaluate external agent loops such as gnhf, use the
GNHF Trial Guide and scripts/gnhf_trial.sh rather than treating them as
part of CES itself. Run them from a clean sibling worktree or a clean clone,
and keep the scope to contributor-side docs, tests, and CLI polish. Exclude
changes to manifest/policy, approval/triage/review, audit, kill-switch,
sandbox, and runtime-boundary code. Review every generated branch manually
before cherry-picking or merging, and keep CES's own builder-first or expert
workflows for actual delivery work.
Changelog
See CHANGELOG.md for release history.
License
MIT
Built with the Agent-Native Software Delivery Operating Model v4