Deterministic governance controls for AI agent-driven software delivery

Maintainers

ChrisDudu


Controlled Execution System (CES)

Deterministic governance for AI agent-driven software delivery

Python 3.12+ | Coverage 90%+ | 3,000+ tests | License: MIT

What is CES?

AI agents can write code. But should you let them ship it without guardrails?

CES is a CLI tool that gives engineering teams (2-50 people) structured oversight of AI agents building their software. Instead of hoping agents produce correct code, CES provides deterministic controls that verify it — with trust that scales based on measured evidence, not faith.

Think of it as a governance layer between "an AI wrote this code" and "this code is in production." Every change is classified by risk, reviewed by independent agents, and recorded in a tamper-evident audit ledger. Low-risk changes flow through automatically; high-risk changes get human review. Trust expands as agents prove themselves and contracts when they don't.

The core principle: No autonomy expansion may rely solely on advisory controls. Every escalation in agent freedom must be backed by hard-enforced verification.

Default posture: CES is builder-first. Start with `ces build`, let CES set up local project state if needed, and drop into expert commands only when you need direct governance control.

How It Works

CES operates across three planes, each with a distinct role:

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│   CONTROL PLANE  (deterministic — no LLM calls)                     │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │
│   │ Manifest │  │  Policy  │  │ Workflow │  │   Audit Ledger   │   │
│   │ Manager  │  │  Engine  │  │  State   │  │ (append-only,    │   │
│   │          │  │          │  │ Machine  │  │  hash-chained)   │   │
│   └────┬─────┘  └────┬─────┘  └────┬─────┘  └──────────────────┘   │
│        │             │             │                                 │
│────────┼─────────────┼─────────────┼────────────────────────────────│
│        │             │             │                                 │
│   HARNESS PLANE  (orchestration + quality assurance)                │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │
│   │  Trust   │  │ Evidence │  │  Review  │  │   Classification │   │
│   │ Manager  │  │Synthesiz.│  │  Router  │  │   Engine+Oracle  │   │
│   └────┬─────┘  └────┬─────┘  └────┬─────┘  └──────────────────┘   │
│        │             │             │                                 │
│────────┼─────────────┼─────────────┼────────────────────────────────│
│        │             │             │                                 │
│   EXECUTION PLANE  (bounded agent work)                             │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │
│   │  Agent   │  │  Guide   │  │  Self-   │  │    Sensor        │   │
│   │  Runner  │  │Pack Build│  │Correction│  │  Orchestrator    │   │
│   └──────────┘  └──────────┘  └──────────┘  └──────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Task Lifecycle

Every change follows this path:

  ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
  │  Create  │────▶│ Classify │────▶│ Execute  │────▶│  Review  │
  │ Manifest │     │ (risk +  │     │ (bounded │     │(independ.│
  │          │     │  class)  │     │  agent)  │     │  agents) │
  └──────────┘     └──────────┘     └──────────┘     └────┬─────┘
                                                          │
  ┌──────────┐     ┌──────────┐     ┌──────────┐         │
  │ Deployed │◀────│  Merged  │◀────│ Approved │◀────────┘
  │          │     │          │     │(human or │
  │          │     │          │     │ auto)    │
  └──────────┘     └──────────┘     └──────────┘

Tier C tasks (low risk): can flow through fully autonomously.
Tier B tasks (medium risk): require hybrid human and agent review.
Tier A tasks (high risk): require full human approval plus review by three different LLMs for review diversity.
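The lifecycle above behaves like a linear state machine in which no stage can be skipped. The following minimal sketch assumes a hypothetical `Stage` enum whose members mirror the diagram and a hypothetical `advance` helper. It is not CES's workflow code (the tech stack lists python-statemachine for that), and it does not model rejection or rework paths.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages, named after the boxes in the diagram."""

    CREATED = "create manifest"
    CLASSIFIED = "classify"
    EXECUTED = "execute"
    REVIEWED = "review"
    APPROVED = "approved"
    MERGED = "merged"
    DEPLOYED = "deployed"


# Each stage has exactly one successor, so a task cannot skip a stage.
NEXT_STAGE: dict[Stage, Stage] = {
    Stage.CREATED: Stage.CLASSIFIED,
    Stage.CLASSIFIED: Stage.EXECUTED,
    Stage.EXECUTED: Stage.REVIEWED,
    Stage.REVIEWED: Stage.APPROVED,
    Stage.APPROVED: Stage.MERGED,
    Stage.MERGED: Stage.DEPLOYED,
}


def advance(stage: Stage) -> Stage:
    """Return the next stage, or raise if the task is already deployed."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.name} is a terminal stage")
    return NEXT_STAGE[stage]


def walk() -> list[Stage]:
    """Drive a task from manifest creation to deployment, recording each stage."""
    stage = Stage.CREATED
    path = [stage]
    while stage is not Stage.DEPLOYED:
        stage = advance(stage)
        path.append(stage)
    return path
```

Here `walk()` visits all seven stages in diagram order, and calling `advance(Stage.DEPLOYED)` raises `ValueError`.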

Key Concepts

Manifests

A manifest is the governance contract for a task. It defines what an agent is allowed to do: which files it can touch, which tools it can use, how many tokens it can spend, and when it expires. No manifest = no work.

Classification

Every manifest gets classified along three dimensions:

Dimension	What It Measures	Values
Risk Tier	Blast radius if something goes wrong	A (highest) → B → C (lowest)
Behavior Confidence	How predictable the output is	BC1 (deterministic) → BC2 → BC3 (subjective)
Change Class	Type of modification	Class 1 (new) → Class 5 (removal)

The aggregate classification determines the review workflow. The classifier must always be a different agent than the implementer.
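The three dimensions and the independence rule lend themselves to validation at construction time. This minimal sketch uses a hypothetical `Classification` record and a hypothetical `review_route` helper. It routes on risk tier alone, following the tier rules from the lifecycle section, whereas the text says CES derives the workflow from the aggregate of all three dimensions. How those dimensions combine is not described here, so the sketch does not guess.

```python
from dataclasses import dataclass

RISK_TIERS = ("A", "B", "C")                 # A = largest blast radius
BEHAVIOR_CONFIDENCE = ("BC1", "BC2", "BC3")  # BC1 = deterministic, BC3 = subjective
CHANGE_CLASSES = range(1, 6)                 # Class 1 (new) .. Class 5 (removal)


@dataclass(frozen=True)
class Classification:
    """Hypothetical classification record, validated when it is created."""

    risk_tier: str
    behavior_confidence: str
    change_class: int
    classifier_id: str
    implementer_id: str

    def __post_init__(self) -> None:
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")
        if self.behavior_confidence not in BEHAVIOR_CONFIDENCE:
            raise ValueError(f"unknown behavior confidence: {self.behavior_confidence!r}")
        if self.change_class not in CHANGE_CLASSES:
            raise ValueError(f"unknown change class: {self.change_class!r}")
        # Independence rule: an agent may never classify its own work.
        if self.classifier_id == self.implementer_id:
            raise ValueError("classifier must be a different agent than the implementer")


def review_route(c: Classification) -> str:
    """Simplification: routes on risk tier only, ignoring the other two dimensions."""
    routes = {
        "C": "autonomous",
        "B": "hybrid human and agent review",
        "A": "full human approval and three distinct review models",
    }
    return routes[c.risk_tier]
```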

Trust Tiers

Agent profiles earn trust over time:

  candidate ──────▶ trusted ◀──────▶ watch ◀──────▶ constrained
  (new agent)      (proven)        (slipping)       (restricted)

Trust moves based on measured performance: defect rates, escape rates, calibration probe results. Hidden checks (agents don't know they're being tested) prevent gaming.
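The trust diagram can be read as an explicit set of allowed edges. The sketch below encodes only the arrows shown above; the `Trust` enum, the `move` function, and the choice to treat moves toward `trusted` as autonomy expansions are assumptions for illustration. It also applies the core principle stated earlier: an expansion is refused unless hard-enforced checks passed. The actual promotion criteria (defect rates, escape rates, calibration probes) are not modeled.

```python
from enum import Enum


class Trust(Enum):
    CANDIDATE = "candidate"
    TRUSTED = "trusted"
    WATCH = "watch"
    CONSTRAINED = "constrained"


# Edges read directly off the diagram: candidate -> trusted is one-way,
# the other neighbours can move in both directions.
ALLOWED: set[tuple[Trust, Trust]] = {
    (Trust.CANDIDATE, Trust.TRUSTED),
    (Trust.TRUSTED, Trust.WATCH),
    (Trust.WATCH, Trust.TRUSTED),
    (Trust.WATCH, Trust.CONSTRAINED),
    (Trust.CONSTRAINED, Trust.WATCH),
}

# Hypothetical: the moves that grant an agent more freedom.
EXPANSIONS: set[tuple[Trust, Trust]] = {
    (Trust.CANDIDATE, Trust.TRUSTED),
    (Trust.WATCH, Trust.TRUSTED),
    (Trust.CONSTRAINED, Trust.WATCH),
}


def move(current: Trust, target: Trust, enforced_checks_passed: bool) -> Trust:
    """Apply a trust transition, enforcing the diagram and the core principle."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    # No autonomy expansion may rely solely on advisory controls.
    if (current, target) in EXPANSIONS and not enforced_checks_passed:
        raise PermissionError("autonomy expansion requires hard-enforced verification")
    return target
```

Demotions such as `trusted -> watch` go through without extra evidence, while promotions need `enforced_checks_passed=True`.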

Evidence Packets

Every task produces an evidence packet — a structured bundle of test results, sensor outputs, review findings, and the decision view. This is the proof trail. Evidence packets are immutable once assembled.
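One way to make a bundle immutable once assembled is a frozen dataclass that holds only immutable values, plus a content digest so that any modified copy is detectable. The `EvidencePacket` fields below are hypothetical names for the parts listed above, not CES's schema, and SHA-256 over canonical JSON is an assumed choice.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidencePacket:
    """Hypothetical evidence packet; tuples keep every field immutable."""

    manifest_id: str
    test_results: tuple[str, ...]
    sensor_outputs: tuple[str, ...]
    review_findings: tuple[str, ...]
    decision_view: str

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of all fields."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Assigning to a field of an assembled packet raises `dataclasses.FrozenInstanceError`, and any packet built with different content produces a different digest.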

Audit Ledger

Every significant action is recorded in an append-only, hash-chained ledger. Updates and deletes are blocked by database triggers. The ledger defines 14 event types, ranging from approvals to kill-switch activations, and tampering is detected through an HMAC chain.
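An HMAC chain works by making each entry's MAC cover the previous entry's MAC, so altering an earlier entry breaks its own link and every link after it. The following minimal sketch uses only the standard library (`hmac`, `hashlib`, `json`). The entry layout, the all-zero genesis value, and the event names are assumptions; the in-memory list does not enforce append-only behavior the way CES's database triggers do; and in practice the key would presumably come from `CES_AUDIT_HMAC_SECRET`.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical "previous MAC" for the first entry


def entry_mac(secret: bytes, prev_mac: str, event: dict) -> str:
    """MAC over the previous link plus a canonical encoding of this event."""
    payload = prev_mac.encode("ascii") + json.dumps(event, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def append(ledger: list[dict], secret: bytes, event: dict) -> None:
    """Add an event, chaining it to the MAC of the current last entry."""
    prev = ledger[-1]["mac"] if ledger else GENESIS
    ledger.append({"event": event, "prev": prev, "mac": entry_mac(secret, prev, event)})


def verify(ledger: list[dict], secret: bytes) -> bool:
    """Recompute every link; any edited, reordered, or removed middle entry fails."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        expected = entry_mac(secret, prev, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Changing any field of the first event after the fact makes `verify` return `False`, because its recomputed MAC no longer matches the stored one. Dropping entries from the end is not detectable from the chain alone, which is one reason the database-level protections matter.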

Quick Start

Prerequisites

Python 3.12+
uv (package manager)
A supported local runtime: codex or claude

Just want to try it? See the 5-Minute Quickstart — no Docker, no Postgres, no API keys.

1. Clone and install

git clone https://github.com/chrisduvillard/controlled-execution-system.git
cd controlled-execution-system
uv sync

2. Verify a local runtime

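ces doctor

`uv sync` installs CES into the project's `.venv` but does not activate it. Either activate that environment or prefix CES commands with `uv run` (for example, `uv run ces doctor`).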

3. Configure environment

cp .env.example .env
# Optional: set CES_AUDIT_HMAC_SECRET and CES_DEMO_MODE=1.
# Real local execution uses the installed `codex` or `claude` CLI.

4. Start with `ces build`

# Fresh repo or existing repo: `ces build` auto-creates `.ces/` on first run
# Builder-first path: describe what you want and let CES draft the contract
ces build "Add input validation to the user registration endpoint" --yes

# Resume the latest builder session without re-entering context
ces continue --yes

# Explain the latest builder state in plain language
ces explain
ces explain --view decisioning
ces explain --view brownfield

# Check the latest request, activity, and next step
ces status

# Export a concise builder run report for audit or handoff
ces report builder

If you prefer manual setup before the first build, CES still supports:

ces init my-project

For existing repos, CES auto-detects brownfield mode and asks what must be preserved before it runs the change. `ces continue` resumes the saved session at its current stage instead of replaying the whole flow; once a run is finished, it points you back to `ces build` for the next request. You can force either mode with `--greenfield` or `--brownfield`.

For day-to-day brownfield delivery, stay builder-first with `ces build`, `ces continue`, and `ces explain --view brownfield`. Use the expert brownfield commands only when you need an explicit legacy-behavior decision, such as `ces brownfield review OLB-<entry-id> --disposition preserve`. The Brownfield Guide covers that handoff in more detail.

Use the builder-first flow when you want CES to carry the current request context for you. Switch to the expert workflow when you need explicit review, triage, approval, or audit/handoff artifacts. The Operator Playbook shows the boundary and recommended command sequences.

When you leave the single-request builder loop and need system-wide visibility or incident response, use the expert operations surfaces instead of relying on the builder-first ces status view:

ces status --expert
ces status --expert --watch
ces audit --limit 20
ces emergency declare "Security incident detected"

The Operations Runbook covers incident drills and recovery expectations for those commands.

If you want the explicit expert workflow instead, CES still supports:

ces manifest "Add input validation to the user registration endpoint" --yes
ces classify M-<manifest-id>

5. Use local expert commands directly

ces execute M-<manifest-id> --runtime auto
ces review M-<manifest-id>
ces triage M-<manifest-id>
ces approve M-<manifest-id> --yes

Try the FreshCart example

These demo commands are intended for a source checkout of CES after git clone and uv sync.

# Seed sample data
uv run python -m examples.freshcart.seed_data

# Run the end-to-end workflow
uv run python -m examples.freshcart.run_e2e

Use CES on CES

If you want CES to review changes to this repository itself, initialize repo-local state first:

ces init controlled-execution-system
ces dogfood --base origin/master

This creates a local .ces/ directory for repo-specific state. Keep that directory untracked; it is operational state, not project source.

CLI Commands

ces <command> [options]

Start Here

Command	Description
`build`	Builder-first local workflow: describe the change, gather only the missing context, run, review, and approve
`continue`	Resume the latest saved builder session without re-entering the same setup context
`explain`	Summarize the latest builder brief, evidence, blockers, and next step in plain language
`status`	Show builder-first project status; add `--expert` for the full expert view
`report builder`	Export the latest builder run report for audit or reviewer handoff
`init`	Optional manual setup before the first builder-first run

Advanced Governance

Command	Description
`manifest`	Generate a task manifest from a natural-language description
`classify`	Classify a manifest (risk tier, behavior confidence, change class)
`execute`	Execute an agent task within manifest boundaries
`review`	Run the review pipeline on completed work
`triage`	Pre-screen evidence with triage color (green/amber/red)
`approve`	Approve or reject an evidence packet
`gate`	Evaluate a phase gate (computational + agent checks)
`intake`	Run intake interview for a project phase
`calibrate`	Run hidden calibration probes against an agent
`audit`	Expert operations audit inspection; for example, `ces audit --limit 20`
`emergency declare`	Expert operations emergency declaration; for example, `ces emergency declare "Security incident detected"`

Command Groups

Command	Description
`vault ...`	Knowledge vault operations (Zettelkasten-style notes)
`brownfield ...`	Expert legacy behavior capture, review, and promotion

Configuration

Copy .env.example to .env and configure:

Variable	Description	Default
`CES_AUDIT_HMAC_SECRET`	HMAC secret for audit chain integrity	(change in managed environments)
`CES_LOG_LEVEL`	Logging level	`INFO`
`CES_LOG_FORMAT`	Log format (`json` or `text`)	`json`
`CES_DEFAULT_RUNTIME`	Default local runtime	`codex`
`CES_DEMO_MODE`	Use demo helper responses when no CLI-backed provider is available	`0`

For local runtime execution, CES relies on an installed codex or claude CLI. Any provider-specific credentials are handled by that runtime rather than through CES package extras.

Project Structure

controlled-execution-system/
├── src/ces/
│   ├── cli/               # Typer CLI
│   │   ├── __init__.py    #   App entry point
│   │   ├── run_cmd.py     #   Builder-first local workflow
│   │   ├── status_cmd.py  #   Status and explanation surfaces
│   │   └── *_cmd.py       #   Expert command modules
│   ├── control/           # Governance engine
│   │   ├── db/            #   SQLAlchemy tables and repositories
│   │   ├── models/        #   Governance domain models
│   │   └── services/      #   Manifest, policy, workflow, merge
│   ├── harness/           # Quality assurance
│   │   ├── models/        #   Harness-facing models
│   │   ├── sensors/       #   Computational sensors
│   │   └── services/      #   Trust, evidence, review, guide packs
│   ├── execution/         # Agent orchestration
│   │   ├── agent_runner.py    #   Agent runner
│   │   ├── providers/         #   LLM provider adapters
│   │   ├── runtimes/          #   Runtime registry + adapters
│   │   └── sandbox.py         #   Execution sandboxing
│   ├── intake/            # Intake interview flow
│   ├── knowledge/         # Vault services and ranking
│   ├── emergency/         # Kill switch
│   ├── brownfield/        # Legacy integration
│   ├── observability/     # Internal metrics and telemetry helpers
│   └── shared/            # Enums, crypto, config, logging
├── tests/
│   ├── unit/              # 157 test files
│   └── integration/       # End-to-end and regression integration tests
├── examples/              # FreshCart demo project
├── docs/                  # PRD, guides, reference cards
├── pyproject.toml         # Project config + dependencies
└── .env.example           # Environment template

Testing

CES maintains a 90%+ branch coverage gate enforced by CI.

# Run the full test suite
uv run pytest

# Run with coverage report
uv run pytest --cov=ces --cov-report=term-missing

# Skip integration tests (no Docker required)
uv run pytest -m "not integration"

# Run integration tests only (requires Docker)
uv run pytest -m integration

Current local suite: 3,000+ tests with a 90%+ branch coverage gate enforced in CI.

Tech Stack

Technology	Version	Role
Python	3.12+	Runtime
uv	0.11+	Package management
Typer + Rich	0.24+	CLI interface
SQLAlchemy	2.0+	ORM
Pydantic	2.12+	Schema validation + domain models
Codex CLI	current	Local GPT-backed execution runtime
Claude Code CLI	current	Local Claude-backed execution runtime
cryptography	46+	Manifest signing, audit chain integrity
structlog	25+	Structured JSON logging
python-statemachine	3.0+	Workflow state transitions
pytest	9.0+	Testing (90%+ coverage)
ruff	0.15+	Linting + formatting
mypy	1.20+	Static type checking (strict mode with targeted relaxations — see `[tool.mypy]` in pyproject.toml)

Documentation

Document	Description
Product Requirements (PRD)	Complete specification (5,600+ lines) — the authoritative reference
Implementation Guide	Architectural guidance and build order
FreshCart Worked Example	End-to-end walkthrough using a sample project
Quick Reference Card	Classification tables, merge checklists, TTL rules
Security Audit	Security model and threat mitigations
Getting Started	Setup guide with step-by-step instructions
Operator Playbook	Builder-first vs expert workflow guidance and evidence/handoff patterns
Brownfield Guide	Applying CES to existing codebases
GNHF Trial Guide	Guardrails for using `gnhf` externally to develop CES
Troubleshooting	Common issues and solutions
Production Deployment	Production configuration and operational guidance

Contributing

See CONTRIBUTING.md for development setup, testing, and workflow details.

If you want to evaluate external agent loops such as gnhf, use the GNHF Trial Guide and scripts/gnhf_trial.sh rather than treating them as part of CES itself. Run them from a clean sibling worktree or a clean clone, and keep their scope to contributor-side docs, tests, and CLI polish. Exclude changes to manifest/policy, approval/triage/review, audit, kill-switch, sandbox, and runtime-boundary code. Review every generated branch manually before cherry-picking or merging, and keep CES's own builder-first or expert workflows for actual delivery work.

Changelog

See CHANGELOG.md for release history.

License

MIT

Built with the Agent-Native Software Delivery Operating Model v4

Release history

0.1.3 (this version): Apr 23, 2026
0.1.2: Apr 23, 2026

Download files


Source Distribution

controlled_execution_system-0.1.3.tar.gz (874.9 kB view details)

Uploaded Apr 23, 2026 Source

Built Distribution


controlled_execution_system-0.1.3-py3-none-any.whl (342.8 kB view details)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file controlled_execution_system-0.1.3.tar.gz.

File metadata

Download URL: controlled_execution_system-0.1.3.tar.gz
Upload date: Apr 23, 2026
Size: 874.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for controlled_execution_system-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`0a28443f75cbb15711579ba3b7de26e558628805910683611dcc7bfa6a78a034`
MD5	`87d68295a8e8702c7e80e77966dafbf1`
BLAKE2b-256	`1f609a8425c4d51f78f491de1a47978bbbac6683eda898de853632972fad6fa9`


Provenance

The following attestation bundles were made for controlled_execution_system-0.1.3.tar.gz:

Publisher: publish.yml on chrisduvillard/controlled-execution-system

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: controlled_execution_system-0.1.3.tar.gz
- Subject digest: 0a28443f75cbb15711579ba3b7de26e558628805910683611dcc7bfa6a78a034
- Sigstore transparency entry: 1365014567
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: chrisduvillard/controlled-execution-system@9aa9871ca8c417cd6dd39081203e703eacc86366
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/chrisduvillard
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9aa9871ca8c417cd6dd39081203e703eacc86366
- Trigger Event: push

File details

Details for the file controlled_execution_system-0.1.3-py3-none-any.whl.

File metadata

Download URL: controlled_execution_system-0.1.3-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 342.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for controlled_execution_system-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f354238a2b7cb75d980c9f5451ef6c4d7711eaac1fb1c3662cdbedf4e456103b`
MD5	`2bdefe01fa9c0d18dd9041f12d0a59ce`
BLAKE2b-256	`cb12f103cfe441b2531b42eaab546e053f6304bc8b6489855c6820eb15e2af7c`


Provenance

The following attestation bundles were made for controlled_execution_system-0.1.3-py3-none-any.whl:

Publisher: publish.yml on chrisduvillard/controlled-execution-system

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: controlled_execution_system-0.1.3-py3-none-any.whl
- Subject digest: f354238a2b7cb75d980c9f5451ef6c4d7711eaac1fb1c3662cdbedf4e456103b
- Sigstore transparency entry: 1365014637
- Sigstore integration time: Apr 23, 2026
Source repository:
- Permalink: chrisduvillard/controlled-execution-system@9aa9871ca8c417cd6dd39081203e703eacc86366
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/chrisduvillard
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9aa9871ca8c417cd6dd39081203e703eacc86366
- Trigger Event: push
