
Simultaneous Production, Integration, and Control Environment: an agent harness for coding repositories.


spice

Simultaneous Production, Integration, and Control Environment.

spice is an installed agent harness: wrap, steer, supervise, coordinate, and audit coding agents across the repos they work on. Install it once, point it at a repository, and it provides a closed loop around the agents working there:

  • the agent's transcript is the single source of truth, and
  • the repo's filesystem is the single channel of steering;

supervision, coordination, conscience, and hygiene are derived mechanically from those two surfaces.

Why

You rarely know what you want until you watch it fail. Writing the spec up front doesn't change that — it commits the misunderstanding to a document you then have to maintain. And the channel you'd write it through, a human at a keyboard, has a bit rate that plateaued decades ago and is the slowest link in the loop.

spice takes the other route. The operator doesn't name the destination; they corral the agents toward it — watching what the agents emit and turning what they don't like into steering with the smallest possible gesture. The target is an evolving fixed point: the state the loop settles into when nothing it produces still provokes a correction. The spec is the output of that process, never its input.

Neither spec-driven nor observation-driven — both

Spec-driven development is waterfall for agents: it front-loads a written specification and assumes the implementer's speed is the bottleneck. The implementer's speed was never why waterfall failed — the discovery problem was. Cheap implementation doesn't remove discovery; it lets you generate precisely-wrong code at scale and maintain a precisely-wrong document beside it. All intent, no evidence.

Its mirror — observation-driven development, steering only by what the running system emits — fails the other way. With no target, you chase telemetry, fix whatever is in front of you, and never converge, because nothing is being converged toward. All evidence, no intent.

spice is the fusion. The transcript is one surface that is both: emitted behavior, and — the moment you quote-and-steer off it — declared intent. Observation supplies the truth and the gradient; the spec supplies the direction. The work lands at the fixed point where observed reality stops diverging from evolving intent. Intent kept honest by evidence; evidence kept on course by intent.

Is this for you

spice is opinionated on purpose, and the opinions are load-bearing: you can't drive a loop to a fixed point with minimal human input without encoding taste firmly enough for the machine to apply it for you. That makes spice a poor neutral tool and a sharp fit for one kind of operator.

It fits if you'd rather operate a fleet than hand-write code, you locate the craft in the structure rather than the keystrokes, you share its code posture — small bounded seams, ugly-fast cores allowed, no shims, fallbacks, or legacy — you trust listening over writing, and you run a supported agent driver (today: Codex).

It will fight you if your craft lives in the text, you want a tool that bends to your opinions instead of supplying its own, you're on a different agent, or you need something stable and supported today. That isn't a defect — it's the tool selecting its operator.

Install

pip install spice-harness    # or: uv tool install spice-harness
cd /path/to/your/repo
spice init               # hooks, spice.sh shim, state scaffolding
spice dev doctor         # verify drivers, backends, and policy

spice init writes machine-local git hook shims under .spice/ (ignored via .git/info/exclude) and a tracked spice.sh shim. Repo-tracked policy lives in your pyproject.toml under [tool.spice.*] tables.

Entrypoint resolution is worktree-true: when the current repo is the spice source checkout, generated shims and supervisor children put that checkout first on PYTHONPATH and run python -m spice; ordinary target repos use the installed product.

In a spice source checkout, ./spice.sh python … and ./spice.sh python3 … also resolve to the same checkout venv interpreter used by the harness itself. In ordinary target repos, those aliases use $VIRTUAL_ENV/bin/python when set, otherwise .venv/bin/python under the repo root; use an explicit interpreter path if a probe intentionally needs some other Python.
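The interpreter fallback for ordinary target repos can be sketched in a few lines. This is an illustration of the rule as stated above, not spice's own code: resolve_python is a hypothetical name, and treating an empty VIRTUAL_ENV the same as an unset one is an assumption.

```python
from pathlib import Path


def resolve_python(repo_root: Path, environ: dict[str, str]) -> Path:
    """Return the interpreter a ./spice.sh python alias would use in a target repo.

    Hypothetical helper: prefers $VIRTUAL_ENV/bin/python when the variable is
    set, otherwise falls back to .venv/bin/python under the repo root.
    """
    venv = environ.get("VIRTUAL_ENV")
    if venv:  # assumption: an empty value is treated like an unset one
        return Path(venv) / "bin" / "python"
    return repo_root / ".venv" / "bin" / "python"


active = resolve_python(Path("/repo"), {"VIRTUAL_ENV": "/opt/venv"})
fallback = resolve_python(Path("/repo"), {})
```

Passing the environment in as a plain mapping keeps the rule easy to test without touching the real process environment.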

A project can set the default launch model and thinking level for its supervised agents in tracked config, either by editing pyproject.toml or by running spice config agent --scope project --model ... --thinking ...:

[tool.spice.agent]
model = "gpt-5.4"
thinking = "low"

An operator can override those defaults for just the current worktree:

spice config agent --scope worktree --model gpt-5.4 --thinking low

Resolution order is explicit launch flags, then worktree config, then tracked project config, then the driver defaults.

A repo can also mount its own tooling into the spice namespace:

[tool.spice.commands]
deploy = "./scripts/deploy.sh"
bench = ["python", "-m", "myproj.bench"]

spice deploy --env staging then runs the mounted command from the repo root with the remaining arguments passed through verbatim. Built-in verbs always win; a mount that shadows one fails loudly.
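The dispatch rule can be sketched as follows. Everything here is illustrative: BUILTIN_VERBS is only a subset of verbs named in this document, check_mounts and build_argv are hypothetical names, and splitting string mounts with shlex is an assumption about how a string command is interpreted.

```python
import shlex

# Illustrative subset of built-in verbs mentioned in this README.
BUILTIN_VERBS = {"init", "agent", "task", "session", "serve", "maxim", "dev", "study", "config"}


def check_mounts(mounts: dict) -> None:
    """Fail loudly when a mounted name shadows a built-in verb."""
    shadowed = sorted(BUILTIN_VERBS.intersection(mounts))
    if shadowed:
        raise ValueError(f"mounted commands shadow built-in verbs: {shadowed}")


def build_argv(verb: str, rest: list[str], mounts: dict) -> list[str]:
    """Return the command line for a mounted verb with arguments appended verbatim."""
    check_mounts(mounts)
    command = mounts[verb]
    # Assumption: a string mount is split shell-style; a list is used as given.
    base = shlex.split(command) if isinstance(command, str) else list(command)
    return base + list(rest)


mounts = {"deploy": "./scripts/deploy.sh", "bench": ["python", "-m", "myproj.bench"]}
deploy_argv = build_argv("deploy", ["--env", "staging"], mounts)
```

A caller would then run the result from the repo root, for example with subprocess.run(deploy_argv, cwd=repo_root).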

Mounted names are intentionally one-level verbs (^[a-z][a-z0-9-]*$), not nested command paths. A repo with a large tool family mounts one namespace owner and keeps family grouping inside that repo tool's own arguments:

[tool.spice.commands]
toolbox = ["uv", "run", "toolbox"]

spice toolbox lint css --fix then dispatches lint css --fix to toolbox. Do not encode families as dotted, spaced, or ad-hoc hyphenated spice mount names such as lint.css, lint css, or one mount per subcommand; those groupings belong behind the mounted repo tool's explicit contract.
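As a quick check of the naming rule, the pattern above can be applied with re.fullmatch. The helper names are hypothetical; the pattern is the one quoted in this section, written without anchors because fullmatch already requires the whole string to match.

```python
import re

# Same pattern as ^[a-z][a-z0-9-]*$; fullmatch supplies the anchoring.
MOUNT_NAME = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_mount_name(name: str) -> bool:
    """True when the name is a one-level verb under the documented pattern."""
    return MOUNT_NAME.fullmatch(name) is not None


def split_invocation(argv: list[str]) -> tuple[str, list[str]]:
    """Split 'toolbox lint css --fix' into the mounted verb and its arguments."""
    return argv[0], argv[1:]


verb, rest = split_invocation(["toolbox", "lint", "css", "--fix"])
```

The pattern rejects lint.css and lint css outright. It cannot reject a hyphenated family name such as lint-css, so that part of the rule is a convention rather than something the pattern enforces.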

Library seam for repo tools

Mounted commands and tracked pre-commit extensions may import a deliberately narrow Python seam from spice instead of vendoring harness scaffolding. This surface is source-stable for target repos: public names in the modules listed below are not removed or renamed silently, and incompatible changes require an explicit contract update. Underscored names remain private.

  • spice.errors: SpiceError for user-facing command failures.
  • spice.policy: constitution constants and flex_limit.
  • spice.flexstate: flex-limit sticky-state persistence and rename helpers.
  • spice.locking: cross-platform advisory file locks.
  • spice.paths: repo-root, state-dir, atomic write, and tool-resolution helpers.
  • spice.procs: process-group spawn, liveness, and termination helpers.
  • spice.repocfg: tracked [tool.spice] table readers.
  • spice.studies.walk: tracked/staged path walkers, repo policy exclusions, staged renames, and git blob reads.
  • spice.studies.fileloc, spice.studies.complexity, spice.studies.magicnums, and spice.studies.envpolicy: finding dataclasses plus scan_*, detect_*, and render_*_board helpers for project-specific studies.

Everything else is an internal implementation detail unless this section names it. A repo tool that needs an unlisted module should either vendor that helper or first add the helper to this seam with tests and a stability note.

The loop

| Surface | Command | What it does |
| --- | --- | --- |
| Wrapper | spice agent run -- <cmd> (or ./spice.sh) | Runs shell commands with proxy routing, git-shadow env, and steering injection on stderr. |
| Lifecycle | spice agent ensure / supervise | One worktree-bound agent per worktree, started under a neutral skill prompt, watched by a durable supervisor. |
| Steering | filesystem inbox under .spice/inbox/ | Durable operator messages; items retire only when the agent semantically ACKs their key in its transcript. |
| Tasks | spice task … | Phase-native Taskwarrior board shared by all worktrees; task next is allocator-owned; git sync happens at task boundaries. |
| Sessions | spice session | Transcript forensics: the no-arg briefing is the primary rehydration product, with context-pressure metering. |
| Interface | spice serve | Localhost web UI: lanes over worktrees, live transcript streams, lifetime control (Renew / Steer / Drive), task-filter routing, fused lane groups backed by server-side teams; spice serve teams and spice serve browser-artifact-path <file> expose operator diagnostics for smoke runs. |
| Conscience | spice maxim … | Builtin maxims judged against assistant prose by a local model; violations come back as inbox steering. |
| Constitution | spice dev pre-commit / spice study … | Namespace packages, path shape, LOC/byte/complexity flex+sticky gates, magic-number ratchet, env-literal inventory, commit-message policy. |

Session analysis is intentionally tiered. The current tier includes spice session phases for contiguous working-phase spans and spice session messages for message-level side/phase/flavor filtering. Deeper report families that depend on richer topic/bucket modeling belong in a separate analytics tier after the basic phase/message surfaces harden.

Interface

spice serve is the operator interface for the loop. It can compose multiple agents into a single Drive lane, split worktrees into parallel lanes, route by task filter, show live transcript attachments, and expose the control surfaces needed to steer or audit a running session.

Screenshots:

  • Compose and route (composed Drive lane with three agents): a composed Drive lane groups multiple worktree-bound agents behind one operator control surface.
  • Parallel lanes (three Drive lanes across active worktrees): separate lanes keep concurrent work readable while preserving per-agent Drive and speak controls.
  • Lane controls (routing controls with filters, metrics, info, and assignment chips): filters, metrics, info, and worktree assignment live in the lane header.
  • Steering and ACKs (live interface showing steering and ACK flow): operator steering, ACKs, labels, and transcript controls stay visible in the live stream.
  • Attachments in transcript (filters and attachment gallery): transcript attachments remain browsable inside the lane.
  • Live image evidence (multi-lane interface with live image attachments): screenshots, browser captures, and diagnostics stay part of the operating record.

The constitution

The pre-commit gate is the executable form of the project's opinions — see spice/policy.py. Highlights:

  • namespace packages only; no __init__.py under declared package roots;
  • file names match ^_*[0-9a-z]+_*$; splitting a file requires naming the seam (no *2.py, no generic continuation shards);
  • files flex to 1500 lines but a breach holds them to 1000 until they shrink; routines flex the same way around CCN 20 / length 80;
  • magic-number regressions are a ratchet against HEAD, not an amnesty;
  • env-literal inventory covers SPICE_* and CODEX_THREAD_ID by default; target repos can add tracked name regexes with [tool.spice.policy] env_name_patterns;
  • commit subjects fit in 100 columns; bodies are auto-folded.
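Two of the rules above, sketched under stated assumptions. This is not the code in spice/policy.py: the names are hypothetical, applying the file-name pattern to the stem (the name without its extension) is an assumption, and so is the reading of flex and sticky, namely that a file may grow to 1500 lines, that exceeding 1500 marks it sticky, and that a sticky file is held to 1000 until it fits again.

```python
import re
from pathlib import PurePosixPath

FILE_STEM = re.compile(r"_*[0-9a-z]+_*")  # ^_*[0-9a-z]+_*$ via fullmatch
FLEX_LINES = 1500    # ceiling for a file that has never breached
STICKY_LINES = 1000  # ceiling once a breach has been recorded


def stem_ok(path: str) -> bool:
    """Check the file-name rule against the stem (assumed reading of the rule)."""
    return FILE_STEM.fullmatch(PurePosixPath(path).stem) is not None


def check_lines(line_count: int, sticky: bool) -> tuple[bool, bool]:
    """Return (passes, sticky_after) for one file under the flex+sticky rule."""
    limit = STICKY_LINES if sticky else FLEX_LINES
    if line_count <= limit:
        return True, False   # within the active limit: pass and clear sticky
    return False, True       # breach: fail and hold the file to the lower limit


first_commit = check_lines(1501, sticky=False)
after_small_trim = check_lines(1200, sticky=True)
after_real_split = check_lines(1000, sticky=True)
```

Under this reading, a file that breaches at 1501 lines still fails after being trimmed to 1200 and only passes again once it is back at 1000 or below, which is what makes the limit a ratchet rather than a soft warning.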

The gate applies to spice itself: this repository is its own first target. Target repos can keep their own tracked gate lanes under the same hook by declaring mounted commands and pre-commit policy:

[tool.spice.commands]
fmt-cs = ["dotnet", "format"]

[tool.spice.policy]
pre_commit = [
    { label = "format C#", mount = "fmt-cs", formatter = true, when = ["*.cs"] },
    { label = "assets", run = ["python3", "-m", "tools.assets"], when = ["Assets/*"] },
]
pre_commit_success = [{ label = "clear asset sticky state", run = ["python3", "-m", "tools.assets", "--clear-sticky"] }]

[tool.spice.policy.pre_commit_builtins]
formatters = false
"magic-numbers" = { label = "project magic", run = ["python3", "-m", "tools.magic"] }

Built-in pre-commit keys are repo-shape, staging, repo-docs, formatters, local-paths, serve-web-typecheck, env-policy, file-shape, complexity, and magic-numbers (serve-web-typecheck no-ops in repos without the serve static sources it gates). They run before extension steps unless an individual built-in is disabled or replaced in tracked policy. pre_commit_success uses the same command shape as pre_commit, but runs only after the whole gate has passed, alongside sticky state cleanup.
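One way to picture how the gate is assembled from that policy. This is a sketch only: plan_gate is a hypothetical name, and two points are assumptions rather than documented behavior, namely that built-ins run in the order listed above and that a replacement runs in the slot of the built-in it replaces.

```python
BUILTIN_KEYS = [
    "repo-shape", "staging", "repo-docs", "formatters", "local-paths",
    "serve-web-typecheck", "env-policy", "file-shape", "complexity", "magic-numbers",
]


def plan_gate(builtin_overrides: dict, extension_steps: list[dict]) -> list[tuple[str, str]]:
    """Return the ordered (kind, label) steps for one pre-commit run."""
    plan = []
    for key in BUILTIN_KEYS:
        override = builtin_overrides.get(key, True)
        if override is False:
            continue  # built-in disabled in tracked policy
        if isinstance(override, dict):
            plan.append(("replacement", override["label"]))  # built-in replaced
        else:
            plan.append(("builtin", key))
    # Extension steps follow the built-ins.
    plan.extend(("extension", step["label"]) for step in extension_steps)
    return plan


overrides = {
    "formatters": False,
    "magic-numbers": {"label": "project magic", "run": ["python3", "-m", "tools.magic"]},
}
steps = [{"label": "format C#"}, {"label": "assets"}]
plan = plan_gate(overrides, steps)
```

With the policy from the example above, the formatters built-in drops out, project magic takes the place of magic-numbers, and the two extension steps come last.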

Extension steps run from the repo root and receive the staged paths, newline-separated, in the SPICE_STAGED_PATHS environment variable. A step with when globs runs only when a staged path matches (fnmatch against the repo-relative path, * crosses directory separators) and receives just the matching paths; a step without when always runs and receives every staged path. Set formatter = true on a command step that rewrites matching staged files; after it exits successfully, the gate re-stages those same SPICE_STAGED_PATHS so the formatted content lands in the commit.

Release

Releases are cut from a clean main worktree with ./scripts/release.

./scripts/release prepare patch   # bump, validate, commit, stop before publish
./scripts/release notes > /tmp/spice-release-notes.md
./scripts/release publish --notes-file /tmp/spice-release-notes.md
./scripts/release patch           # one-pass bump, validate, commit, publish
./scripts/release minor           # same flow for a minor release

For curated GitHub release notes, generate the draft after prepare and edit from that file instead of relying on session memory. The draft is built from first-parent commits in the exact previous-release-tag-to-release-commit range, grouped by landed task project metadata, and records that range in the package notes.

Use a patch release when the shipped contract is unchanged: bug fixes, documentation clarifications, packaging fixes, or internal test/build/tooling changes that do not give operators a new capability and do not alter CLI, configuration, UI, task/session semantics, or the public library seam.

Use a minor release when users can do something new or observe changed behavior: new commands or flags, new configuration, new spice serve or task workflow behavior, additions to the public library seam, changed output or artifacts, or any compatibility break while the project only has patch/minor release lanes. If a release contains both patch-level fixes and minor-level surface changes, choose minor.
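The patch-versus-minor rule reduces to a small decision function. The change labels below are made up for this sketch and are not something ./scripts/release understands; they only group the cases listed in the two paragraphs above.

```python
# Hypothetical labels for changes that leave the shipped contract untouched.
PATCH_CHANGES = {"bug-fix", "docs", "packaging", "internal-tooling"}

# Hypothetical labels for changes users can see or use.
SURFACE_CHANGES = {
    "command", "flag", "configuration", "serve-behavior", "task-workflow",
    "library-seam", "output", "compatibility-break",
}


def release_lane(change_kinds: list[str]) -> str:
    """Return 'minor' if any change touches the shipped surface, else 'patch'."""
    kinds = set(change_kinds)
    unknown = kinds - PATCH_CHANGES - SURFACE_CHANGES
    if unknown:
        raise ValueError(f"unclassified change kinds: {sorted(unknown)}")
    return "minor" if kinds & SURFACE_CHANGES else "patch"


mixed = release_lane(["bug-fix", "flag"])
fixes_only = release_lane(["bug-fix", "docs"])
```

A mixed release resolves to minor, matching the rule that one surface change is enough to leave the patch lane.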

Status

Work in progress: an extraction-in-progress toward a standalone, releasable product. Surfaces are still settling; the loop described above is real and exercised daily, and this repository is its own first target. spice was extracted from daily use driving a real project, not designed top-down — the source of both its coherence and its opinions.
