Gunicorn-style agent swarms with Redis-like shared memory and artifact stitching. (Imported as `puppetmaster`; published as `puppetmaster-ai` because the bare PyPI name is held by an abandoned 2019 project, name-reassignment pending.)
Puppetmaster
Puppetmaster turns Cursor (or Claude Code, the OpenAI API, or the official OpenAI Codex CLI) into an orchestrator that routes each task to the cheapest model that can handle it, stores worker outputs as typed SQLite artifacts so follow-ups cost zero tokens, and coordinates workers through durable state instead of a shared parent transcript. Four production adapters are live, and the model registry ships with eleven starter tiers.
Live OpenAI A/B with real billed tokens (same prompt, equivalent answer; one of 3 back-to-back runs):

- Pinned `gpt-5.5`: $0.006900 in 5.480 s (156 tokens in / 204 tokens out)
- Puppetmaster routed to `gpt-5.4-nano`: $0.000132 in 1.511 s (156 tokens in / 121 tokens out)

→ 98.1% cheaper, 72.4% faster. Cost savings ranged from 98.1% to 98.7% across the 3 consecutive runs; wall-time savings varied more widely (68–88%) because pinned-frontier latency varies more than nano latency. Reproduce with `OPENAI_API_KEY=... python -m bench.router_live_ab`.
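The headline percentages follow directly from per-MTok pricing. A minimal sketch, assuming the starter-registry rates listed later in this README ($5 / $30 per MTok for `gpt-5.5`, $0.15 / $0.90 per MTok for `gpt-5.4-nano`); `call_cost_usd` and `percent_saved` are illustrative helpers, not part of Puppetmaster:

```python
def call_cost_usd(tokens_in: int, tokens_out: int,
                  usd_in_per_mtok: float, usd_out_per_mtok: float) -> float:
    """USD cost of one call, given input/output prices per million tokens."""
    return (tokens_in * usd_in_per_mtok + tokens_out * usd_out_per_mtok) / 1_000_000


def percent_saved(baseline: float, candidate: float) -> float:
    """How much lower `candidate` is than `baseline`, in percent."""
    return 100.0 * (1.0 - candidate / baseline)


pinned = call_cost_usd(156, 204, 5.00, 30.00)   # gpt-5.5, pinned
routed = call_cost_usd(156, 121, 0.15, 0.90)    # gpt-5.4-nano, routed

print(f"pinned ${pinned:.6f}, routed ${routed:.6f}")
# pinned $0.006900, routed $0.000132
print(f"{percent_saved(pinned, routed):.1f}% cheaper, "
      f"{percent_saved(5.480, 1.511):.1f}% faster")
# 98.1% cheaper, 72.4% faster
```

The dollar figures in the A/B above are consistent with these rates and token counts.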
Three claims, three receipts
Every number in this section comes from a reproducible script in bench/. What is not defensible today (and what we won't claim) lives in TALKING_POINTS.md.
1. Token cost — fixed on two axes
On new work — Puppetmaster's v0.6.0 router classifies each task's complexity (role + instruction signal patterns + payload size) and picks the cheapest model from your user-owned registry that can handle it. Every routing decision is stored as an auditable ROUTING artifact: picked model, capability needed, estimated cost, and the full list of rejected alternatives with the reason each was rejected.
- Receipt: `bench/router_savings.py`. On a 6-task fixture, the router was 35.1% cheaper than pinning a frontier model. Two of the six tasks (the hard ones: audit, architect) correctly stayed on the frontier model; the savings come from not using a frontier model when a task doesn't need one.
- Receipt: `bench/router_live_ab.py`. A live OpenAI A/B using real `usage.prompt_tokens` (not estimates) was 98.1–98.7% cheaper across 3 consecutive runs on a single explore task, with wall-time savings between 68% and 88% per run.
On follow-up work — once a swarm completes, every artifact (finding, decision, risk, patch, verification, routing decision) lives in SQLite. Follow-up questions like "what did the security audit worker find?" are SQLite queries, not new agent runs.
- Receipt: `bench/followup_cost.py`. 40 follow-up queries against a real completed swarm: 0 adapter calls, 0 tokens, $0.00, avg 0.5 ms per query. A hypothetical "always-frontier replay" baseline for the same 40 queries would cost $1.64 (using Anthropic's current Opus 4.7 rate of $5/$25 per MTok, corrected in v0.6.3 from the $15/$75 the registry shipped before).
Honest scope: this is the "follow-up reads are free" claim. If a follow-up needs new reasoning the swarm didn't produce, that's a new task, and it costs tokens like any other.
2. Transcript — workers don't share one
The classic multi-subagent shape stuffs everything into one parent chat. Each subagent inherits the parent's stale context, results come back as prose, and the context window bloats until the important details are buried. This is the failure mode users actually hit on long investigations.
Puppetmaster does the opposite. Workers don't see each other's transcripts. They claim tasks by lease, emit typed artifacts with payloads + evidence + confidence + sha256 integrity, and the final stitcher reads JSON — not raw worker stdout. The parent agent's context only sees what the stitcher publishes.
- Inspect a live swarm: `puppetmaster artifacts <job_id>` returns the durable state, which is the actual coordination surface, not a chat scrollback.
- Inspect a completed swarm without paying tokens: same command, milliseconds, $0 (see receipt #1 above).
- Verify nothing is hand-waved: every artifact carries `created_by` (which worker), `created_at`, and a content `sha256`.
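One way to picture the integrity field: hash a canonical JSON encoding of the artifact's content, so any later edit is detectable. A minimal sketch; the `Artifact` fields mirror the ones named in this section, but the exact schema and hashing scheme Puppetmaster uses are assumptions, and the file path and worker name in the example are made up:

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Artifact:
    type: str                  # e.g. "finding", "risk", "patch", "routing"
    payload: dict
    evidence: list
    confidence: float
    created_by: str            # which worker emitted it
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    sha256: str = ""

    def content_digest(self) -> str:
        # Canonical JSON (sorted keys, no extra whitespace) so equal content
        # always produces the same bytes, and therefore the same hash.
        content = {"type": self.type, "payload": self.payload,
                   "evidence": self.evidence, "confidence": self.confidence}
        blob = json.dumps(content, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def seal(self) -> "Artifact":
        self.sha256 = self.content_digest()
        return self

    def verify(self) -> bool:
        return self.sha256 == self.content_digest()


art = Artifact("finding", {"summary": "refresh races logout"},
               ["auth/session.py:88"], 0.8, "explore-worker-1").seal()
print(art.verify())   # True
art.payload["summary"] = "edited after the fact"
print(art.verify())   # False
```

Provenance (`created_by`, `created_at`) is kept outside the digest here, so two workers emitting identical content get identical hashes; including it instead is an equally valid design choice.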
3. Graphing — credit CodeGraph, wire it in cleanly
The "graph your directories for cheap symbol context" capability is not a Puppetmaster feature. It's CodeGraph — a separate project — and it deserves the credit. Puppetmaster's contribution is what happens after CodeGraph is installed:
- Every Cursor / Claude / OpenAI / Codex worker auto-injects task-relevant CodeGraph context into its prompt before the model call, with no MCP round-trip per worker.
- One shared `codegraph context` query seeds N parallel workers in a swarm (vs N separate queries if each agent issues its own).
- The resulting artifacts (which now reference symbol-level evidence from CodeGraph) land in the same durable store, so follow-ups still cost zero tokens.
- The most-used CodeGraph CLI verbs are bundled directly into Puppetmaster's MCP (see Bundled CodeGraph tools), so Cursor Agent only needs one MCP for both orchestration and symbol intelligence.
Puppetmaster works fine without CodeGraph. Workers fall back to grep/read for context discovery, and orchestration / artifacts / durable state are unchanged. CodeGraph is opt-in via npm install -g @colbymchenry/codegraph && codegraph init && codegraph index.
Think Redis/Gunicorn for agentic engineering:
```text
Cursor Agent / Claude Code / OpenAI / Codex CLI / shell
        |
        v
Puppetmaster supervisor ──> task-aware model router (11 starter tiers)
        |
        v
independent worker processes ──> SQLite (typed artifacts, events, memory)
        |
        v
live artifact board ──> stitched summary ──> 0-token follow-up reads
```
Puppetmaster is not trying to beat native IDE subagents at every tiny task. It is for the work that gets messy: long repo investigations, conflicting hypotheses, repeated handoffs, flaky memory, and code changes that need evidence, replay, and approval gates.
90-second quickstart
```bash
git clone https://github.com/professorpalmer/Puppetmaster.git
cd Puppetmaster && python -m pip install -e . && npm install --package-lock=false --no-audit
python -m puppetmaster doctor # 14 health checks (python, sqlite, git, node, npm, cursor-sdk, claude-code, codex, codegraph, mcp-servers, two API keys, sqlite-state, git-status)
python -m puppetmaster models init # writes the 11-tier starter registry across cursor, claude-code, openai, and codex adapters
python -m puppetmaster install-cursor-mcp # wires Puppetmaster into Cursor (workspace .cursor/mcp.json); --global writes ~/.cursor/mcp.json
python -m puppetmaster install-codex-mcp # wires Puppetmaster into the Codex CLI (codex mcp add ...)
python -m puppetmaster route "Format these files" --role verify-runtime
# dry-run routing decision: picks cursor/composer-2-5 ($0)
OPENAI_API_KEY=... python -m bench.router_live_ab
# ~$0.01 of real spend, prints the ~98%-cheaper receipt
```
Both `install-*-mcp` commands resolve `sys.executable` for the server command (avoiding the "wrong python on PATH" failure mode), run a `tools/list` handshake before writing anything, are fully idempotent (a re-run leaves the file unchanged), and preserve any existing env vars and unrelated MCP servers already in the file.
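The write-side guarantees can be sketched as a merge into the existing JSON file. This assumes Cursor's `mcp.json` layout (a top-level `mcpServers` object keyed by server name, each entry holding `command`, `args`, and optional `env`); the `tools/list` handshake is omitted, and the function takes the server name and args as parameters rather than guessing Puppetmaster's real invocation:

```python
import json
import sys
from pathlib import Path


def upsert_mcp_server(config_path: Path, name: str, args: list) -> bool:
    """Add or update one MCP server entry in a JSON config; return True if the file changed.

    Unrelated servers and any existing `env` on this entry are left untouched,
    so running it twice in a row is a no-op.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    current = servers.get(name)
    entry = dict(current or {})           # copy keeps `env` and any other keys
    entry["command"] = sys.executable     # the interpreter running the installer, not PATH's `python`
    entry["args"] = list(args)
    if entry == current:
        return False                      # already up to date: do not rewrite the file
    servers[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return True
```

Calling it twice with the same arguments returns `True` and then `False`, and an existing entry keeps its own `env`.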
For deeper proof, TALKING_POINTS.md has the full truth-table separating "use this phrasing" from "avoid that overclaim".
The Problem
Most multi-agent coding workflows still use a fragile shape:
```text
One parent chat
  |- subagent
  |- subagent
  `- subagent
```
That works for demos. It breaks down during real repo work.
- The parent context bloats until the important details are buried.
- Subagents inherit stale assumptions from the same conversation.
- Results come back as prose blobs instead of evidence-backed records.
- There is no durable state, replay, lease, failure recovery, or memory promotion.
- A crashed or confused worker often becomes a mystery instead of an inspectable event.
- Full-edit agents can mix old local changes with new changes unless the workflow guards against it.
Puppetmaster is built around a different rule:
Agents should not share transcript history. They should share durable state.
What Puppetmaster Solves
1. Context Collapse
Workers do not coordinate by stuffing every thought into one parent conversation. They claim tasks, write structured artifacts, and let the stitcher summarize durable outputs back to the operator.
2. Subagent Resource Contention
Puppetmaster does not rely on one parent agent spawning children inside the same chat surface. It runs workers as separate local subprocesses, each with its own adapter invocation and lifecycle.
3. Vibe-Based Handoffs
Workers emit typed artifacts with payloads, evidence, confidence, source files, and sha256 integrity. The final synthesis reads artifacts, not raw worker transcripts.
Artifacts are available as soon as they are emitted. The final stitch is the publishable synthesis, not the first moment the work becomes visible.
4. Lost Work and Dead Workers
Tasks are lease-based. Stale workers can be recovered. Jobs fail closed. Failures become events and verification artifacts instead of disappearing into chat history.
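A lease is a claim that expires: a worker owns a task only until `lease_expires_at`, so a crashed worker's task becomes claimable again. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical single `tasks` table (not Puppetmaster's actual schema) and no lease renewal; the connection is opened in autocommit mode (`isolation_level=None`) so the explicit `BEGIN IMMEDIATE` controls the transaction:

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    role TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | leased | done | failed
    lease_owner TEXT,
    lease_expires_at REAL
)
"""


def claim_task(conn: sqlite3.Connection, worker_id: str,
               lease_seconds: float = 60.0, now: Optional[float] = None) -> Optional[int]:
    """Claim one pending task, or one whose lease expired (its worker is presumed dead)."""
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")   # take the write lock: no two workers get the same row
    try:
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' "
            "OR (status = 'leased' AND lease_expires_at < ?) ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'leased', lease_owner = ?, "
                "lease_expires_at = ? WHERE id = ?",
                (worker_id, now + lease_seconds, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:", isolation_level=None)   # autocommit: we manage BEGIN/COMMIT
conn.execute(SCHEMA)
conn.execute("INSERT INTO tasks (role) VALUES ('explore')")
print(claim_task(conn, "worker-a", lease_seconds=30, now=1000.0))   # 1
print(claim_task(conn, "worker-b", now=1010.0))                     # None: lease still held
print(claim_task(conn, "worker-b", now=1031.0))                     # 1: expired lease recovered
```

A long-running worker would periodically push `lease_expires_at` forward (a heartbeat) so that only genuinely stale workers lose their tasks.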
5. Unsafe Code Edits
Claude Code full-edit runs are blocked on dirty worktrees by default. When edits happen, Puppetmaster captures patch artifacts with changed files, base SHA, unified diff, and revert guidance.
6. No Long-Term Recall
Useful artifacts can be promoted into memory and retrieved by later workers. The next run does not need the entire old conversation to remember what mattered.
What It Is
Puppetmaster is not another group-chat swarm. It is a local coordination runtime:
- Job: one user goal
- Task: role-specific work, optionally dependency-gated
- Worker: a separate subprocess that claims work through a lease
- Adapter: Cursor SDK, Claude Code CLI, OpenAI API, Codex CLI, shell, or a future provider
- Artifact: a structured finding, decision, patch, verification result, risk, or memory summary
- Stitcher: final synthesis from artifacts only
- Memory: promoted facts for future retrieval
SQLite is the default coordination backend. WAL mode, schema metadata, integrity checks, task leases, retries, event streams, and patch artifacts are built in.
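The WAL and integrity-check pieces map onto standard SQLite pragmas. A minimal sketch of opening a state store that fails closed, assuming these pragmas and a 5-second busy timeout; Puppetmaster's own connection setup may differ:

```python
import sqlite3


def open_state_db(path: str) -> sqlite3.Connection:
    """Open a file-backed SQLite state store in WAL mode and fail closed if it is damaged."""
    conn = sqlite3.connect(path, isolation_level=None)
    # WAL lets readers (status polls, live artifact boards) run while a worker writes.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":   # e.g. ":memory:" databases report "memory"
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    # Wait up to 5 s for a competing writer instead of raising "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")
    # integrity_check returns the single row 'ok' when the database is healthy.
    status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        conn.close()
        raise RuntimeError(f"integrity check failed: {status}")
    return conn
```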
Why Not Just Use Subagents?
Native IDE subagents are great for quick parallel help inside one product surface. Puppetmaster solves a different problem: making agent work durable and inspectable outside a single parent context.
| Native subagents | Puppetmaster |
|---|---|
| Fast for small tasks | Better for long, stateful investigations |
| Shared chat surface | Shared durable state |
| Transcript-heavy handoffs | Typed artifacts with evidence |
| Harder to replay | Jobs, events, artifacts, and summaries persist locally |
| Usually opaque failure model | Leases, recovery, logs, and failed-task artifacts |
| Final answer often hides process | Live artifact board while workers run |
The goal is not “one more chat.” The goal is a local runtime where the operator can start a swarm, get a job_id, watch artifacts appear, inspect partial summaries, and only then approve edits.
What Works Today
| Area | Status |
|---|---|
| Local runtime | Daily-driver beta: subprocess workers, task DAGs, leases, recovery, failure states |
| SQLite backend | Default backend with WAL mode, schema metadata, integrity checks, and persisted events |
| Model router (v0.6.0+) | Task-aware routing across the registry; every decision is an auditable ROUTING artifact. Receipts: bench/router_savings.py, bench/router_live_ab.py |
| Cursor Agent MCP | Async start tools, status polling, logs, live artifacts, partial summaries, routing tools (puppetmaster_route_task, puppetmaster_list_models, puppetmaster_job_cost) |
| Cursor extension | Activity-bar control panel for running Puppetmaster inside Cursor |
| Cursor adapter | Live adapter through @cursor/sdk; best for review/plan/dry-run workflows |
| Claude Code adapter | Live full-edit adapter through Claude Code CLI; validated with real tracked diffs |
| OpenAI adapter (v0.6.1-beta.1+) | Direct Chat Completions via `OPENAI_API_KEY`; captures real `tokens_in` / `tokens_out` from `usage.prompt_tokens` / `usage.completion_tokens`. |
| Codex adapter (v0.7.0+) | Live full-edit adapter through the official OpenAI Codex CLI (codex exec --json). Parses the structured JSONL event stream and captures input_tokens, output_tokens, cached_input_tokens, reasoning_output_tokens, and thread_id per turn — currently the most telemetry-rich adapter Puppetmaster ships. |
| Shell adapter | Built-in bounded command runner for verification |
| Memory | Promoted memory retrieval into later worker context and prompts |
| CodeGraph | Optional shared repo intelligence: workers auto-inject CodeGraph context when available |
| Patch workflow | Patch artifacts, path locks, approval/rejection events, dirty-worktree guard |
| Reproducible benchmarks | Six harnesses in bench/, each with markdown + JSON receipts under bench/results/ |
Install
```bash
git clone https://github.com/professorpalmer/Puppetmaster.git
cd Puppetmaster
python -m pip install -e .
npm install --package-lock=false --no-audit
python -m puppetmaster doctor
```

Run the local demo:

```bash
python -m puppetmaster run "Map this repo" --config examples/enterprise-workflow.json
python -m puppetmaster show $(python -m puppetmaster last)
```

Prove worker recovery:

```bash
python -m puppetmaster crash-demo
```
Daily Driver Prompts
In Cursor Agent, with MCP enabled:
```text
Use Puppetmaster to run doctor in this repo and summarize what is missing.
```

```text
Use Puppetmaster to start a swarm for this repo and return the job id immediately.
Problem: users are getting logged out after refresh and token refresh tests are flaky.
Constraints: keep the patch focused, preserve public API behavior, and run relevant tests.
Do review/plan first. Poll status/logs by job id. Do not edit until you summarize findings and ask for approval.
```

For real multi-role analysis, prefer `puppetmaster_start_cursor_swarm` through Cursor Agent. It creates real Cursor SDK-backed worker roles. Bare custom roles on the generic `puppetmaster_start_swarm` require a config or adapter so Puppetmaster does not accidentally run deterministic demo workers.

While the job runs, ask Cursor Agent to inspect:

```text
Poll Puppetmaster status, live artifacts, and partial summary for <job_id>. Summarize concrete findings as they arrive.
```

After review/approval:

```text
Use Puppetmaster to start Claude Code implementation for the approved fix in a clean worktree. Return the job id immediately and poll status until complete.
```
From the CLI:
```bash
python -m puppetmaster doctor
python -m puppetmaster cursor "Review this repo for release blockers" --review --dry-run
python -m puppetmaster cursor "Plan the next safe implementation slice" --plan --dry-run
python -m puppetmaster claude "Implement the approved change and run focused tests" --permission-mode acceptEdits
python -m puppetmaster show $(python -m puppetmaster last)
python -m puppetmaster logs
```
The `cursor` and `claude` commands use inline orchestration by default to avoid an extra Python worker cold start. The provider still runs in its own process (node for the Cursor SDK, the Claude Code CLI for Claude), while Puppetmaster keeps the same job/task/artifact/lease state model. Use `--worker-mode subprocess` when you want the stricter worker-process boundary for a run.
For local swarms, you can keep Puppetmaster workers warm and let jobs hand off work to them:
```bash
python -m puppetmaster daemon --roles explore architect implement redteam test
python -m puppetmaster run "Review this repo" --worker-mode daemon
```
Daemon mode keeps the Puppetmaster worker loop alive across jobs. It preserves lease-based task claiming and artifacts, while avoiding repeated worker process startup for local-role swarms.
For real edits, prefer a clean worktree:
```bash
git worktree add /tmp/puppetmaster-work -b puppetmaster-work
python -m puppetmaster claude "Implement the approved fix" --cwd /tmp/puppetmaster-work --permission-mode acceptEdits
```
Intelligent model orchestration (new in v0.6.0)
Puppetmaster ships a task-aware model router that picks the right LLM for each task instead of pinning one model per adapter. Cheap models handle trivial work, capable models handle hard work, vision tasks land on a vision-capable model, and you see exactly why.
The router is built around three pillars:
- A user-owned registry. You describe your own models, prices, and asserted capability scores in `~/.puppetmaster/models.json` (override with `$PUPPETMASTER_MODELS_PATH`). No hardcoded model names and no live price fetching: your subscriptions, your numbers.
- A transparent classifier. A pure-function heuristic assigns a 0..100 capability-needed score from the task's role + instruction + payload (e.g. `verify-runtime` ≈ 25, `explore` ≈ 50, `implement` ≈ 75, `audit`/`security-review` ≈ 90+). Vision tasks automatically add a `vision` required tag, so non-vision models are filtered out cleanly. Override per task with `payload.min_capability`.
- Four policies: `balanced` (default; the cheapest sufficient model, with ties broken toward right-sized smaller models), `cheap`, `quality`, and `escalating` (an ordered chain for retries). Override per task with `payload.routing_policy`.
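A classifier of this kind can be sketched as a base score per role, nudged by instruction cues and payload size. This is an illustration, not Puppetmaster's classifier: the cue lists, the weights, and the `input` payload key are all hypothetical, chosen only so the two `route` examples further down land on the same numbers (98 and 20):

```python
import re
from dataclasses import dataclass, field

# Illustrative numbers only: base scores from the role examples above, plus cue
# weights picked so the two `route` examples below land on 98 and 20.
ROLE_BASE = {"verify-runtime": 25, "explore": 50, "implement": 75,
             "audit": 90, "security-review": 90}
HARD_CUES = re.compile(r"\b(every|all|security|bypass|race|migrat\w*|architect\w*)\b", re.I)
EASY_CUES = re.compile(r"\b(format|lint|rename|list)\b", re.I)
VISION_CUES = re.compile(r"\b(screenshot|image|diagram|ocr)\b", re.I)


@dataclass
class Need:
    capability: int                                  # 0..100
    required_tags: set = field(default_factory=set)


def classify(role: str, instruction: str, payload: dict) -> Need:
    """Pure function: same role + instruction + payload always gives the same Need."""
    tags = set(payload.get("required_tags", []))
    if VISION_CUES.search(instruction):
        tags.add("vision")                           # filters out non-vision models later
    if "min_capability" in payload:                  # explicit per-task override wins
        return Need(max(0, min(100, int(payload["min_capability"]))), tags)
    score = ROLE_BASE.get(role, 50)
    score += 4 * len(HARD_CUES.findall(instruction))
    score -= 5 * len(EASY_CUES.findall(instruction))
    # Hypothetical payload key: big inputs nudge the score up, capped at +10.
    score += min(10, len(str(payload.get("input", ""))) // 5_000)
    return Need(max(0, min(100, score)), tags)
```

Fitting two data points says nothing about the real weights; the point is that a deterministic, inspectable function is what makes every routing decision reproducible.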
Every routing decision is a durable artifact. Picked model, classifier output, estimated USD cost, and the full list of rejected alternatives with the reason each was rejected — all stored as an ArtifactType.ROUTING artifact tied to the task. Run puppetmaster artifacts <job_id> to see why each task went where, or puppetmaster cost <job_id> to sum spend across the run.
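The balanced policy and the audit record it produces can be sketched as a filter-then-sort over the registry. A minimal sketch, not Puppetmaster's router: capability scores are the ones printed in the `route` examples below, prices come from the starter-registry table, only a subset of entries is included, Cursor-plan models are treated as $0 as in that table, and the `cheap`, `quality`, and `escalating` policies are not modelled:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    id: str
    capability_score: int
    usd_in_per_mtok: float
    usd_out_per_mtok: float
    tags: frozenset = frozenset()

    def estimate_usd(self, tokens_in: int, tokens_out: int) -> float:
        return (tokens_in * self.usd_in_per_mtok + tokens_out * self.usd_out_per_mtok) / 1_000_000


def route_balanced(models, needed: int, tokens_in: int, tokens_out: int,
                   required_tags=frozenset(), max_cost_usd: Optional[float] = None) -> dict:
    """Cheapest model that is good enough; equal prices go to the lower (right-sized) score."""
    eligible, rejected = [], []
    for m in models:
        cost = m.estimate_usd(tokens_in, tokens_out)
        if m.capability_score < needed:
            rejected.append({"id": m.id, "reason": f"capability_score {m.capability_score} < needed {needed}"})
        elif not set(required_tags) <= m.tags:
            missing = sorted(set(required_tags) - m.tags)
            rejected.append({"id": m.id, "reason": f"missing required tags {missing}"})
        elif max_cost_usd is not None and cost > max_cost_usd:
            rejected.append({"id": m.id, "reason": f"estimated ${cost:.6f} > max_cost_usd ${max_cost_usd}"})
        else:
            eligible.append((cost, m.capability_score, m))
    if not eligible:                                   # caller falls back or escalates
        return {"model_id": None, "policy": "balanced", "capability_needed": needed, "rejected": rejected}
    cost, _, pick = min(eligible, key=lambda e: (e[0], e[1]))
    rejected += [{"id": m.id, "reason": f"sufficient, but {pick.id} is cheaper or right-sized"}
                 for _, _, m in eligible if m is not pick]
    return {"model_id": pick.id, "policy": "balanced", "capability_needed": needed,
            "capability_score": pick.capability_score,
            "estimated_cost_usd": round(cost, 6), "rejected": rejected}


F = frozenset
REGISTRY = [
    ModelSpec("cursor/composer-2-5", 55, 0.0, 0.0, F({"cheap", "fast", "reading", "code"})),
    ModelSpec("cursor/gpt-5-5", 78, 0.0, 0.0, F({"balanced", "fast", "vision"})),
    ModelSpec("claude-code/haiku-4-5", 55, 1.0, 5.0, F({"cheap", "fast", "vision", "reading", "code"})),
    ModelSpec("claude-code/opus-4-6", 88, 5.0, 25.0, F({"quality", "vision", "reasoning"})),
    ModelSpec("claude-code/opus-4-7", 98, 5.0, 25.0, F({"frontier", "vision", "detailed-vision", "reasoning"})),
    ModelSpec("openai/gpt-5-4", 86, 2.5, 15.0, F({"quality", "fast", "vision", "code", "reasoning"})),
    ModelSpec("openai/gpt-5-4-nano", 52, 0.15, 0.90, F({"cheap", "fast", "reading"})),
]

print(route_balanced(REGISTRY, 98, 510, 5000)["model_id"])   # claude-code/opus-4-7
print(route_balanced(REGISTRY, 20, 510, 1000)["model_id"])   # cursor/composer-2-5
```

Sorting on `(cost, capability_score)` is what implements "ties broken toward right-sized smaller models": two $0 Cursor-plan entries tie on price, and the lower score wins.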
Where it kicks in automatically (and where it doesn't)
This is the part to be honest about:
| Surface | Auto-routes? |
|---|---|
| `puppetmaster_start_cursor_swarm` (MCP) | YES: default workers ship with `auto_route: true`. |
| `puppetmaster_start_swarm` (MCP) | YES: same default workers. |
| `puppetmaster_start_claude_implement` (MCP) | Opt-in per call: pass a spec with `auto_route: true` or accept the default. |
| `python -m puppetmaster run` | YES for built-in workers; opt-in per spec in a custom config. |
| Cursor's main chat window (typing `@cursor`) | NO. Cursor's own model picker chooses the chat model; Puppetmaster is not in that loop. The router applies when Puppetmaster runs a swarm, not when Cursor's agent is having a conversation with you. |
| Claude Code's main session | NO, for the same reason: Claude Code picks its own session model. |
In other words: the router governs how Puppetmaster fans work out across its swarm workers; it does not (and cannot) hijack the model your IDE's primary chat agent uses. If you want the cheap-tier model for trivial chat work, set it as your IDE's default in Cursor settings. The router covers every task Puppetmaster delegates, which in a real workflow means far more model invocations than the chat itself.
If you haven't run puppetmaster models init yet, auto-routing is a clean no-op: the orchestrator emits one router.registry_empty event per run, then falls back to each spec's declared adapter. Nothing breaks.
The starter registry: 11 entries across four capability tiers
puppetmaster models init writes 11 tiered model entries that map directly to the "easy / balanced / high / extra-high" mental model — 5 Cursor/Claude tiers, 4 OpenAI tiers, and 2 Codex tiers, covering every cheap → frontier pairing across all four production adapters. The adapter_model_name values are the literal strings each adapter passes through to its SDK / CLI today (verified against Cursor's runtime catalog, Anthropic's claude CLI, and OpenAI's codex CLI as of v0.7.0): composer-2.5, gpt-5.5, claude-haiku-4-5, claude-opus-4-6, claude-opus-4-7 for the Cursor/Claude tier; gpt-5.5 / gpt-5.4 / gpt-5.4-mini / gpt-5.4-nano for the OpenAI tier; gpt-5.5 / gpt-5.4-mini (routed through codex exec --json) for the Codex tier. When newer versions land, edit adapter_model_name in ~/.puppetmaster/models.json and the tier ids stay stable:
| Tier ID | Adapter | Mental model | Tags |
|---|---|---|---|
| `cursor/composer-2-5` | cursor | fast / cheap / reading ($0, bundled in Cursor plan) | cheap, fast, reading, code |
| `cursor/gpt-5-5` | cursor | balanced: $0 via Cursor plan, GPT-5.5 quality | balanced, fast, vision |
| `claude-code/haiku-4-5` | claude-code | cheap on the Anthropic side ($1 / $5 per MTok); the cheap tier for Claude-Code-only users | cheap, fast, vision, reading, code |
| `claude-code/opus-4-6` | claude-code | high-quality: $5 / $25 per MTok | quality, vision, reasoning |
| `claude-code/opus-4-7` | claude-code | frontier: $5 / $25 per MTok, best for hard reasoning + detailed vision | frontier, vision, detailed-vision, reasoning |
| `openai/gpt-5-5` | openai | frontier via Responses API: $5 / $30 per MTok | frontier, vision, detailed-vision, reasoning, code |
| `openai/gpt-5-4` | openai | workhorse: $2.50 / $15 per MTok | quality, fast, vision, code, reasoning |
| `openai/gpt-5-4-mini` | openai | balanced: $0.75 / $4.50 per MTok | balanced, fast, vision, code |
| `openai/gpt-5-4-nano` | openai | cheap reader: $0.15 / $0.90 per MTok | cheap, fast, reading |
| `codex/gpt-5-5` | codex | frontier with the Codex agent loop (file edits, shell, search): $5 / $30 per MTok | frontier, vision, reasoning, code, agent-loop |
| `codex/gpt-5-4-mini` | codex | balanced with the Codex agent loop: $0.75 / $4.50 per MTok | balanced, vision, code, agent-loop |
With the starter registry, balanced-policy routing lands roughly as follows:

| Task | Picked model |
|---|---|
| format these files | `cursor/composer-2-5` |
| map the auth module | `cursor/composer-2-5` |
| add password reset endpoint | `cursor/gpt-5-5` |
| decision: which caching strategy fits | `claude-code/opus-4-6` |
| security audit every endpoint | `claude-code/opus-4-7` |
| describe what you see in the screenshot | `cursor/gpt-5-5` (vision-tagged) |
| OCR every detail of the diagram | `claude-code/opus-4-7` (detailed-vision) |
| refactor every callsite of foo() and add tests | `openai/gpt-5-4` (workhorse: cheaper than frontier, capable enough for a cross-file refactor) |
Quick start
```bash
# 1. Write the starter registry (5 Cursor/Claude tiers + 4 OpenAI tiers + 2 Codex tiers = 11)
python -m puppetmaster models init
# 2. Inspect the registry
python -m puppetmaster models list
# 3. Dry-run a routing decision before kicking off a swarm
python -m puppetmaster route "Security audit across every endpoint" --role audit
# picked: claude-code/opus-4-7 (adapter=claude-code, model_name=claude-opus-4-7)
# policy: balanced
# capability needed: 98 chosen capability: 98
# estimated tokens: in=510 out=5000 estimated cost: $0.127550
# why: policy=balanced: cheapest model whose capability_score (98) >= needed (98)
# rejected:
# - cursor/composer-2-5: capability_score 55 < needed 98
# - cursor/gpt-5-5: capability_score 78 < needed 98
# - claude-code/haiku-4-5: capability_score 55 < needed 98
# - claude-code/opus-4-6: capability_score 88 < needed 98
# - openai/gpt-5-5: capability_score 96 < needed 98
# - openai/gpt-5-4: capability_score 86 < needed 98
# - openai/gpt-5-4-mini: capability_score 70 < needed 98
# - openai/gpt-5-4-nano: capability_score 52 < needed 98
python -m puppetmaster route "Format these files" --role verify-runtime
# picked: cursor/composer-2-5 (adapter=cursor, model_name=composer-2.5)
# capability needed: 20 chosen capability: 55
# estimated cost: $0.000000 (Cursor-tier models bill through your Cursor plan)
# rejected:
# - cursor/gpt-5-5: sufficient capability but pricier than cursor/composer-2-5
# - claude-code/opus-4-6: sufficient capability but pricier than cursor/composer-2-5
# - ... (5 more, all rejected for being pricier)
```
Wiring auto-routing into a swarm
Set payload.auto_route = true on any worker spec. The orchestrator replaces the spec's adapter and stamps payload.model from the router's decision before the task runs, and persists a ROUTING artifact:
```python
from puppetmaster.workers import WorkerSpec

specs = [
    WorkerSpec(
        role="explore",
        instruction="Map the auth subsystem",
        payload={"auto_route": True},
    ),
    WorkerSpec(
        role="audit",
        instruction="Find auth bypasses in every endpoint",
        payload={"auto_route": True, "routing_policy": "quality"},
    ),
    WorkerSpec(
        role="verify-runtime",
        instruction="Run pytest and report results",
        payload={"auto_route": True, "max_cost_usd": 0.01},
    ),
]
```
After the run:

```bash
python -m puppetmaster artifacts <job_id> | jq '.[] | select(.type=="routing") | .payload'
# {
#   "model_id": "claude-code/opus-4-7",
#   "adapter": "claude-code",
#   "adapter_model_name": "claude-opus-4-7",
#   "policy": "balanced",
#   "capability_needed": 98,
#   "capability_score": 98,
#   "estimated_cost_usd": 0.127550,
#   "reason": "policy=balanced: cheapest model whose capability_score (98) >= needed (98)",
#   "rejected": [
#     {"id": "cursor/composer-2-5", "reason": "capability_score 55 < needed 98"},
#     {"id": "cursor/gpt-5-5", "reason": "capability_score 78 < needed 98"},
#     {"id": "claude-code/opus-4-6", "reason": "capability_score 88 < needed 98"},
#     {"id": "openai/gpt-5-5", "reason": "capability_score 96 < needed 98"}
#   ]
# }
```
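The same JSON can be consumed from Python, for example to break estimated spend down per model. A minimal sketch that assumes only the shape the `jq` filter above relies on (a JSON array of objects with `type` and `payload`); it is not the implementation behind `puppetmaster cost <job_id>`:

```python
import json
import sys
from collections import defaultdict


def routing_spend(artifacts: list) -> dict:
    """Sum payload.estimated_cost_usd per routed model over ROUTING artifacts."""
    totals = defaultdict(float)
    for art in artifacts:
        if art.get("type") != "routing":
            continue                                   # findings, risks, patches, ...
        payload = art.get("payload") or {}
        totals[payload.get("model_id", "?")] += float(payload.get("estimated_cost_usd", 0.0))
    return dict(totals)


def report(stream=sys.stdin) -> float:
    """Print a per-model breakdown of `artifacts` JSON read from `stream`; return the total."""
    spend = routing_spend(json.load(stream))
    for model_id, usd in sorted(spend.items(), key=lambda kv: -kv[1]):
        print(f"{model_id:28} ${usd:.6f}")
    total = sum(spend.values())
    print(f"{'total (estimated)':28} ${total:.6f}")
    return total
```

Save it as, say, `spend.py` and call `report()` with the output of `python -m puppetmaster artifacts <job_id>` piped to stdin. These are router estimates, not billed amounts.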
Per-task overrides
| Override | Effect |
|---|---|
| `payload.min_capability` (int) | Force classifier output to this value (0..100). |
| `payload.max_cost_usd` (float) | Hard cap on estimated per-call USD cost. Models over budget are excluded with a clear rejection reason. |
| `payload.required_tags` (list) | Only consider models whose tags include ALL of these. |
| `payload.routing_policy` (str) | One of `balanced` (default), `cheap`, `quality`, `escalating`. |
| `payload.registry_path` (str) | Use a different registry file for this task. |
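Reading these keys defensively keeps a typo from silently changing routing. A minimal sketch under the assumption that invalid values should be rejected up front; whether Puppetmaster itself raises, clamps, or ignores bad values is not documented here, and `RoutingOverrides` / `parse_overrides` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional

POLICIES = ("balanced", "cheap", "quality", "escalating")


@dataclass
class RoutingOverrides:
    min_capability: Optional[int] = None
    max_cost_usd: Optional[float] = None
    required_tags: tuple = ()
    routing_policy: str = "balanced"
    registry_path: Optional[str] = None


def parse_overrides(payload: dict) -> RoutingOverrides:
    """Validate the routing keys of a task payload; other keys (auto_route, ...) are ignored."""
    out = RoutingOverrides()
    if "min_capability" in payload:
        out.min_capability = int(payload["min_capability"])
        if not 0 <= out.min_capability <= 100:
            raise ValueError(f"min_capability must be 0..100, got {out.min_capability}")
    if "max_cost_usd" in payload:
        out.max_cost_usd = float(payload["max_cost_usd"])
        if out.max_cost_usd < 0:
            raise ValueError("max_cost_usd must be non-negative")
    if "required_tags" in payload:
        tags = payload["required_tags"]
        if isinstance(tags, str) or not all(isinstance(t, str) for t in tags):
            raise ValueError("required_tags must be a list of strings")
        out.required_tags = tuple(tags)
    out.routing_policy = payload.get("routing_policy", "balanced")
    if out.routing_policy not in POLICIES:
        raise ValueError(f"routing_policy must be one of {POLICIES}, got {out.routing_policy!r}")
    if "registry_path" in payload:
        out.registry_path = str(payload["registry_path"])
    return out
```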
Scope and honesty
Four production adapters ship today: cursor (Cursor SDK via @cursor/sdk), claude-code (Anthropic via the claude CLI), openai (direct Chat Completions via OPENAI_API_KEY, added in v0.6.1-beta.1), and codex (official OpenAI Codex CLI via codex exec --json, added in v0.7.0). Together they cover the entire starter registry: composer-2.5, gpt-5.5, claude-haiku-4-5, claude-opus-4-6, claude-opus-4-7, the four GPT-5 OpenAI tiers (gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano), and the two Codex tiers (codex/gpt-5-5, codex/gpt-5-4-mini). Raw HTTP adapters for additional providers (Gemini, DeepSeek, Kimi) are not yet in. They slot in cleanly as new adapter values — the registry + router/classifier framework doesn't need to change — but each one needs real validation against its provider's API before it ships.
Capability scores and prices stay user-asserted. Puppetmaster makes the decision transparent (full audit trail of why each task went where); it does not make the value judgments for you (whether GPT-5.4 really is an 86, or whether Cursor's bundled models should be treated as $0). Edit the registry to match your reality.
MCP
Two new MCP tools for agent-side use:
| MCP tool | What it does |
|---|---|
| `puppetmaster_route_task` | Dry-run the router on an instruction. Returns the picked model + cost + rejected alternatives. |
| `puppetmaster_list_models` | Print the registry as JSON (path + each model spec). |
Works great with CodeGraph (optional)
Puppetmaster runs fine without CodeGraph — workers will fall back to grep/read for context discovery, and the orchestration / durable state / parallel-worker machinery is unchanged. When you pair it with CodeGraph, every Cursor/Claude worker gets a pre-built repo map (symbols, refs, call graph) injected into its prompt instead of having to rediscover the codebase. The two tools optimize different axes and stack cleanly:
- CodeGraph = per-call context resolution. Static facts about your code (symbols, refs, routes).
- Puppetmaster = per-session coordination + state. Dynamic facts about the agents' work (tasks, leases, typed artifacts, replayable events).
Install CodeGraph globally and initialize it once per target repo:
```bash
npm install -g @colbymchenry/codegraph
cd your-target-repo
codegraph init -i
```
After that, Puppetmaster's `doctor` will show `codegraph ok`, and every Cursor/Claude worker run against that workspace will:

- query CodeGraph for task-relevant symbols, files, and routes
- inject the result into the worker prompt as authoritative starting context
- tag the resulting verification artifact with `context:codegraph` so the operator can confirm shared intelligence was used
Fully optional and graceful. If codegraph is not installed, or the target repo is not initialized, workers fall back to their normal exploration path with no error. Pass disable_codegraph: true in a task payload to skip CodeGraph for a specific worker.
Cursor Agent can also query CodeGraph directly through Puppetmaster's MCP — no second MCP server required for the daily-driver case. See Bundled CodeGraph tools below.
Cost: what changes when you switch to durable state
Newer, more direct receipts for the routing + durable-state claims live in bench/router_savings.py, bench/router_live_ab.py, and bench/followup_cost.py — see the opening section and TALKING_POINTS.md. The Agent / CodeGraph / Puppetmaster three-way analysis below is older and broader (it models multi-worker swarm cost vs single-agent cost); both views are valid and they answer different questions.
The whole point of Puppetmaster is that durable state turns repeated questions about the same task into a database read instead of another agent run. The benchmark below shows that effect against two baselines:
- A. Agent only: one agent (Cursor or Claude Code) doing the work alone, discovering the repo with grep/read/list. No shared state across sessions.
- B. CodeGraph alone: the same agent with CodeGraph's MCP installed; the agent issues `codegraph_explore` calls itself. Still no shared state across sessions.
- C. Puppetmaster + CodeGraph: a Puppetmaster swarm with CodeGraph context pre-injected into every worker prompt, structured artifacts in a durable SQLite store, and a stitcher that reads JSON, not transcripts. Follow-up queries read SQLite, not the model.
Result, modelled from real measurements on this repo (bench/three_way.py, swarm of 4 workers, artifact sizes from a real past Puppetmaster run, $3/1M token input price):
Fresh task cost (one investigation)
| Config | Tokens | Cost |
|---|---|---|
| A. Agent only | ~30,695 | ~$0.0921 |
| B. CodeGraph alone | ~6,250 | ~$0.0187 |
| C. Puppetmaster + CodeGraph | ~21,231 | ~$0.0637 |
On a single fresh task, Puppetmaster does not beat CodeGraph alone in raw tokens. Puppetmaster is doing more work — N parallel workers and a stitcher pass instead of one agent — so its token bill is higher than a single agent with CodeGraph. That's an honest, measured trade-off, and you should know it before believing any "99% reduction" copy.
Session cost (1 swarm + K follow-up reads at $3/1M)
This is where Puppetmaster actually wins. Real workflows are not one-shot: you investigate, then ask follow-up questions about the same task. In Configs A and B every follow-up is a fresh agent re-run (no persisted state). In Config C, every follow-up is just SQLite — 0 model tokens.
| Config | K=0 | K=1 | K=5 | K=10 | K=25 |
|---|---|---|---|---|---|
| A. Agent only | ~$0.0921 | ~$0.1842 | ~$0.5525 | ~$1.0129 | ~$2.3942 |
| B. CodeGraph alone | ~$0.0187 | ~$0.0375 | ~$0.1125 | ~$0.2062 | ~$0.4875 |
| C. Puppetmaster + CodeGraph | ~$0.0637 | ~$0.0637 | ~$0.0637 | ~$0.0637 | ~$0.0637 |
At K=25 follow-ups, Puppetmaster + CodeGraph is ~7.6× cheaper than CodeGraph alone and ~38× cheaper than agent-only. In this dataset C becomes cheaper than B at K=3: B costs ~$0.0562 at K=2 and ~$0.0750 at K=3, against a flat ~$0.0637 for C.
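The session table is a linear model: A and B pay the fresh-task cost again for each follow-up, while C pays it once. A minimal sketch that recomputes the table from the token counts above; it reimplements the arithmetic only, not `bench/three_way.py`:

```python
PRICE_PER_TOKEN = 3 / 1_000_000      # $3 per 1M input tokens, the benchmark's assumption
FRESH_TOKENS = {                     # tokens for one fresh investigation (table above)
    "A. Agent only": 30_695,
    "B. CodeGraph alone": 6_250,
    "C. Puppetmaster + CodeGraph": 21_231,
}
DURABLE = {"C. Puppetmaster + CodeGraph"}   # follow-ups are SQLite reads: 0 model tokens


def session_cost(config: str, k: int) -> float:
    """USD for one fresh run plus k follow-up questions."""
    runs = 1 if config in DURABLE else 1 + k   # A and B re-run the agent per follow-up
    return runs * FRESH_TOKENS[config] * PRICE_PER_TOKEN


def crossover(rerun: str, durable: str, k_max: int = 1_000) -> int:
    """Smallest k at which `durable` becomes strictly cheaper than `rerun`."""
    for k in range(k_max + 1):
        if session_cost(durable, k) < session_cost(rerun, k):
            return k
    raise ValueError(f"no crossover up to k={k_max}")


for k in (0, 1, 5, 10, 25):
    print(f"K={k:<3}", "  ".join(f"${session_cost(c, k):.4f}" for c in FRESH_TOKENS))
print("C undercuts B from K =", crossover("B. CodeGraph alone", "C. Puppetmaster + CodeGraph"))
```

Because C's line is flat and B's grows by ~$0.0187 per follow-up, the crossover is simply the first K where B's accumulated re-runs exceed C's one-time swarm cost.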
Where the savings come from
- Durable resume (Puppetmaster): the headline. Every follow-up read against a completed swarm is a SQLite query, costing 0 model tokens. This is what flatlines the C column above.
- Typed-artifact coordination (Puppetmaster): workers communicate through structured rows instead of raw transcripts; the stitcher reads JSON, not stdout.
- Amortized context query (CodeGraph + Puppetmaster): one `codegraph context` call seeds N workers in a swarm, while B issues N separate `codegraph_explore` calls.
- Zero tool-call frames (CodeGraph + Puppetmaster): workers receive context inline in the initial prompt, with no MCP round-trip envelope per worker.
The first two are Puppetmaster's standalone contribution and work even without CodeGraph (you'd just lose the cheap per-call context, so worker prompts get more expensive). The last two only show up when both are installed.
Reproduce on your own repo
```bash
npm install -g @colbymchenry/codegraph && codegraph init && codegraph index
# Three-way cost-structure benchmark
python -m bench.three_way --cwd . --workers 4 --artifacts-state /path/to/past/puppetmaster/state
# Just CodeGraph's prompt enrichment (A/B, no API key required)
python -m bench.codegraph_ab --cwd . --prompt @bench/prompts/example.txt --dry-run
```
See bench/README.md for full methodology, what's measured vs. modelled, and the honest caveats (the three-way benchmark does not yet use live token billing; that's on the roadmap and needs SDK-side stream instrumentation).
Cursor Integration
Puppetmaster ships with two Cursor integration surfaces.
Default subagent routing (no more "Utilize Puppetmaster..." prompts)
This repo includes .cursor/rules/puppetmaster-workflow.mdc with alwaysApply: true and a top-level AGENTS.md. Together they tell Cursor Agent (and any agent that reads AGENTS.md) to route the following work through Puppetmaster by default, without the user having to invoke it explicitly:
- broad investigation, audit, or risk analysis
- multi-file refactors, migrations, cross-cutting cleanups
- debugging that spans call graphs or test coverage
- planning when scope or risks are unclear
- comparing approaches / producing decision artifacts
Native Cursor tooling is still used directly for trivial single-file edits, follow-up questions, and anything the user explicitly framed as "just answer, no swarm."
Copy .cursor/rules/puppetmaster-workflow.mdc and AGENTS.md into any repo where you want the same default behavior.
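If you prefer a script to copying by hand, here is a small sketch that copies both files into another repo while preserving their relative paths. copy_workflow_rules is a hypothetical helper, and skipping files that already exist (unless overwrite=True) is a design choice of this sketch, not documented Puppetmaster behavior:

```python
import shutil
from pathlib import Path
from typing import List

# The two files named above, relative to the Puppetmaster checkout.
WORKFLOW_FILES = [
    Path(".cursor/rules/puppetmaster-workflow.mdc"),
    Path("AGENTS.md"),
]


def copy_workflow_rules(source_repo: Path, target_repo: Path, overwrite: bool = False) -> List[Path]:
    """Copy the default-routing files into target_repo; return the paths copied."""
    copied = []
    for rel in WORKFLOW_FILES:
        src = source_repo / rel
        dst = target_repo / rel
        if dst.exists() and not overwrite:
            continue  # don't clobber a repo's own AGENTS.md unless asked to
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied
```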
Cursor Agent MCP
The MCP server lets Cursor Agent call Puppetmaster tools directly:
- puppetmaster_doctor
- puppetmaster_start_swarm
- puppetmaster_start_cursor_swarm
- puppetmaster_start_cursor_review
- puppetmaster_start_cursor_plan
- puppetmaster_start_claude_implement
- puppetmaster_status
- puppetmaster_logs
- puppetmaster_live_artifacts
- puppetmaster_live_artifacts_follow
- puppetmaster_partial_summary
- puppetmaster_artifacts
- puppetmaster_show
- puppetmaster_codegraph_search
- puppetmaster_codegraph_context
- puppetmaster_codegraph_affected
- puppetmaster_codegraph_files
- puppetmaster_codegraph_status
- puppetmaster_codegraph_init
The older blocking tools are still available for short calls, but the daily-driver path should use puppetmaster_start_*. Start tools return a job_id immediately, so Cursor does not keep one long MCP call open while workers run.
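The start/poll split can be illustrated with a toy in-process runner. This is a hypothetical sketch (a thread pool instead of Puppetmaster's separate worker processes and SQLite state) that only shows why a start call returning a job_id right away keeps each tool call short, with progress checked by cheap status polls:

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class JobRunner:
    """Toy start/status split: start() returns at once; status() is a cheap poll."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def start(self, fn: Callable[[], object]) -> str:
        job_id = f"job_{uuid.uuid4().hex[:12]}"
        self._jobs[job_id] = self._pool.submit(fn)  # work continues in the background
        return job_id

    def status(self, job_id: str) -> str:
        fut = self._jobs[job_id]
        if not fut.done():
            return "running"
        return "failed" if fut.exception() is not None else "completed"
```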
Bundled CodeGraph tools (no second MCP)
Puppetmaster's MCP server bundles the most useful CodeGraph CLI commands so Cursor Agent only needs the Puppetmaster MCP to get both orchestration and repo intelligence:
| Tool | Wraps | Use for |
|---|---|---|
| puppetmaster_codegraph_search | codegraph query | Find symbols by name ({query, kind?, limit?}) |
| puppetmaster_codegraph_context | codegraph context | Pull task-relevant entry points and related symbols ({task}) |
| puppetmaster_codegraph_affected | codegraph affected | Resolve impacted tests from changed source files ({files[]}) |
| puppetmaster_codegraph_files | codegraph files | Inspect the indexed file structure without scanning the FS |
| puppetmaster_codegraph_status | codegraph status | Check index health and backend |
| puppetmaster_codegraph_init | codegraph init | Initialize CodeGraph in a workspace ({index?: true} to also build immediately) |
Every tool degrades cleanly: if the codegraph CLI is not installed or the workspace is not initialized, the response is a non-fatal isError: true payload with error set to a one-line fix-it hint, not a runtime crash.
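A sketch of the degrade-cleanly pattern, assuming a hypothetical run_codegraph wrapper and a result shape ({"isError": ..., "error": ...}) modeled on the description above; the real tool's payload and hint wording may differ. The which parameter exists only so the missing-CLI path can be exercised without uninstalling anything:

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def run_codegraph(
    args: List[str],
    cwd: str = ".",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Run a codegraph subcommand and return an MCP-style result instead of raising."""
    exe = which("codegraph")
    if exe is None:
        return {
            "isError": True,
            "error": "codegraph CLI not found; run: npm install -g @colbymchenry/codegraph",
        }
    proc = subprocess.run(
        [exe, *args],
        cwd=cwd,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,  # never let the child share the server's stdin
    )
    if proc.returncode != 0:
        # Surface the first line of the CLI's own message as the fix-it hint.
        lines = (proc.stderr or proc.stdout).strip().splitlines()
        hint = lines[0] if lines else f"codegraph exited with status {proc.returncode}"
        return {"isError": True, "error": hint}
    return {"isError": False, "output": proc.stdout}
```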
Power users who want CodeGraph's full MCP surface (codegraph_callers, codegraph_callees, codegraph_impact, codegraph_node) — only available through its own MCP server — can still run codegraph serve --mcp alongside Puppetmaster's MCP. Bundling covers the daily-driver case so two MCP entries are no longer required by default.
For real multi-role code analysis from Cursor Agent, use puppetmaster_start_cursor_swarm. Bare custom roles on puppetmaster_start_swarm require a config or adapter; otherwise Puppetmaster fails fast instead of silently using the deterministic local demo adapter.
Workers emit artifacts as they run, so you do not have to wait for the final stitched summary: use puppetmaster_live_artifacts for the live evidence board and puppetmaster_partial_summary for a current synthesis. For a push-style stream, use puppetmaster_live_artifacts_follow. It long-polls the durable SQLite event log and returns as soon as a new artifact lands (or after timeout_seconds), along with a next_cursor that Cursor Agent passes back to receive the next batch. The final stitched report is the publishable version, built from the same artifact stream.
CLI users can do the same with python -m puppetmaster feed <job_id> --follow, which streams new artifacts as they arrive without re-reading already-seen events.
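The follow/next_cursor contract can be sketched as a cursor-based long poll over an events table. The events schema (id, job_id, kind, payload) and the function name are assumptions for illustration; Puppetmaster's actual event log layout may differ:

```python
import sqlite3
import time
from typing import List, Tuple


def follow_artifacts(
    conn: sqlite3.Connection,
    job_id: str,
    cursor: int = 0,
    timeout_seconds: float = 30.0,
    poll_interval: float = 0.25,
) -> Tuple[List[tuple], int]:
    """Long-poll for events newer than `cursor`.

    Returns as soon as at least one new row exists, or an empty batch once
    timeout_seconds elapses. The second value is the next_cursor to pass back.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        rows = conn.execute(
            "SELECT id, kind, payload FROM events WHERE job_id = ? AND id > ? ORDER BY id",
            (job_id, cursor),
        ).fetchall()
        if rows:
            return rows, rows[-1][0]  # cursor advances to the last id seen
        if time.monotonic() >= deadline:
            return [], cursor  # nothing new; caller re-polls with the same cursor
        time.sleep(poll_interval)
```

Because the cursor is an id in a durable table, a client that disconnects can resume from its last next_cursor without re-reading already-seen events.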
Blocking tools:
- puppetmaster_cursor_review
- puppetmaster_cursor_plan
- puppetmaster_claude_implement
- puppetmaster_last_job
One-line setup (recommended, v0.7.2+):
python -m puppetmaster install-cursor-mcp # workspace .cursor/mcp.json
python -m puppetmaster install-cursor-mcp --global # ~/.cursor/mcp.json (every workspace)
The installer (a) resolves the exact Python that has puppetmaster importable via sys.executable (avoids "the python on Cursor's PATH is the wrong one" bugs), (b) launches the MCP server in a subprocess and verifies it responds to tools/list before writing anything, (c) merges into the existing mcp.json without touching unrelated servers or wiping any env block you already have set (API keys are preserved). It is fully idempotent — re-running reports unchanged.
What the installer writes (equivalent manual config, if you prefer to edit by hand):
{
"mcpServers": {
"puppetmaster": {
"command": "/absolute/path/to/python",
"args": ["-m", "puppetmaster.mcp_server"]
}
}
}
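The merge step (c) can be sketched as below, assuming a hypothetical merge_mcp_config helper. It omits the tools/list handshake, updates only the puppetmaster entry, keeps any existing env block, leaves other servers alone, and returns "unchanged" on a no-op re-run:

```python
import json
import sys
from pathlib import Path


def merge_mcp_config(path: Path, python: str = sys.executable) -> str:
    """Add or update the puppetmaster entry; return 'added', 'updated' or 'unchanged'."""
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    existing = servers.get("puppetmaster", {})
    wanted = dict(existing)  # start from the current entry so an env block (API keys) survives
    wanted["command"] = python
    wanted["args"] = ["-m", "puppetmaster.mcp_server"]
    if existing == wanted:
        return "unchanged"  # idempotent: nothing is rewritten
    servers["puppetmaster"] = wanted  # unrelated servers are left untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return "updated" if existing else "added"
```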
MCP does not patch Cursor's private model picker or force Cursor's native subagents to change their resource model. It gives Cursor Agent a tool surface that invokes Puppetmaster. Once invoked, Puppetmaster owns the run: independent worker processes, SQLite coordination, structured artifacts, and a stitched result returned to Cursor.
See Cursor Agent MCP.
Codex CLI / Codex IDE MCP
Codex (the official OpenAI Codex CLI / Codex desktop app, npm install -g @openai/codex) also speaks MCP. The wire protocol is identical to Cursor's, but the config file and registration command are different. Codex stores MCP servers in ~/.codex/config.toml under [mcp_servers.<name>].
One-line setup (recommended, v0.7.2+):
python -m puppetmaster install-codex-mcp
codex mcp list # confirm: puppetmaster ... enabled
The installer shells out to codex mcp add with the resolved sys.executable so Codex always launches the right Python, runs a tools/list handshake before registering so a broken setup is caught immediately, and is fully idempotent (re-running reports unchanged; pass --force to repoint at a new Python). It also prints the sandbox-caveat guidance below as part of its "next steps" output.
Equivalent manual command if you prefer:
codex mcp add puppetmaster -- $(python -c 'import sys; print(sys.executable)') -m puppetmaster.mcp_server
That's it — every new Codex session sees the Puppetmaster MCP tools. No restart required for fresh sessions; existing TUI sessions need to be restarted for MCP changes to take effect.
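For reference, a sketch that renders the kind of [mcp_servers.<name>] table Codex keeps in ~/.codex/config.toml for this server. The command and args keys are an assumption about Codex's config format; confirm against what codex mcp add actually writes to your file:

```python
import json


def codex_mcp_toml(python: str, name: str = "puppetmaster") -> str:
    """Render an [mcp_servers.<name>] table for ~/.codex/config.toml."""
    # json.dumps output is also a valid TOML basic string for ordinary paths.
    args = ", ".join(json.dumps(a) for a in ["-m", "puppetmaster.mcp_server"])
    return (
        f"[mcp_servers.{name}]\n"
        f"command = {json.dumps(python)}\n"
        f"args = [{args}]\n"
    )


print(codex_mcp_toml("/usr/local/bin/python3"))
```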
Sandbox caveat (important and not obvious): Codex runs MCP-server subprocesses inside the agent's sandbox. Puppetmaster's MCP server reads and writes ~/.puppetmaster/ (its durable state dir), which sits outside any workspace. Under Codex's default --sandbox workspace-write, that access is denied and the tool call fails with mcp: puppetmaster/* (failed) followed by user cancelled MCP tool call. Two clean ways out:
| Mode | Command | When to use |
|---|---|---|
| Interactive TUI | codex (no flags) | First-time approval prompt for ~/.puppetmaster/; subsequent calls in the same session pass. Best for daily-driver use. |
| Non-interactive automation | codex exec --dangerously-bypass-approvals-and-sandbox ... | Required for codex exec scripts that need to invoke Puppetmaster MCP tools. Functionally equivalent to running Puppetmaster's CLI directly. |
Verified end-to-end (May 2026, against Codex 0.134.0 + Puppetmaster v0.7.1): asking Codex "Call the puppetmaster_doctor MCP tool. Report exactly how many checks returned status='ok'." returned mcp: puppetmaster/puppetmaster_doctor (completed) followed by Codex's correct summary of the actual doctor output.
If you have BOTH Cursor and Codex installed and want to drive Puppetmaster from either, that's fine — Cursor and Codex use separate config files and don't interfere with each other.
Cursor Extension
The extension adds a Puppetmaster activity-bar control panel:
- configure provider keys in Cursor secret storage
- run doctor
- launch Cursor review/plan dry runs
- launch Claude Code full-edit jobs
- inspect latest job, logs, and artifacts
Download the VSIX from the latest GitHub release or build it locally:
cd cursor-extension
npm run check
npx -y @vscode/vsce package --no-dependencies
Then run Extensions: Install from VSIX... in Cursor and choose the generated .vsix.
See Cursor Extension.
Live Adapters
Cursor
Use Cursor for review, planning, and dry-run implementation workflows.
export CURSOR_API_KEY="<your-cursor-api-key>"
python -m puppetmaster cursor "Review this repo and propose the next patch" --review --dry-run
python -m puppetmaster cursor "Plan the next implementation slice" --plan --dry-run
The Cursor adapter runs isolated one-shot agents through @cursor/sdk.
Claude Code
Use Claude Code when you want a real coding agent to edit a clean repo or worktree.
export ANTHROPIC_API_KEY="<your-anthropic-api-key>"
export CLAUDE_CODE_COMMAND="npx -y @anthropic-ai/claude-code"
python -m puppetmaster claude \
"Implement the approved change and run focused tests" \
--permission-mode acceptEdits
Claude Code full-edit runs require a clean working tree by default. Puppetmaster blocks dirty repos unless you explicitly pass --allow-dirty, because otherwise patch artifacts could mix old local changes with agent changes.
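A minimal sketch of the dirty-tree guard, assuming git status --porcelain as the signal (which also counts untracked files; whether Puppetmaster does the same is not stated here). Both function names are hypothetical:

```python
import subprocess


def has_uncommitted_changes(porcelain: str) -> bool:
    """True if `git status --porcelain` output lists any entry."""
    return any(line.strip() for line in porcelain.splitlines())


def ensure_clean_worktree(cwd: str = ".", allow_dirty: bool = False) -> None:
    """Refuse to start a full-edit run on a dirty tree unless explicitly allowed."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError outside a git repo
        stdin=subprocess.DEVNULL,
    ).stdout
    if has_uncommitted_changes(out) and not allow_dirty:
        raise RuntimeError(
            "working tree has uncommitted changes; commit or stash them, or pass --allow-dirty"
        )
```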
When Claude Code edits tracked files, Puppetmaster records:
- a verification artifact with stdout/stderr, return code, model usage, and failure classification
- a patch artifact with changed files, base SHA, unified diff, and revert guidance
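A sketch of how a patch artifact's changed-file list could be derived from a unified git diff. The artifact keys and revert string are illustrative assumptions. The parser only reads ---/+++ headers between a diff --git line and the first hunk, so hunk content lines that happen to start with --- or +++ are not miscounted:

```python
from typing import Dict, List, Optional


def changed_files_from_diff(diff: str) -> List[str]:
    """Collect changed paths from the file headers of a unified git diff."""
    files: List[str] = []
    in_header = False
    old_path: Optional[str] = None
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            in_header, old_path = True, None
        elif line.startswith("@@"):
            in_header = False  # hunk body: '---'/'+++' here are content, not headers
        elif in_header and line.startswith("--- "):
            old_path = line[len("--- a/"):] if line.startswith("--- a/") else None
        elif in_header and line.startswith("+++ "):
            if line.startswith("+++ b/"):
                files.append(line[len("+++ b/"):])
            elif old_path is not None:  # "+++ /dev/null" means the file was deleted
                files.append(old_path)
    return files


def make_patch_artifact(base_sha: str, diff: str) -> Dict[str, object]:
    """Bundle what a reviewer needs to inspect, apply, or undo an agent's edit."""
    return {
        "kind": "patch",
        "base_sha": base_sha,
        "changed_files": changed_files_from_diff(diff),
        "unified_diff": diff,
        "revert": "save unified_diff to a file, then run: git apply -R <file>",
    }
```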
OpenAI
Use the openai adapter when you want to bypass Cursor's SDK entirely and hit OpenAI's Chat Completions API directly. This is the adapter that returns real billing-grade token counts (pulled from usage.prompt_tokens / usage.completion_tokens on the API response), which is why bench/router_live_ab.py runs against it.
export OPENAI_API_KEY="<your-openai-api-key>"
Every Cursor/Claude/OpenAI worker emits the same structured artifact contract: a VERIFICATION artifact with tokens_in, tokens_out, tokens_total, plus parsed FINDING / RISK / DECISION artifacts from the JSON envelope. The OpenAI adapter additionally exposes those token counts on every run so puppetmaster cost <job_id> can sum real spend, not estimates.
The starter registry includes four OpenAI tiers — openai/gpt-5-5 ($5 / $30 per MTok, frontier), openai/gpt-5-4 ($2.50 / $15, workhorse), openai/gpt-5-4-mini ($0.75 / $4.50, balanced), openai/gpt-5-4-nano ($0.15 / $0.90, cheap reader) — so the router can pick from the full GPT-5 family without manual configuration.
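Per-call cost is each direction's token count times its per-MTok price. A small sketch using the four tiers above; run_cost is a hypothetical helper, not the implementation behind puppetmaster cost. It reproduces the A/B figures quoted at the top of this page:

```python
# Prices in USD per million tokens (input, output), copied from the tiers above.
PRICES_PER_MTOK = {
    "openai/gpt-5-5": (5.00, 30.00),
    "openai/gpt-5-4": (2.50, 15.00),
    "openai/gpt-5-4-mini": (0.75, 4.50),
    "openai/gpt-5-4-nano": (0.15, 0.90),
}


def run_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost of one call from its prompt/completion token counts."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


# The live A/B example: 156 tokens in for both runs.
print(f"{run_cost('openai/gpt-5-5', 156, 204):.6f}")       # 0.006900
print(f"{run_cost('openai/gpt-5-4-nano', 156, 121):.6f}")  # 0.000132
```

Note that most of the gap comes from output pricing: $30 vs $0.90 per MTok is a 33× difference, on top of the nano run producing fewer output tokens.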
Codex
Use the codex adapter when you want OpenAI's models inside an actual coding-agent loop — the OpenAI-side equivalent of Claude Code. The codex CLI ships with file editing, shell, search, and tool-use built in; this adapter shells out to codex exec --json and captures the structured event stream, so you get real input_tokens / output_tokens / cached_input_tokens / reasoning_output_tokens / thread_id per turn — strictly richer telemetry than any other adapter ships.
npm install -g @openai/codex
printenv OPENAI_API_KEY | codex login --with-api-key
python -m puppetmaster codex "Implement the approved change and run focused tests"
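A sketch of folding the codex exec --json stream into per-run telemetry. The event type names (thread.started, turn.completed) and the usage object shape are assumptions about Codex's JSONL output; check them against your Codex version before relying on this. The function sums every integer field it finds under usage, so fields such as cached_input_tokens are picked up if present:

```python
import json
from typing import Dict, Iterable


def summarize_codex_events(lines: Iterable[str]) -> Dict[str, object]:
    """Fold a `codex exec --json` JSONL stream into a thread id plus token totals.

    ASSUMED event shapes (verify against your Codex version):
      {"type": "thread.started", "thread_id": "..."}
      {"type": "turn.completed", "usage": {"input_tokens": N, "output_tokens": N, ...}}
    """
    thread_id = None
    totals: Dict[str, int] = {}
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(event, dict):
            continue
        if event.get("type") == "thread.started":
            thread_id = event.get("thread_id")
        elif event.get("type") == "turn.completed":
            for key, value in (event.get("usage") or {}).items():
                if isinstance(value, int):
                    totals[key] = totals.get(key, 0) + value  # sum across turns
    return {"thread_id": thread_id, "usage": totals}
```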
The Codex adapter, like Claude Code, blocks on a dirty worktree by default (so resulting patch artifacts are attributable to the agent's run, not pre-existing churn). Pass payload.allow_dirty=true or downgrade to payload.sandbox="read-only" for review-only roles that never touch the worktree. The adapter ships two starter registry tiers — codex/gpt-5-5 ($5 / $30, frontier with agent loop) and codex/gpt-5-4-mini ($0.75 / $4.50, balanced with agent loop) — so the router can fan tasks across cursor (subscription-billed), claude-code (per-token Anthropic), openai (per-token OpenAI chat), and codex (per-token OpenAI agent-loop) inside a single swarm.
For one-shot reasoning where no tool use is needed, the router will usually prefer openai/* over codex/* at the same capability tier because Chat Completions has no agent-loop round-trip overhead. For multi-file refactors and audits, codex/* is the natural choice — its capability_score is set 1 point higher than the matching openai/* tier specifically to encode the agent-loop advantage.
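The final routing step (keep tiers that meet the required capability, then take the cheapest) can be sketched as below. The capability_score values are hypothetical, chosen only so each codex/* tier sits 1 point above its openai/* twin; the real router also classifies task complexity and records rejected alternatives, which this omits. Prices are the per-MTok figures listed above:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    capability_score: int  # HYPOTHETICAL scores; real values live in your registry
    price_in: float        # USD per 1M input tokens
    price_out: float       # USD per 1M output tokens

    def estimate(self, tokens_in: int, tokens_out: int) -> float:
        return (tokens_in * self.price_in + tokens_out * self.price_out) / 1_000_000


def pick_cheapest(tiers: List[Tier], needed: int, tokens_in: int, tokens_out: int) -> Optional[Tier]:
    """Cheapest tier whose capability_score meets the task's requirement."""
    eligible = [t for t in tiers if t.capability_score >= needed]
    return min(eligible, key=lambda t: t.estimate(tokens_in, tokens_out), default=None)


REGISTRY = [
    Tier("openai/gpt-5-4-nano", 3, 0.15, 0.90),
    Tier("openai/gpt-5-4-mini", 5, 0.75, 4.50),
    Tier("codex/gpt-5-4-mini", 6, 0.75, 4.50),  # +1 for the agent loop
    Tier("openai/gpt-5-4", 7, 2.50, 15.00),
    Tier("openai/gpt-5-5", 8, 5.00, 30.00),
    Tier("codex/gpt-5-5", 9, 5.00, 30.00),  # +1 for the agent loop
]
```

Because openai/gpt-5-4-mini and codex/gpt-5-4-mini cost the same here, a tie goes to whichever appears first in the list; the extra capability point is what sends tasks needing a higher score to codex/*.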
Shell
Use shell for bounded verification steps:
{
"role": "verify-runtime",
"instruction": "Verify Python is available.",
"adapter": "shell",
"payload": {
"command": ["python", "--version"],
"timeout_seconds": 10
}
}
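A minimal sketch of what a bounded shell step amounts to: run the command list without a shell, enforce timeout_seconds, and turn every outcome (including a timeout or a missing binary) into data rather than an exception. run_shell_step and its result keys are hypothetical:

```python
import subprocess
import sys
from typing import Dict, List


def run_shell_step(command: List[str], timeout_seconds: float = 10) -> Dict[str, object]:
    """Run one bounded command and report the outcome as a plain dict artifact."""
    try:
        proc = subprocess.run(
            command,  # argv list, no shell interpolation
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            stdin=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "returncode": None, "failure": "timeout"}
    except FileNotFoundError:
        return {"ok": False, "returncode": None, "failure": "command not found"}
    return {
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


print(run_shell_step([sys.executable, "--version"]))
```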
CLI Reference
python -m puppetmaster doctor
python -m puppetmaster adapters
python -m puppetmaster init-config --path puppetmaster.json
python -m puppetmaster run "Goal" --config examples/enterprise-workflow.json
python -m puppetmaster daemon --roles explore architect implement redteam test
python -m puppetmaster cursor "Goal" --review --dry-run
python -m puppetmaster claude "Goal" --permission-mode acceptEdits
python -m puppetmaster crash-demo
python -m puppetmaster status <job_id>
python -m puppetmaster watch <job_id>
python -m puppetmaster events <job_id>
python -m puppetmaster feed [job_id]
python -m puppetmaster artifacts <job_id>
python -m puppetmaster logs [job_id]
python -m puppetmaster open [job_id]
python -m puppetmaster last
python -m puppetmaster rerun [job_id]
python -m puppetmaster diff [job_id]
python -m puppetmaster approve <job_id-or-artifact-id>
python -m puppetmaster reject <job_id-or-artifact-id> --reason "why"
python -m puppetmaster clean --completed
python -m puppetmaster memory
Workflow Config
{
"lease_seconds": 10,
"workers": [
{
"role": "explore",
"instruction": "Map the goal and emit evidenced findings."
},
{
"role": "claude-implement",
"instruction": "Use Claude Code to implement the requested change.",
"adapter": "claude-code",
"depends_on": ["explore"],
"payload": {
"prompt": "Implement the change and run focused tests.",
"cwd": ".",
"permission_mode": "acceptEdits",
"allowed_tools": ["Read", "Edit", "MultiEdit", "Write", "Bash"],
"timeout_seconds": 900,
"allow_dirty": false
}
}
]
}
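depends_on turns the workers into a dependency graph. A sketch of grouping a config like the one above into execution waves with the standard library's graphlib (Python 3.9+); the wave grouping is an illustration, not a statement of how Puppetmaster schedules:

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_waves(config: dict) -> list:
    """Group workers into waves; a worker runs only after every role it depends on."""
    graph = {w["role"]: set(w.get("depends_on", [])) for w in config["workers"]}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular depends_on
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the config above this yields [["explore"], ["claude-implement"]].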
Examples:
- Enterprise Workflow
- Cursor Live
- Cursor Review
- Cursor Dry-Run Implementation
- Claude Code Full Edit
- Memory Reuse
State Model
By default, Puppetmaster keeps runtime state outside the repository so git status stays focused on source changes:
macOS: ~/Library/Application Support/puppetmaster/projects/<workspace>-<hash>/
Linux: ~/.local/state/puppetmaster/projects/<workspace>-<hash>/
Print the resolved location:
python -m puppetmaster state
Override it when you intentionally want repo-local or CI-specific state:
python -m puppetmaster --state-dir .puppetmaster run "Map this repo"
PUPPETMASTER_STATE_DIR=.puppetmaster python -m puppetmaster doctor
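A sketch of the resolution order, assuming --state-dir beats PUPPETMASTER_STATE_DIR and that the <hash> suffix is a truncated SHA-256 of the resolved workspace root (both are assumptions; run python -m puppetmaster state for the real answer). Callers are expected to pass the git root as workspace:

```python
import hashlib
import os
import sys
from pathlib import Path
from typing import Optional


def resolve_state_dir(workspace: Path, cli_state_dir: Optional[str] = None) -> Path:
    """Pick the state dir: --state-dir, then PUPPETMASTER_STATE_DIR, then a per-project default."""
    if cli_state_dir:
        return Path(cli_state_dir)
    env = os.environ.get("PUPPETMASTER_STATE_DIR")
    if env:
        return Path(env)
    root = workspace.resolve()
    digest = hashlib.sha256(str(root).encode("utf-8")).hexdigest()[:12]  # ASSUMED hash scheme
    if sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support" / "puppetmaster"
    else:
        base = Path.home() / ".local" / "state" / "puppetmaster"
    return base / "projects" / f"{root.name}-{digest}"
```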
The state directory contains:
<state-dir>/
state.sqlite3
jobs/
memory/
streams/
locks/
.puppetmaster/ remains in .gitignore as a compatibility fallback for explicit local state.
Core objects:
- Job: one swarm run and user goal
- Task: role-specific work, optionally dependency-gated
- AgentRun: one worker attempt
- Artifact: structured output with payload, evidence, confidence, and sha256
- MemoryRecord: promoted fact retrieved by later workers
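The Artifact fields map naturally onto a dataclass. A sketch, assuming a kind field (finding, risk, decision, patch) and assuming sha256 is computed over a canonical JSON encoding of kind, payload, and evidence; the real hashing input is not documented here. Excluding confidence means re-scoring an artifact does not change its identity:

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Artifact:
    kind: str                          # ASSUMED field, e.g. "finding", "risk", "decision"
    payload: Dict[str, Any]
    evidence: List[str] = field(default_factory=list)
    confidence: float = 0.5
    sha256: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Canonical JSON (sorted keys, no whitespace) so equal content hashes equally.
        canonical = json.dumps(
            {"kind": self.kind, "payload": self.payload, "evidence": self.evidence},
            sort_keys=True,
            separators=(",", ":"),
        )
        self.sha256 = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```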
Safety Model
Puppetmaster is powerful because it can orchestrate tools that edit code. The safety model is explicit:
- Cursor defaults toward review/plan/dry-run workflows.
- Claude Code is full-edit, but blocked on dirty worktrees by default.
- Patch outputs are artifacts with diffs and base SHAs.
- Approval/rejection is recorded in the event stream.
- Stale workers are recovered through leases.
- Failed provider calls become structured artifacts instead of mystery crashes.
- Secrets stay in environment variables, never config files.
If you paste a key into a terminal, chat, issue, screenshot, or transcript, rotate it before publishing.
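Lease-based recovery means a task claimed by a worker that then dies becomes claimable again once its lease expires. A SQLite sketch with a hypothetical tasks table; BEGIN IMMEDIATE takes the write lock before the read so two workers cannot claim the same row. lease_seconds defaults to 10 to match the workflow config example:

```python
import sqlite3
import time
from typing import Optional


def claim_task(conn: sqlite3.Connection, worker: str, lease_seconds: float = 10.0) -> Optional[int]:
    """Claim one pending task, or one whose previous lease has expired.

    Call with no transaction already open on conn.
    """
    now = time.time()
    with conn:  # commits on success, rolls back on error
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' "
            "OR (status = 'running' AND lease_expires_at < ?) ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE tasks SET status = 'running', owner = ?, lease_expires_at = ? WHERE id = ?",
            (worker, now + lease_seconds, row[0]),
        )
    return row[0]
```

A live worker would renew lease_expires_at periodically; a worker that crashes stops renewing, so its task falls back into the claimable set.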
Troubleshooting
Tool execution error. Not connected from Cursor
This is Cursor's MCP client telling you it lost the stdio transport to the Puppetmaster MCP server — not that your swarm or jobs died. Common triggers:
- Heavy concurrent load (parallel Cursor SDK swarm + CodeGraph index + large status payloads in the same window).
- Cursor reloading MCP settings, toggling the server, or restarting Cursor itself.
- An in-flight tool call exceeding Cursor's internal timeout.
Prevention layer (v0.5.3+): every long-running tool call now emits
JSON-RPC notifications/message keepalive frames every 10 seconds after
a 5-second grace period. Bytes flowing on the stdio pipe defeat the
"transport looks dead" heuristic in Cursor's MCP client. Short calls
pay zero protocol cost. Tune or disable with:
- PUPPETMASTER_MCP_KEEPALIVE_AFTER_SECONDS (default 5)
- PUPPETMASTER_MCP_KEEPALIVE_INTERVAL_SECONDS (default 10)
- PUPPETMASTER_MCP_KEEPALIVE_DISABLED=1 (turn off entirely)
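A sketch of the per-call keepalive: stay quiet for the grace period, then write a JSON-RPC notifications/message frame (the MCP logging notification, with level and data params) at each interval until the call completes. The function is hypothetical, and a real server must serialize these writes with its own responses on stdout:

```python
import json
import sys
import threading
from typing import TextIO


def keepalive_while_running(
    done: threading.Event,
    out: TextIO = sys.stdout,
    after_seconds: float = 5.0,
    interval_seconds: float = 10.0,
) -> threading.Thread:
    """Emit MCP notifications/message frames until `done` is set.

    Nothing is written if the call finishes within after_seconds, so short
    calls add no bytes to the pipe.
    """
    frame = json.dumps({
        "jsonrpc": "2.0",
        "method": "notifications/message",
        "params": {"level": "debug", "data": "still working"},
    })

    def loop() -> None:
        if done.wait(after_seconds):
            return  # finished inside the grace period
        while True:
            out.write(frame + "\n")  # real servers: hold the stdout write lock here
            out.flush()
            if done.wait(interval_seconds):
                return

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```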
Root-cause fix (v0.5.6+): Pre-v0.5.6, parallel puppetmaster_doctor calls (or any other tool that fanned out to multiple subprocess.run invocations) could silently kill the MCP server with exit_code=0 because subprocess children inherited the parent's stdin by default. Concurrent spawn pressure somehow caused the parent's for line in sys.stdin loop to receive a phantom EOF and exit cleanly — looking from Cursor's side exactly like Tool execution error. Not connected. Every subprocess call in the server's code path now passes stdin=subprocess.DEVNULL, severing the inheritance chain. Verified by bench/mcp_stress.py (run it any time: 6 scenarios in ~90s).
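The fix itself is one argument. A small demonstration that a child started with stdin=subprocess.DEVNULL reads an empty stdin instead of sharing the parent's pipe; run_child is a hypothetical helper:

```python
import subprocess
import sys
from typing import List


def run_child(args: List[str]) -> subprocess.CompletedProcess:
    """Run a helper process without handing it the server's stdin."""
    return subprocess.run(
        args,
        stdin=subprocess.DEVNULL,  # child gets /dev/null, not the JSON-RPC pipe
        capture_output=True,
        text=True,
    )


result = run_child([sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"])
print(result.stdout.strip())  # ''
```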
Self-healing layer (v0.5.4+): Cursor's MCP client uses a "lease"
lifecycle that periodically re-creates the logical client without
killing the previous Python MCP server. Without the keepalive above,
that left one orphan server per lease cycle holding open SQLite handles
and competing for the CodeGraph indexer lock. The new
_InputStalenessWatcher measures inbound JSON-RPC traffic
directly: if no stdin message has arrived in 10 minutes and there
are zero in-flight tool calls, the server closes stdin and exits
through the normal finally block (deregister, stop heartbeat, shut
down executor). Active sessions are never interrupted; only true
orphans reap. Tune or disable:
- PUPPETMASTER_MCP_INPUT_STALE_SECONDS (default 600)
- PUPPETMASTER_MCP_INPUT_STALE_CHECK_SECONDS (default 30)
- PUPPETMASTER_MCP_INPUT_STALE_DISABLED=1
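The reaping rule reduces to a two-part condition plus the env-var knobs listed above. A sketch with hypothetical function names (the real logic lives in _InputStalenessWatcher):

```python
import os
import time
from typing import Optional, Tuple


def watcher_settings() -> Optional[Tuple[float, float]]:
    """(stale_seconds, check_seconds) from the environment, or None when disabled."""
    if os.environ.get("PUPPETMASTER_MCP_INPUT_STALE_DISABLED") == "1":
        return None
    stale = float(os.environ.get("PUPPETMASTER_MCP_INPUT_STALE_SECONDS", "600"))
    check = float(os.environ.get("PUPPETMASTER_MCP_INPUT_STALE_CHECK_SECONDS", "30"))
    return stale, check


def should_exit_orphan(
    last_input_at: float,
    in_flight_calls: int,
    stale_seconds: float = 600.0,
    now: Optional[float] = None,
) -> bool:
    """Exit only if stdin has been silent past the threshold AND no tool call is running."""
    now = time.monotonic() if now is None else now
    return in_flight_calls == 0 and (now - last_input_at) >= stale_seconds
```

A watcher thread would evaluate should_exit_orphan every check_seconds and, when it returns True, close stdin so the server leaves through its normal shutdown path.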
Idle-pipe keepalive (v0.5.5+): Some Cursor builds close MCP
transports that have been quiet for a while, even between successful
calls. The new _IdleKeepalive thread emits a tiny
notifications/message every ~25s while no tool call is running, so
the stdio pipe is never silent long enough to look dead. Cost is
trivial (~22 KB/hour). The per-call keepalive (v0.5.3) and idle
keepalive together cover both "tool in flight" and "tool not in
flight" cases. Tune or disable:
- PUPPETMASTER_MCP_IDLE_KEEPALIVE_INTERVAL_SECONDS (default 25, min 5)
- PUPPETMASTER_MCP_IDLE_KEEPALIVE_DISABLED=1
Agent-side CLI fallback (v0.5.5+): When the transport drops anyway
(e.g., during the lease transition itself), the bundled Cursor rule
(.cursor/rules/puppetmaster-workflow.mdc) and AGENTS.md instruct
the AI agent to call the equivalent python -m puppetmaster ...
command via its shell tool instead of giving up. Every MCP tool has a
matching CLI; read-only commands (show/artifacts/logs/feed/status)
auto-pivot to the project state dir that owns the job, so no manual
PUPPETMASTER_STATE_DIR export is needed.
CodeGraph indexes for different repos now run concurrently
Pre-v0.5.5, Puppetmaster used a single machine-wide lock to serialize
all CodeGraph indexers, so running puppetmaster_codegraph_index
against ff-data-engineering would block the same call for ff-ios
with Another CodeGraph indexer is already running (pid 80417) — even
though the two repos have separate SQLite databases that can't trash
each other.
v0.5.5 keys the lock on the resolved repo root path
(codegraph-indexer-<repo>-<digest>.lock). Different repos index in
parallel; the lock only fires when two indexers are actually pointed
at the same repo's DB. Stale-PID auto-clear handles the post-kill -9
case: if the recorded PID isn't alive, the new claimant takes over
instead of refusing forever. Manual rm /Users/.../codegraph-indexer*.lock
is no longer needed after a runaway indexer dies.
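A POSIX-only sketch of a per-repo lock with stale-PID takeover. The file name follows the codegraph-indexer-<repo>-<digest>.lock pattern, but the digest scheme is an assumption. The takeover step has a small race between two simultaneous claimants that production code would close with something like fcntl.flock:

```python
import hashlib
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe; signal 0 checks the PID without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def indexer_lock_path(lock_dir: Path, repo_root: Path) -> Path:
    root = repo_root.resolve()
    digest = hashlib.sha256(str(root).encode("utf-8")).hexdigest()[:12]  # ASSUMED scheme
    return lock_dir / f"codegraph-indexer-{root.name}-{digest}.lock"


def try_acquire(path: Path) -> bool:
    """Create the lock atomically; take it over only if the recorded PID is dead."""
    for _ in range(2):  # at most one stale-lock takeover, then give up
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                text = path.read_text().strip()
            except FileNotFoundError:
                continue  # released between our two calls; try again
            if not text:
                return False  # the owner has not written its PID yet
            owner = int(text) if text.isdigit() else 0
            if owner > 0 and pid_alive(owner):
                return False  # a live indexer already owns this repo
            path.unlink(missing_ok=True)  # stale lock, e.g. left behind by kill -9
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False
```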
If the transport still drops, the recovery layer below catches the fallout.
When this happens, in-flight Puppetmaster swarms keep running in the
background (that's the whole point of durable state — see python -m puppetmaster jobs from a shell to confirm), but you typically end up
with one or more orphan python -m puppetmaster.mcp_server processes
holding open SQLite handles and contending for the CodeGraph indexer
lock.
Diagnose:
python -m puppetmaster mcp list
# 3 tracked (1 alive, 0 stale, 2 dead)
# PID STATE AGE HBEAT WORKSPACE
# 12345 ok 12s 8s /Users/you/repo
# 11111 dead 4231s 4231s /Users/you/repo
# 11112 dead 4231s 4231s /Users/you/repo
puppetmaster doctor also flags this automatically.
Clean up:
python -m puppetmaster mcp cleanup --kill-stale
Then restart the Puppetmaster MCP server in Cursor
(Settings → MCP → toggle off/on). Inside an agent session you can call
puppetmaster_mcp_status / puppetmaster_mcp_cleanup directly — handy
for letting the agent self-diagnose right after a reconnect.
Each running Puppetmaster MCP server now registers itself in
~/Library/Caches/puppetmaster/mcp-servers/<pid>.json (or
$XDG_CACHE_HOME/puppetmaster/mcp-servers/ on Linux) and updates a
heartbeat from a background thread, so dead and stale entries are
detectable without grepping ps.
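A sketch of the registry idea: each server writes <pid>.json with a heartbeat timestamp, and a reader classifies entries as ok, stale, or dead. The JSON fields and the 120-second stale threshold are hypothetical; os.kill(pid, 0) is a POSIX existence probe:

```python
import json
import os
import time
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # POSIX only: probes the PID without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def register(registry_dir: Path, workspace: str) -> Path:
    """Write (or refresh) <pid>.json; call from a background thread as a heartbeat."""
    registry_dir.mkdir(parents=True, exist_ok=True)
    path = registry_dir / f"{os.getpid()}.json"
    entry = {"pid": os.getpid(), "workspace": workspace, "heartbeat": time.time()}
    path.write_text(json.dumps(entry))
    return path


def classify(entry: dict, stale_after: float = 120.0, now: Optional[float] = None) -> str:
    """'dead' if the PID is gone, 'stale' if the heartbeat stopped, else 'ok'."""
    now = time.time() if now is None else now
    if entry["pid"] <= 0 or not pid_alive(entry["pid"]):
        return "dead"
    if now - entry["heartbeat"] > stale_after:
        return "stale"
    return "ok"
```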
CodeGraph reports database is locked from MCP, but works fine in the terminal
This is the most common gotcha on macOS Cursor installs. CodeGraph's native
SQLite driver (better-sqlite3) is compiled against one specific Node ABI, and you
have two Node runtimes that touch the same global CodeGraph install:
| Runtime | Typical Node | NODE_MODULE_VERSION |
|---|---|---|
| Your shell (/opt/homebrew/bin/node) | v23.x | 131 |
| Cursor's bundled Node (Cursor.app/.../helpers/node) | v22.22.0 | 127 |
If you ran npm rebuild better-sqlite3 in your shell, it built for the
shell's Node, which means Puppetmaster's MCP (running under Cursor's
Node) silently falls back to the slow WASM driver and you'll see
database is locked / unable to open database file. puppetmaster doctor
will flag this as native better-sqlite3 broken; codegraph is on slow WASM fallback.
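To check the mismatch yourself, ask each Node binary for its ABI via process.versions.modules. A sketch; node_abi and abi_mismatch are hypothetical helpers, and you pass your own shell and Cursor Node paths (the Cursor path is install-specific):

```python
import subprocess


def node_abi(node_binary: str) -> int:
    """NODE_MODULE_VERSION of a Node binary (e.g. 127 for Node 22, 131 for Node 23)."""
    out = subprocess.run(
        [node_binary, "-p", "process.versions.modules"],
        capture_output=True,
        text=True,
        check=True,
        stdin=subprocess.DEVNULL,
    ).stdout
    return int(out.strip())


def abi_mismatch(built_for: int, runtime: int) -> bool:
    """A native addon built for one NODE_MODULE_VERSION will not load under another."""
    return built_for != runtime
```

If node_abi returns different values for the two binaries, a better-sqlite3 build can only match one of them at a time.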
One-command fix:
python -m puppetmaster repair-codegraph
It auto-detects Cursor's bundled Node, locates the global CodeGraph install,
runs npm rebuild better-sqlite3 with Cursor's Node on PATH, and verifies
the backend reports as native. Then restart the Puppetmaster MCP server in
Cursor (Settings → MCP → toggle off/on).
You can also call it from inside the agent itself via the
puppetmaster_repair_codegraph MCP tool — useful if an agent hits the WASM
fallback mid-session and can self-heal.
Tradeoff: better-sqlite3 is ABI-specific. Rebuilding for Cursor's
Node 22 may break native SQLite in your terminal (Node 23) until you
rebuild again with the shell's Node. For day-to-day Cursor use, optimize
for Cursor's Node. If you upgrade Cursor and the bundled Node ABI changes,
re-run puppetmaster repair-codegraph.
v0.5.4 makes this self-correcting at runtime. Puppetmaster now
invokes codegraph by explicitly running its codegraph.js entrypoint
under Cursor's bundled Node whenever both are discoverable (via the
new resolve_codegraph_invocation() helper), regardless of which Node
sits first on $PATH. That eliminates the failure mode where a stray
shell shim under Homebrew Node spins up an indexer in WASM mode and
locks the DB for hours. The corresponding puppetmaster doctor
codegraph check now also verifies against the runtime Puppetmaster
actually uses — not whichever shim happens to be on PATH — so you get
ok (verified under Cursor's bundled Node) instead of a misleading
warn when MCP is healthy.
Escape hatches for weird installs:
- PUPPETMASTER_CODEGRAPH_NODE: full path to the Node binary to use.
- PUPPETMASTER_CODEGRAPH_JS: full path to codegraph.js.
Both must be set together; auto-detection runs otherwise.
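The precedence described above, as a sketch: an explicit pair wins only when both variables are set; otherwise auto-detection runs. codegraph_command and the final fallback to a bare codegraph on PATH are assumptions for illustration; Puppetmaster's own helper is resolve_codegraph_invocation():

```python
from typing import Callable, List, Mapping, Optional


def codegraph_command(
    env: Mapping[str, str],
    autodetect: Callable[[], Optional[List[str]]],
) -> List[str]:
    """Prefer the explicit Node + codegraph.js pair; fall back to auto-detection."""
    node = env.get("PUPPETMASTER_CODEGRAPH_NODE")
    js = env.get("PUPPETMASTER_CODEGRAPH_JS")
    if node and js:  # both set: use them verbatim
        return [node, js]
    found = autodetect()  # one or neither set: auto-detect as usual
    return found if found else ["codegraph"]  # ASSUMED last resort: whatever is on PATH
```

In practice you would call it as codegraph_command(os.environ, your_detector) and append the subcommand, e.g. ["status"].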
puppetmaster adapters says cursor: configured=false, but my swarms work
You're probably running it from a workspace where you don't have @cursor/sdk
installed locally. The Puppetmaster MCP loads the SDK from the package
install dir's node_modules, not from your cwd — so the swarm worked fine
while diagnostics lied. v0.5.4 fixes the detection: _cursor_sdk_installed
now checks both the workspace and the package install dir, and reports the
location it found:
ok cursor-sdk @cursor/sdk installed (/Users/.../Puppetmaster/node_modules/@cursor/sdk)
PUPPETMASTER_HOME is an explicit escape hatch if your install lives somewhere unusual.
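A sketch of the two-location check, assuming the presence of node_modules/@cursor/sdk/package.json is the signal; find_cursor_sdk is hypothetical and stands in for _cursor_sdk_installed:

```python
from pathlib import Path
from typing import Iterable, Optional


def find_cursor_sdk(search_roots: Iterable[Path]) -> Optional[Path]:
    """Return the first node_modules/@cursor/sdk found under the given roots, in order."""
    for root in search_roots:
        candidate = root / "node_modules" / "@cursor" / "sdk"
        if (candidate / "package.json").is_file():
            return candidate  # report where it was found, not just True/False
    return None
```

Passing the workspace first and the package install dir second mirrors the order described above; returning the path is what lets doctor print the location it found.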
puppetmaster show <job_id> fails from any cwd other than the workspace that ran the job
Pre-v0.5.4, each workspace had its own per-project SQLite state dir hashed
from the resolved git root. If you ran a swarm in /Users/you/ff-ios and
later tried puppetmaster show job_X from /tmp (or any other repo),
it would fail with job not found even though the job was alive — and the
workaround was exporting PUPPETMASTER_STATE_DIR to the right hashed
path, which you'd have to look up.
v0.5.4 auto-pivots. Read-only commands (show, artifacts, diff,
feed, logs, events, status, memory, open) scan every project
state dir on the machine and use the one that owns the job, with a
single note: on stderr telling you which it picked:
$ cd /tmp && python -m puppetmaster show job_4fc8c7148d65
note: job job_4fc8c7148d65 not in current workspace state dir; using /Users/.../projects/Puppetmaster-7b41939e66e6
# Puppetmaster Stitched Summary
...
Two new commands round it out:
- python -m puppetmaster projects: lists every project state dir on this machine with job counts and last activity.
- python -m puppetmaster jobs --all-projects: flattens jobs from every project into one stream with a project column.
Write-side commands (run, cursor, claude, daemon, ...) intentionally do
not pivot. Those always use the caller's workspace state. Explicit
--state-dir or $PUPPETMASTER_STATE_DIR overrides also disable the
pivot.
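A sketch of the read-side pivot: check the current state dir first, then the others, and announce the switch on stderr. The jobs(id) table is an assumed schema, and databases are opened read-only since only read commands pivot:

```python
import sqlite3
import sys
from pathlib import Path
from typing import Iterable, Optional


def job_in(state_dir: Path, job_id: str) -> bool:
    db = state_dir / "state.sqlite3"
    if not db.is_file():
        return False
    # as_uri() percent-encodes spaces (e.g. "Application Support"); mode=ro never writes.
    conn = sqlite3.connect(db.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        return conn.execute("SELECT 1 FROM jobs WHERE id = ?", (job_id,)).fetchone() is not None
    except sqlite3.OperationalError:
        return False  # no jobs table in this state dir
    finally:
        conn.close()


def pivot_state_dir(current: Path, candidates: Iterable[Path], job_id: str) -> Optional[Path]:
    """Return the state dir that owns job_id, preferring the caller's own."""
    if job_in(current, job_id):
        return current
    for d in candidates:
        if d != current and job_in(d, job_id):
            print(f"note: job {job_id} not in current workspace state dir; using {d}", file=sys.stderr)
            return d
    return None
```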
Documentation
- Architecture
- Adapters
- Cursor Agent MCP
- Cursor Extension
- Daily Driver
- Production Notes
- Security
- Contributing
- Release Checklist
- Changelog
- Roadmap
Status
Puppetmaster is daily-driver beta software. The runtime contract is real, tests are automated, SQLite is the default backend, jobs fail closed, Cursor Agent MCP is live, the Cursor extension is installable, and Claude Code has been validated as a full-edit adapter that emits patch artifacts.
It is credible for supervised local engineering workflows. It is not yet a hosted multi-user production service.
License
MIT
File details
Details for the file puppetmaster_ai-0.7.2.tar.gz.
File metadata
- Download URL: puppetmaster_ai-0.7.2.tar.gz
- Size: 203.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b863ad5982826493c1ed6411f59dbf2e9e8f5e6397afdacbdf0194bc4775c2d |
| MD5 | 467564f19ed82559c385eda9b15d7ef4 |
| BLAKE2b-256 | 2a0ff5c068b6d084392f2244748d40a9fb6284b1570041a1e86f93ca64aaa665 |
File details
Details for the file puppetmaster_ai-0.7.2-py3-none-any.whl.
File metadata
- Download URL: puppetmaster_ai-0.7.2-py3-none-any.whl
- Size: 135.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a59bff497c7f8b0f42ddcefd4b233ad607562deddc2f2ce2357a2c73dc7ebd2 |
| MD5 | dd184f8b84311aac27c767bff37aa8b0 |
| BLAKE2b-256 | 57b56fa79caf717534d74547bc2a4f1ac74169f2a81429caf21592e972936546 |