Skip to main content

Gunicorn-style agent swarms with Redis-like shared memory and artifact stitching. (Imported as `puppetmaster`; published as `puppetmaster-ai` because the bare PyPI name is held by an abandoned 2019 project, name-reassignment pending.)

Project description

Puppetmaster

PyPI CI License: MIT Python

Puppetmaster turns Cursor (or Claude Code, or the OpenAI API, or the OpenAI Codex CLI) into an orchestrator that routes each task to the cheapest model that can handle it, stores worker outputs as typed SQLite artifacts so follow-ups cost zero tokens, and coordinates workers through durable state instead of a shared parent transcript.

Live OpenAI A/B with real billing tokens — same prompt, equivalent answer, one of 3 back-to-back runs: Pinned gpt-5.5: $0.006900 in 5.480 s — Puppetmaster routed to gpt-5.4-nano: $0.000132 in 1.511 s98.1% cheaper, 72.4% faster. Reproduce: OPENAI_API_KEY=... python -m bench.router_live_ab.

Install

pip install puppetmaster-ai
puppetmaster setup        # one-shot: doctor + models init + install-cursor-mcp + install-codex-mcp + install-rules

That's the whole install. setup runs every step idempotently, skips any tool that isn't present, and prints what it did at the end. Restart Cursor (or open a fresh Codex / Claude session) and the agent will see 32+ puppetmaster_* tools — plus an agent rule nudging it to reach for them on multi-file work.

Pip name note: PyPI lists this as puppetmaster-ai because PEP-503 name normalization treats puppetmaster and puppet-master as the same name, and puppet-master is held by an abandoned 2019 project. The import name, CLI binary, GitHub repo, and brand all stay puppetmaster. Name-reassignment request in flight (tracking doc).

To run benchmarks or contribute, clone the repo instead:

git clone https://github.com/professorpalmer/Puppetmaster.git
cd Puppetmaster && python -m pip install -e . && npm install --package-lock=false --no-audit
OPENAI_API_KEY=... python -m bench.router_live_ab   # ~$0.01 of real spend, prints the ~98%-cheaper receipt

bench/ and the Cursor extension source ship only with the cloned repo, not the pip wheel.

What it does

Think Redis/Gunicorn for agentic engineering:

Cursor Agent / Claude Code / OpenAI / Codex CLI / shell
        |
        v
Puppetmaster supervisor  ──>  task-aware model router (12 starter tiers)
        |
        v
independent worker processes  ──>  SQLite (typed artifacts, events, memory)
        |
        v
live artifact board  ──>  stitched summary  ──>  0-token follow-up reads

Puppetmaster is not trying to beat native IDE subagents at every tiny task. It is for the work that gets messy: long repo investigations, conflicting hypotheses, repeated handoffs, flaky memory, and code changes that need evidence, replay, and approval gates. The design rationale and the failure modes it fixes are in docs/WHY.md.

Three claims, three receipts

Every number in this section comes from a reproducible script in bench/. What is not defensible (and what we won't claim) lives in TALKING_POINTS.md.

1. Token cost — fixed on two axes

On new work — the v0.6.0+ router classifies each task's complexity (role + instruction signal patterns + payload size) and picks the cheapest model from your user-owned registry that can handle it. Every routing decision is an auditable ROUTING artifact: picked model, capability needed, estimated cost, and the full list of rejected alternatives with the reason each was rejected.

  • bench/router_savings.py — on a 6-task fixture, 35.1% cheaper than pinning a frontier model. The wins come from not using a frontier model when the task doesn't need one.
  • bench/router_live_ab.py — live OpenAI A/B with real usage.prompt_tokens: 98.1–98.7% cheaper across 3 consecutive runs on a single explore task; wall-time savings between 68% and 88% per run.

On follow-up work — once a swarm completes, every artifact lives in SQLite. Follow-up questions are SQLite queries, not new agent runs.

  • bench/followup_cost.py40 follow-up queries against a real completed swarm: 0 adapter calls, 0 tokens, $0.00, avg 0.5 ms per query. Hypothetical "always-frontier replay" baseline for the same 40 queries: $1.64 at Anthropic's current Opus 4.7 rate ($5/$25 per MTok).

Honest scope: this is the follow-up reads are free claim. If your follow-up needs new reasoning the swarm didn't produce, that's a new task and it costs tokens like any other.

2. Transcript — workers don't share one

The classic multi-subagent shape stuffs everything into one parent chat. Each subagent inherits stale context, results come back as prose, and the context window bloats until the important details are buried.

Puppetmaster does the opposite. Workers don't see each other's transcripts. They claim tasks by lease, emit typed artifacts with payloads + evidence + confidence + sha256 integrity, and the final stitcher reads JSON — not raw worker stdout. The parent agent's context only sees what the stitcher publishes.

  • Inspect a live swarm: puppetmaster artifacts <job_id> — the actual coordination surface, not a chat scrollback.
  • Inspect a completed swarm without paying tokens: same command, milliseconds, $0 (receipt #1 above).
  • Verify nothing is hand-waved: every artifact carries created_by, created_at, and a content sha256.

3. Graphing — credit CodeGraph, wire it in cleanly

The "graph your directories" capability is not a Puppetmaster feature. It's CodeGraph — a separate project — and it deserves the credit. Puppetmaster's contribution is what happens after CodeGraph is installed: every worker auto-injects task-relevant CodeGraph context into its prompt before the model call, one shared codegraph context query seeds N parallel workers, and the resulting artifacts land in the same durable store. Puppetmaster works fine without CodeGraph; workers fall back to grep/read. Details in docs/CODEGRAPH.md.

Quickstart

After pip install puppetmaster-ai && puppetmaster setup, try one of these inside Cursor Agent or Codex:

Use Puppetmaster to run doctor in this repo and summarize what is missing.
Use Puppetmaster to start a cursor swarm for this repo and return the job id immediately.
Problem: users get logged out after refresh and token-refresh tests are flaky.
Constraints: keep the patch focused, preserve public API behavior, run relevant tests.
Do review/plan first. Poll status/logs by job id. Do not edit until you summarize findings and ask for approval.

Or from the shell:

puppetmaster doctor
puppetmaster route "Security audit every endpoint" --role audit   # dry-run routing decision
puppetmaster cursor "Review this repo for release blockers" --review --dry-run
puppetmaster claude "Implement the approved change and run focused tests" --permission-mode acceptEdits
puppetmaster show $(puppetmaster last)

More daily-driver prompts in docs/DAILY_DRIVER.md.

Adapters

Four production adapters live; eleven tiers in the starter registry (5 Cursor/Claude + 4 OpenAI + 2 Codex). Tier and pricing details in docs/MODEL_ROUTING.md; adapter wiring details in docs/ADAPTERS.md.

Adapter What it's for Telemetry Setup
cursor Review / plan / dry-run via @cursor/sdk tokens reported by SDK CURSOR_API_KEY
claude-code Full-edit workflows via the claude CLI usage from CLI npm i -g @anthropic-ai/claude-code + ANTHROPIC_API_KEY
openai Direct Chat Completions (the most pricing-transparent path) real usage.prompt_tokens/completion_tokens OPENAI_API_KEY
codex Full-edit via the OpenAI Codex CLI agent loop input_tokens + output_tokens + cached_input_tokens + reasoning_output_tokens per turn npm i -g @openai/codex + codex login
shell Bounded verification commands n/a none

What works today

Area Status
Local runtime Daily-driver beta: subprocess workers, task DAGs, leases, recovery, failure states
SQLite backend Default, WAL mode, schema metadata, integrity checks, persisted events
Model router (v0.6.0+) Task-aware routing; auditable ROUTING artifacts. Receipts: bench/
One-line MCP installers (v0.7.2+) install-cursor-mcp, install-codex-mcp — resolve sys.executable, handshake before write, idempotent
One-line rule installer (v0.7.3+) install-rules — Cursor .mdc + cross-tool AGENTS.md + global Codex/Claude rules, merge-don't-overwrite
puppetmaster setup (v0.7.3+) One-shot wizard chaining doctor → models init → MCP installers → rules
Cursor Agent MCP Async start tools, status polling, logs, live artifacts, partial summaries, routing tools
Cursor extension Activity-bar control panel (docs)
Memory Promoted memory retrieval into later worker context and prompts
CodeGraph Optional shared repo intelligence (docs)
Patch workflow Patch artifacts, path locks, approval/rejection events, dirty-worktree guard
Reproducible benchmarks Six harnesses in bench/, each with markdown + JSON receipts under bench/results/

Documentation

Doc What's in it
docs/WHY.md Design rationale: what shared-transcript subagents get wrong, what durable state fixes
docs/ARCHITECTURE.md Job / Task / Worker / Artifact / Stitcher / Memory object model
docs/MODEL_ROUTING.md Router policies, classifier, registry schema, the 12 starter tiers
docs/CODEGRAPH.md CodeGraph integration, bundled MCP tools, cost comparison
docs/ADAPTERS.md All four production adapters + shell + how to add a new one
docs/CLI_REFERENCE.md Every CLI subcommand, workflow config schema, daemon mode
docs/DAILY_DRIVER.md Prompt recipes for review, swarm, implement, post-job inspection
docs/CURSOR_AGENT_MCP.md The MCP tool surface (32 tools) in detail
docs/CURSOR_EXTENSION.md Activity-bar control panel install + features
docs/TROUBLESHOOTING.md Tool execution error. Not connected, CodeGraph SQLite ABI, state-dir auto-pivot, safety model
docs/PRODUCTION.md Operating notes for non-toy use
docs/SECURITY.md Secret handling + reporting
docs/CONTRIBUTING.md How to land a patch
docs/CHANGELOG.md Versioned changes
docs/ROADMAP.md What's next
docs/PYPI_NAME_REQUEST.md The bare-puppetmaster name reassignment effort
TALKING_POINTS.md Truth-table separating "use this phrasing" from "avoid that overclaim"

Status

Puppetmaster is daily-driver beta software. The runtime contract is real, tests are automated, SQLite is the default backend, jobs fail closed, Cursor Agent MCP is live, the Cursor extension is installable, and Claude Code + Codex have both been validated as full-edit adapters that emit patch artifacts. It is credible for supervised local engineering workflows. It is not yet a hosted multi-user production service.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

puppetmaster_ai-0.8.0.tar.gz (179.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

puppetmaster_ai-0.8.0-py3-none-any.whl (144.8 kB view details)

Uploaded Python 3

File details

Details for the file puppetmaster_ai-0.8.0.tar.gz.

File metadata

  • Download URL: puppetmaster_ai-0.8.0.tar.gz
  • Upload date:
  • Size: 179.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.7

File hashes

Hashes for puppetmaster_ai-0.8.0.tar.gz
Algorithm Hash digest
SHA256 63c62839477b2b4b54c784127996b07b09b032d20c7886aaa868ee098b2b859c
MD5 007cdc4ba2fd7e8c58006ef0ee035aa5
BLAKE2b-256 927be7fbd1768a5e4283c6434ed04c7800236c7416141b687660f69a3665168d

See more details on using hashes here.

File details

Details for the file puppetmaster_ai-0.8.0-py3-none-any.whl.

File metadata

File hashes

Hashes for puppetmaster_ai-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fe82b4be9bd9daf16742810b973b4a87e25de150a4083883a6f661df0db55022
MD5 9e04f992dd53f0e233dd4e09605335e7
BLAKE2b-256 401c23608d82af65c4cf8d3208b328cd2b40910ff1706d6b9ac4037233e45597

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page