
marcus-mini

Board-mediated multi-agent coordination in ~500 lines of Python.

Multiple AI agents work in parallel on a shared task board.
They never talk to each other — they coordinate exclusively through SQLite.


Quickstart · Why a board? · Commands · How it works


What it looks like


$ mini build "a snake game in Python"

  Decomposing goal  8 tasks
  Spawning 3 agents (DAG recommends 3) in tmux session 'marcus-snake-game-1146'
    Spawned agent-1 (pane 0)
    Spawned agent-2 (pane 1)
    Spawned agent-3 (pane 2)

   3 agents running. mini watch to follow along.

mini dag shows the decomposed work as it's running:

  ╭────────────────────╮
  │ project structure  │
  ╰────────────────────╯
            │
  ┌─────────┴─────────────┐
  ▼                       ▼
╭────────────────────╮  ╭────────────────────╮
│  game state model  │  │   render engine    │
╰────────────────────╯  ╰────────────────────╯
            │                       │
            └────────┬──────────────┘
                     ▼
          ╭────────────────────╮
          │   collision logic  │
          ╰────────────────────╯
                     │
                     ▼
          ╭────────────────────╮
          │     game loop      │
          ╰────────────────────╯

mini bench measures coordination overhead after the run completes:

Bench — snake-game-1146
─────────────────────────────────────
  Wall time         8m 34s
  Agent work        18m 12s
  Utilization       70.8%
  Coordination tax  29.2%
  Tasks             8 (8 done)

The idea

Most multi-agent frameworks let agents talk to each other directly. marcus-mini does the opposite: agents are blind to each other and coordinate only through a shared board.

Three invariants hold in every run:

  1. Agents self-select work. The board assigns nothing — agents pull the next available task whose dependencies are all complete.
  2. Agents make all implementation decisions. The board says what to build, never how.
  3. Agents communicate only through the board. Artifacts and decisions logged to a task become context for downstream agents. No direct messages, no shared memory outside the board.

This mirrors how distributed teams actually work: a shared backlog, async handoffs, no mandatory standups.


Why a board?

Board mediation solves a specific coordination problem: how do you get multiple agents to work in parallel without them stepping on each other or needing to constantly sync?

The alternatives all have problems. If agents talk to each other directly, you get an N² communication explosion — every agent has to know about every other agent, and conversations multiply as you scale. If they share memory, you get race conditions and need locks. If a central orchestrator hands out work, that orchestrator becomes the bottleneck and single point of failure, plus it has to be smart enough to know what each agent is capable of.

Direct agent-to-agent              Board-mediated
(N² edges)                         (N edges)

   A1 ─────── A2                       A1
   │ ╲       ╱ │                       │
   │   ╲   ╱   │                       ▼
   │     ╳     │                  ┌─────────┐
   │   ╱   ╲   │            A2 ──▶│  board  │◀── A4
   │ ╱       ╲ │                  └─────────┘
   A4 ─────── A3                       ▲
                                       │
                                       A3

  N=4  →   6 edges                N=4  →  4 edges
  N=10 →  45 edges                N=10 → 10 edges
  N=30 → 435 edges                N=30 → 30 edges
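
The edge counts above come straight from the handshake formula: full peer-to-peer needs one channel per agent pair, N(N-1)/2, while the board needs one channel per agent. A quick sanity check (helper names are ours, not mini's):

```python
def direct_edges(n: int) -> int:
    # Peer-to-peer: one channel per agent pair (n choose 2)
    return n * (n - 1) // 2

def board_edges(n: int) -> int:
    # Board-mediated: one channel per agent, all pointing at the board
    return n

for n in (4, 10, 30):
    print(f"N={n}: {direct_edges(n)} vs {board_edges(n)} edges")
```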

A board flips the model: the environment holds the state, and agents are stateless workers that pull from it. This gives you a few things for free:

  • Atomic work claiming. A SQL transaction (BEGIN EXCLUSIVE + dependency check) guarantees two agents can't grab the same task. No lock manager, no consensus protocol.
  • Async handoffs. Agent A finishes a task and logs an artifact. Agent B, hours later, picks up a dependent task and reads that artifact as context. Neither needs to be online at the same time.
  • Agent blindness as a feature. Because agents don't know about each other, you can add, remove, or crash them freely. The board doesn't care. This is why mini can spawn 3 or 30 agents with the same code.
  • Auditability. Every decision and artifact is persisted. You can replay what happened, measure coordination tax, debug why an agent went off the rails.

It's the same pattern distributed teams already use — a shared backlog instead of everyone DM'ing each other. The constraints are similar: unreliable participants, variable work duration, no guarantee anyone's online right now.

Real-world performance

We benchmarked marcus-mini head-to-head against AutoGen and LangGraph using each framework's native coordination pattern (AutoGen SelectorGroupChat for agent-to-agent chat, LangGraph supervisor-worker for central LLM orchestration). Same task DAG, same model (claude-sonnet-4-6), same agent count.

Topology benchmark

  Size  Tasks  mini      AutoGen GroupChat    LangGraph supervisor
  1     9      4.4 min   12.5 min (4/9 done)  16.9 min
  2     17     9.3 min   53.9 min             37.9 min
  3     27     14.7 min  105.6 min            98.9 min

The gap grows with project size: ~3× at 9 tasks, ~7× at 27 tasks. AutoGen at size 1 only completed 4 of 9 tasks before its agents tangled in their own conversation. CrewAI is pending (we hit a validation error on multi-level dependency graphs and are seeking the idiomatic pattern from their team).

Full methodology, runners, and reproduction instructions: experiments/topologies/.


Quickstart

Requirements: Python 3.11+, Claude Code CLI, tmux

pip install marcus-mini
export MINI_API_KEY=sk-ant-...   # your Anthropic key — for decomposition only
mini build "a snake game in Python"

Why MINI_API_KEY and not ANTHROPIC_API_KEY? Claude Code agents inherit your shell environment. If ANTHROPIC_API_KEY is set, agents use it for every API call — even if you have a Claude subscription — and you get charged. MINI_API_KEY is only read by the decomposer (one call per mini build). Agents run under your subscription key, not this one.

mini build will:

  1. Call Claude to decompose the goal into a parallel task DAG
  2. Persist all tasks to a local SQLite board
  3. Spawn N agents in a tmux session (N = DAG width)
  4. Each agent loops: claim task → implement → log artifacts → mark done

Watch the board live:

mini watch          # live-refreshing kanban
mini status         # per-agent activity
mini bench          # coordination metrics after completion

Install

pip install marcus-mini

Or from source:

git clone https://github.com/lwgray/marcus-mini
cd marcus-mini
pip install -e .

Commands

Build & monitor

  Command            Description
  mini build "goal"  Decompose goal, spawn agents
  mini watch         Live kanban board (auto-exits on completion)
  mini board         Static kanban snapshot
  mini status        Per-agent activity with staleness warnings
  mini dag           ASCII dependency graph
  mini tasks         Task list with dependency info
  mini progress      Completion percentage
  mini time          Project elapsed time
  mini logs          Tail agent log files
  mini wait          Block until all tasks reach DONE/FAILED (scripting-friendly)

Measure

  Command     Description
  mini bench  Wall time, utilization, coordination tax

Manage projects

  Command          Description
  mini projects    Running projects (--all to see completed too)
  mini load        Total live agents across all running projects
  mini open        Print output directory (cd $(mini open))
  mini stop        Kill the current project's agents (DB + tmux)
  mini stop --all  Stop every running project (type STOP to confirm)
  mini purge       Wipe every project from the DB (type DELETE to confirm)
  mini config      View/set API key env var

Common flags

mini build "goal" --agents 4          # override agent count
mini build "goal" --output-dir ~/myproject
mini watch --interval 5               # refresh every 5s
mini bench --project my-project-1200
mini wait --interval 30 --timeout 5400  # 90-min ceiling

How it works

mini build "snake game"
        │
        ▼
  ┌─────────────┐
  │  decomposer │  Claude call → flat task list + dependency DAG
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐
  │    board    │  SQLite (WAL mode) — the shared environment
  └──────┬──────┘
         │  MCP server exposes 5 tools to each agent
         │
    ┌────┴────┐
    │         │
  agent-1   agent-2   ...   agent-N     (tmux panes)
    │         │
    └─────────┘
    read/write the same board, never each other

Board MCP tools (what agents see)

  Tool               Purpose
  request_next_task  Claim next task whose deps are all DONE
  log_artifact       Store output (API spec, schema, file path)
  log_decision       Record an architectural choice
  get_task_context   Read artifacts from dependency tasks
  report_done        Mark task complete, unblock dependents
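
The agent side of these tools is a short pull loop. A sketch under assumptions: `board` stands in for the MCP client and `implement` for the agent's own LLM-driven work; the method names mirror the five tools, but the signatures are ours, not mini's actual API.

```python
def agent_loop(board, implement):
    """One agent's life: claim, implement, log, report — forever.

    `board` is a stand-in for the MCP client exposing the five board
    tools; `implement` is the agent's own work function. Both are
    illustrative, not mini's actual API.
    """
    while True:
        task = board.request_next_task()               # atomic claim, or None
        if task is None:
            break                                      # nothing claimable right now
        context = board.get_task_context(task["id"])   # artifacts from dependency tasks
        result = implement(task, context)              # all implementation decisions live here
        board.log_artifact(task["id"], result["artifact"])   # handoff for downstream agents
        board.log_decision(task["id"], result["decision"])
        board.report_done(task["id"])                  # unblocks dependents
```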

Task assignment (atomic SQL)

A task is claimable when status = 'TODO' and every dependency has status = 'DONE'. The claim runs inside BEGIN EXCLUSIVE with a json_each() dependency check — no race conditions, no external lock manager.
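
A minimal sketch of that transaction in Python's sqlite3, assuming a simplified schema — `tasks(id, name, status, deps, owner)` with `deps` a JSON array of task ids — rather than mini's actual one. The connection must be opened with `isolation_level=None` so the explicit `BEGIN EXCLUSIVE` takes effect:

```python
import sqlite3

def claim_next_task(conn: sqlite3.Connection, agent: str):
    """Atomically claim one claimable task; return its row or None.

    Assumed schema (a simplification of mini's actual board):
    tasks(id, name, status, deps, owner), deps = JSON array of task ids.
    """
    conn.execute("BEGIN EXCLUSIVE")  # serializes competing claimers
    row = conn.execute(
        """
        SELECT t.id, t.name FROM tasks t
        WHERE t.status = 'TODO'
          AND NOT EXISTS (              -- no dependency that is not DONE
            SELECT 1 FROM json_each(t.deps) d
            JOIN tasks dep ON dep.id = d.value
            WHERE dep.status != 'DONE')
        ORDER BY t.id LIMIT 1
        """
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE tasks SET status = 'DOING', owner = ? WHERE id = ?",
            (agent, row[0]),
        )
    conn.execute("COMMIT")
    return row
```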

Coordination tax

mini bench measures the gap between theoretical and actual parallelism:

agent utilization  =  total task work  /  (n_agents × wall_time)
coordination tax   =  1 − utilization

A tax of 0% means every agent was always working. Typical software projects land at 30–60% because of the critical path.
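
Plugging the demo run's bench numbers into these formulas (18m 12s of agent work, 3 agents, 8m 34s wall) recovers the printed figures; the helper below is ours, not mini's:

```python
def bench_metrics(agent_work_s: float, n_agents: int, wall_s: float):
    """Utilization and coordination tax, as fractions of 1."""
    utilization = agent_work_s / (n_agents * wall_s)
    return utilization, 1 - utilization

# Demo run: agent work 18m 12s = 1092 s, wall 8m 34s = 514 s, 3 agents
util, tax = bench_metrics(agent_work_s=18 * 60 + 12, n_agents=3, wall_s=8 * 60 + 34)
print(f"utilization {util:.1%}  coordination tax {tax:.1%}")  # utilization 70.8%  coordination tax 29.2%
```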


Project structure

marcus-mini/
├── marcus_mini/
│   ├── board.py          # SQLite board + async API
│   ├── board_server.py   # MCP server (5 tools)
│   ├── cli.py            # mini CLI (Click)
│   ├── decomposer.py     # Claude → task DAG
│   ├── models.py         # Task dataclass
│   ├── monitor.py        # tmux monitor pane
│   └── spawn.py          # tmux agent spawner
├── prompts/
│   └── agent_prompt.md   # agent loop instructions
├── tests/
└── pyproject.toml

Design discipline: the Mini Red Line

marcus-mini is intentionally small. Every proposed feature has to pass one test:

"Does coordination break without this?" Yes → allowed. No → it doesn't belong in mini.

What's ruled out, even if it would be useful:

  • Observability beyond mini status (no dashboards, metrics pipelines)
  • Resilience infrastructure (no retry, circuit breakers, fallbacks)
  • Rich configuration (one flat JSON file, period)
  • External integrations (no Slack, GitHub, webhooks)
  • Agent capability management (no skills, tools, specializations)
  • Scheduling (no cron, recurring tasks)

What stays, even if it adds lines of code:

  • Correctness fixes (stall detection, accurate liveness checks)
  • Coordination primitives (task claiming, dependency resolution, spawning)
  • Measurement (mini bench, timing, coordination tax)

If a feature would be at home in Marcus, it doesn't belong in mini.


The evolution: Marcus

marcus-mini proves the concept on one stack: SQLite, Claude, tmux. Three moving parts, ~500 lines of coordination, intentionally bounded so the primitives stay readable.

Marcus takes the same board-mediated primitives and scales them into a production platform: contract-first decomposition (agents agree on APIs/schemas before any code is written), a full observability pipeline, provider abstraction across LLMs and agent runtimes, resilience infrastructure with lease-based recovery, and multi-team isolation. If you find yourself fighting mini's intentional limits, that's the signal to graduate.

Full capability comparison
  • Contract-first decomposition — agents agree on APIs/schemas before any code is written, so coordination works in domains without code (legal, scientific, design). mini: flat task DAG only. Marcus: full contract synthesis pre-fork.
  • Observability — structured audit logs, event pub/sub, per-tool duration tracking, agent lifecycle events. mini: mini status + mini bench. Marcus: full telemetry pipeline.
  • Resilience — agent retry, circuit breakers, error taxonomy, automatic stall recovery, lease-based work claiming. mini: none (fails loud). Marcus: error framework + lease/recovery layer.
  • Provider abstraction — board protocol independent of which LLM or which agent runtime. mini: Claude + Claude Code only. Marcus: multi-provider board protocol.
  • Domain extensibility — coordinate non-software work via contract templates. mini: software builds only. Marcus: research, ops, content, analysis.
  • Multi-team / multi-project — concurrent boards, RBAC, shared artifact registry. mini: single user, single board. Marcus: team and tenant isolation.
  • Configuration — environments, profiles, per-agent tuning. mini: one flat JSON. Marcus: hierarchical config + per-environment overrides.
  • Observability dashboard — real-time network graph, swim lanes, conversation logs, timeline playback, artifact preview (Cato). mini: none. Marcus: live + historical run analysis.
  • Experiment runner — automated multi-agent scaling pipelines, MLflow metric tracking, live WebSocket terminals (Posidonius). mini: none. Marcus: controlled scaling experiments.
  • Project grading — nine-dimension quality audit, runtime smoke tests, agent authorship cohesion, persisted scores (Epictetus Claude skill). mini: none. Marcus: structured quality reports.
  • Scheduling — recurring tasks, cron, triggered runs. mini: one build at a time. Marcus: full scheduler.

Known limitations

  • Claude-only. The decomposer uses the Anthropic SDK directly, and agents are Claude Code processes. There is no provider abstraction. This is a deliberate scope choice for a research instrument.
  • Single user, single key. No multi-tenancy, no RBAC, no team workflows.
  • Local only. SQLite + tmux on one machine. No remote agents, no cluster.
  • Best-effort failure handling. If an agent dies or the API quota runs out, mini fails loud — there is no auto-recovery. By design.

Contributing

See CONTRIBUTING.md. Bug reports and PRs welcome — but new features must pass the Red Line test above.


License

MIT
