Skip to main content

MCP server for orchestrating parallel Claude agent workloads via Docker containers

Project description

swarm-mcp

Orchestrate parallel Claude agent workloads via Docker containers.

PyPI Python 3.12+ License: MIT Docker Required Documentation

swarm-mcp is an MCP server that lets your Claude session spawn other Claude agents — each in an isolated Docker container — and compose their results using functional combinators. Instead of one agent doing everything, you describe the work as run, par, map, chain, reduce, and pipeline calls, and swarm-mcp handles container lifecycle, resource scheduling, and result plumbing. The agent outputs are lazy refs (metadata on the wire, text on disk) so you never blow up the MCP protocol with megabytes of agent output.


Why swarm-mcp?

  • True Docker isolation per agent. Every agent gets its own container with its own filesystem, network policy, memory limit, and CPU quota. No shared state leaks between agents. A rogue rm -rf / in one container doesn't touch anything else.

  • Monadic ref architecture (lazy by default). Combinators return refs — small JSON objects with metadata (cost, duration, exit code, provenance hash). The actual text stays on disk in /tmp/swarm-mcp/{run_id}/{agent_id}/result.json. Call unwrap when you actually need the content. This keeps the MCP protocol fast and prevents context window bloat.

  • Free monad pipelines (computation as data). The pipeline tool interprets a JSON definition — steps with on_fail handlers, condition guards, retry_if loops, and next jumps. The pipeline is data you can store, version, resume, and share. The interpreter handles control flow, budget tracking, and inter-step communication via a shared /shared/ directory.

  • GPU and resource semaphores. Named resource pools (gpu, database, api, anything) are semaphores with configurable capacity. Agents queue for resources before execution. A single GPU doesn't get double-booked; a rate-limited API doesn't get hammered.

  • Natural language type contracts with validation. Types are markdown files that describe what an agent should produce. The validate tool spawns a validator agent that checks the output against the type definition. filter keeps only results that pass. retry re-runs until the output validates. Types reference each other with [type-name] syntax.

  • Agents can use your MCP servers. Set mcps: ["database-mcp"] in a sandbox spec and the agent gets access to your local knowledge base, Logseq graph, Google Workspace, or any other MCP server configured in your Claude settings. Data paths are mounted into the container at the same host path; network MCPs need no extra config. See the MCP Access guide.

  • Full artifact tracing. Every container runs a PostToolUse hook that logs MCP tool calls and file writes to artifacts.jsonl. inspect generates a post-mortem debug report from any ref. unwrap extracts output to a file you can Read() or Grep. Every ref carries a provenance hash and parent chain.


Architecture

flowchart TB
    Claude["Claude Code<br/>(your session)"] -- "MCP" --> Server["swarm-mcp server<br/>run · par · map · chain · pipeline"]
    Server --> Pools["Semaphore + Resource Pools<br/>SWARM_MAX_CONCURRENT · SWARM_RESOURCE_gpu"]
    Pools --> Docker["Docker API"]
    Docker --> C0["Container 0<br/>claude --model sonnet"]
    Docker --> C1["Container 1<br/>claude --model opus"]
    Docker --> CN["Container N<br/>claude --model haiku"]
    C0 & C1 & CN --> Out["&nbsp;/tmp/swarm-mcp/{run}/{agent}/<br/>result.json · stream.jsonl · artifacts.jsonl"]
    Out -- "unwrap()" --> Text["output.md"]

Ref flow

flowchart LR
    A["run('Review code')"] --> R["ref: a1b2c3/agent-0<br/>exit_code: 0 · cost: $0.03<br/><i>metadata only — no text yet</i>"]
    R -- "unwrap()" --> F["/tmp/.../output.md"]
    R -- "inspect()" --> D["inspect.md<br/>debug report"]
    R -- "reduce([r1,r2,r3])" --> S["synthesis ref"]

Combinator composition

flowchart LR
    P["par(3 tasks)"] --> R3["3 refs"]
    R3 --> Re["reduce('Synthesise')"] --> Final["1 ref"] --> U["unwrap → text"]

    M["map(template, 5 inputs)"] --> R5["5 refs"]
    R5 --> Fi["filter('code-review')"] --> V["valid refs only"]

Installation

Prerequisites

  • Docker — containers are the execution substrate
  • Claude Code CLI — with OAuth configured (claude login)
  • uv — Python package manager

Install

# Run directly
uvx mcp-swarm

# Or install as a tool
uv tool install mcp-swarm

Configure Claude Code

Add to your Claude settings (~/.claude.json or project .claude.json):

{
  "mcpServers": {
    "swarm": {
      "command": "uvx",
      "args": ["mcp-swarm"]
    }
  }
}

Build the Docker image

The agent containers need a Docker image with Claude CLI and uv baked in. Clone the repo and build:

git clone https://github.com/stiege/swarm-mcp
cd swarm-mcp
docker build -t swarm-agent .

The Dockerfile installs Claude Code CLI and uv from their official sources during the build — no binaries to copy. The image is based on Ubuntu 24.04 with git, Python 3, and jq. It auto-builds on first use if missing, but pre-building avoids the startup delay.


Quick Start

1. Single agent — code review

Use run to review this file:

run(
  prompt: "Review the error handling in /workspace/src/api/auth.py. Flag any unhandled exceptions, missing input validation, or security issues. Be specific — line numbers and fix suggestions.",
  model: "sonnet",
  mounts: '[{"host_path": "/home/me/myproject", "container_path": "/workspace", "readonly": true}]',
  tools: "Read,Glob,Grep"
)

Returns a ref with metadata. Use unwrap on the ref to read the full review.

2. Parallel research — three topics at once

Use par to research these topics in parallel:

par(
  tasks: '[
    {"prompt": "Research the current state of WebTransport API browser support. What works, what doesn'\''t, what'\''s coming.", "model": "sonnet"},
    {"prompt": "Research QUIC protocol performance characteristics vs TCP for real-time applications. Include benchmarks if available.", "model": "sonnet"},
    {"prompt": "Research existing open-source WebTransport server implementations. Compare features, maturity, language.", "model": "sonnet"}
  ]',
  max_concurrency: 3
)

Three containers spin up simultaneously. Each gets its own network, filesystem, and execution context. Results come back as an array of refs with a summary showing succeeded/failed counts.

3. Pipeline — write, test, fix loop

Use pipeline to implement and test a feature:

pipeline(
  definition: '{
    "name": "implement-and-test",
    "steps": [
      {
        "id": "implement",
        "prompt": "Implement a rate limiter middleware for Express.js. Use a sliding window algorithm. Write to /shared/rate-limiter.js",
        "model": "sonnet",
        "tools": "Read,Write,Bash"
      },
      {
        "id": "test",
        "prompt": "Write tests for /shared/rate-limiter.js using Jest. Run them. Report pass/fail.",
        "model": "sonnet",
        "tools": "Read,Write,Bash",
        "on_fail": "fix"
      },
      {
        "id": "fix",
        "prompt": "Fix the failing tests. Read the error output and fix either the implementation or the tests.",
        "model": "sonnet",
        "tools": "Read,Write,Bash",
        "condition": "prev.error",
        "next": "test",
        "max_retries": 3
      }
    ]
  }'
)

The pipeline writes code in step 1, tests it in step 2, and if tests fail, loops through the fix step up to 3 times. All steps share the /shared/ directory for file passing.


Combinators Reference

Execution

Combinator Pattern Use when
run 1 prompt → 1 ref Single agent task. The fundamental unit.
par N prompts → N refs Independent tasks that can run simultaneously.
map 1 template + N inputs → N refs Same operation applied to many inputs (fan-out).
chain N prompts → 1 final ref (sequential) Each step needs the previous step's output as context.
reduce N refs + synthesis prompt → 1 ref Combine multiple results into a single synthesis.
map_reduce map + reduce in one call Fan-out then synthesize — no manual plumbing.

Control Flow

Combinator Pattern Use when
filter N refs + type → valid refs Keep only results that match a declared type.
race N prompts → 1 winner ref Multiple strategies, take the first success.
retry 1 prompt + max_attempts → 1 ref Flaky task that may need multiple tries. Optionally validates against a type.
guard 1 ref + check → ref or error Enforce constraints (validated, budget, classification, encrypted, exists) before passing downstream.
pipeline JSON definition → execution trace Multi-step workflow with conditions, retries, on_fail handlers, and budget/deadline tracking.

Observation

Tool Pattern Use when
unwrap ref → file path You need the actual text content. Writes to output.md.
inspect ref → debug report Post-mortem on a failed or slow agent. Shows tool calls, stream log, artifacts.

Security

Tool Pattern Use when
encrypt ref → encrypted ref + key_id Protect sensitive output at rest. Metadata stays readable; text is Fernet-encrypted on disk.
decrypt encrypted ref + key_id → file path Decrypt with the right key. Writes plaintext to output.md.
classify ref + level → classified ref Tag data sensitivity (public/internal/confidential/restricted). Controls which MCPs can access.

Types

Tool Pattern Use when
list_type_registry → list of types See what types are defined.
get_type_definition name → markdown content Read a type definition, with [references] resolved.
validate artifact + type → VALID/PARTIAL/INVALID Check if an agent's output matches a declared type.

Configuration

Tool Pattern Use when
save_sandbox_spec name + JSON → saved Create a reusable sandbox configuration.
list_sandbox_specs → list of specs See saved sandboxes.
wrap file path → ref Bring an external file/directory into the ref system.
wrap_project project dir → registered resources Register a project's pipelines/, sandboxes/, types/ directories.

Sandbox Configuration

A sandbox spec defines the environment for an agent container. Use inline on any combinator, or save with save_sandbox_spec for reuse.

Field Type Default Description
model string "sonnet" Claude model: haiku, sonnet, opus
tools list[string] ["Read","Write","Glob","Grep","Bash"] Allowed Claude tools
mcps list[string] [] MCP servers to attach (by name from host ~/.claude.json). The server's code and config are mounted into the container; add data paths via mounts. See MCP Access.
system_prompt string null System prompt injected via --system-prompt
claude_md string null Written to workspace CLAUDE.md
output_schema dict null JSON schema for structured output (--json-schema)
effort string null Effort level: low, medium, high, max
max_budget float null USD budget cap for the agent
input_type string null Natural language type describing agent input
output_type string null Natural language type describing expected output
mounts list[dict] [] Volume mounts: {"host_path", "container_path", "readonly"}
workdir string "/workspace" Container working directory
input_files dict {} Files to inject: {"/path": "content"}
network bool true Network access (needed for Anthropic API)
memory string null Docker memory limit (e.g. "2g")
cpus float null Docker CPU limit (e.g. 2.0)
gpu bool false Pass --gpus all to Docker
resources list[string] [] Named resource pools to acquire (e.g. ["gpu", "database"])
timeout int 1800 Max execution time in seconds (30 min default)
env_vars dict {} Environment variables: {"KEY": "value"}

Complete example

{
  "model": "sonnet",
  "tools": ["Read", "Write", "Glob", "Grep", "Bash"],
  "system_prompt": "You are a senior backend engineer. Write production-quality Go code.",
  "claude_md": "# Project\nThis is a Go microservice using chi router and pgx for Postgres.",
  "mounts": [
    {"host_path": "/home/me/myservice", "container_path": "/workspace", "readonly": false}
  ],
  "mcps": ["database-mcp"],
  "memory": "4g",
  "cpus": 2.0,
  "timeout": 600,
  "effort": "high",
  "env_vars": {"GOPATH": "/home/ubuntu/go"},
  "output_type": "[go-module]"
}

Save it:

save_sandbox_spec(name: "go-backend", spec: '<the JSON above>')

Then use it anywhere:

run(prompt: "Add pagination to the /users endpoint", sandbox: "go-backend")
par(tasks: '[{"prompt": "...", "sandbox": "go-backend"}, ...]')

Pipelines (Free Monad)

A pipeline definition is a program expressed as data. The pipeline tool is the interpreter. You describe what should happen — steps, control flow, error handling — and the interpreter evaluates it, managing shared state and resource budgets.

Why "free monad"?

Because the pipeline definition is a data structure that describes computation without executing it. You can store it as JSON, version it in git, share it across projects, and resume it from any step. The interpreter handles the effects: spawning containers, tracking costs, enforcing deadlines, and routing control flow.

Complete pipeline example

{
  "name": "research-and-report",
  "budget": 2.00,
  "deadline_seconds": 1800,
  "classification": "internal",
  "steps": [
    {
      "id": "gather",
      "prompt": "Research the top 5 Rust web frameworks by GitHub stars. For each, note: name, stars, last commit date, key features. Write a JSON summary to /shared/frameworks.json",
      "model": "sonnet",
      "tools": "Read,Write,Bash"
    },
    {
      "id": "benchmark",
      "prompt": "Read /shared/frameworks.json. For each framework, find or estimate request throughput benchmarks. Write results to /shared/benchmarks.json",
      "model": "sonnet",
      "tools": "Read,Write,Bash"
    },
    {
      "id": "draft",
      "prompt": "Read /shared/frameworks.json and /shared/benchmarks.json. Write a comparative analysis report to /shared/report.md. Include a recommendation.",
      "model": "opus",
      "tools": "Read,Write",
      "on_fail": "fix-draft"
    },
    {
      "id": "fix-draft",
      "prompt": "The report draft failed. Read the error, fix the issues, and rewrite /shared/report.md",
      "model": "sonnet",
      "tools": "Read,Write",
      "condition": "prev.error",
      "next": "review",
      "max_retries": 2
    },
    {
      "id": "review",
      "prompt": "Read /shared/report.md. Check for factual accuracy, missing data, and unclear recommendations. Write feedback to /shared/review.md. If the report is good, just write 'APPROVED'.",
      "model": "opus",
      "tools": "Read,Write",
      "retry_if": {"draft": "NEEDS_REVISION"}
    }
  ]
}

Step fields

Field Type Description
id string Step identifier (used by on_fail, next, retry_if). Defaults to step-{i}.
prompt string The task prompt. Previous step's output is appended as context automatically.
on_fail string Step ID to jump to if this step fails.
next string Step ID to jump to after success (instead of next sequential step).
condition string "prev.error" — only run this step if the previous step failed.
max_retries int Max times this step can be entered via on_fail/next jumps (default: 3).
retry_if dict {"target_step": "keyword"} — if output contains keyword, jump to target step.
+ any sandbox field model, tools, system_prompt, timeout, etc.

Pipeline-level fields

Field Type Description
name string Pipeline name (optional).
sandbox string Default sandbox spec applied to all steps.
budget float Total USD budget. Pipeline stops if exceeded.
deadline_seconds int Wall-clock deadline. Pipeline stops if exceeded.
classification string Default data classification for the run.

The /shared/ directory

Every step in a pipeline gets /shared/ mounted read-write. This is the inter-step communication channel. Step 1 writes /shared/data.json, step 2 reads it. No ref passing needed — just files.

Resuming

pipeline(definition: "research-and-report", resume: "a1b2c3d4e5f6")
pipeline(definition: "research-and-report", resume: "a1b2c3d4e5f6/benchmark")
  • resume: "run_id" — reuses the shared directory from a previous run, starts from step 0.
  • resume: "run_id/step_id" — skips to the named step, previous artifacts available in /shared/.

Type System

Types are natural language specifications stored as markdown files in a types/ directory. They describe what something is, what it should contain, and how to verify it.

Defining a type

Create types/code-review.md:

A code review document that covers:

1. **Summary** — one paragraph describing what the code does
2. **Issues** — numbered list of problems found, each with:
   - Severity (critical / warning / nit)
   - File and line number
   - Description of the problem
   - Suggested fix
3. **Security** — specific section for security concerns (XSS, injection, auth)
4. **Testing** — assessment of test coverage and suggestions

## Verification
- Has a Summary section
- Has at least one numbered issue with severity, location, and fix
- Has a Security section (even if "no issues found")
- Has a Testing section

Using types

Set input_type or output_type on any combinator to inject type context into the agent's prompt:

run(
  prompt: "Review the auth module",
  output_type: "code-review",
  mounts: '[{"host_path": "/home/me/project", "container_path": "/workspace", "readonly": true}]'
)

Validating

validate(artifact: '{"ref": "a1b2c3/agent-0"}', declared_type: "code-review")

Returns VALID, PARTIAL, or INVALID with per-criterion results.

Type references

Types can reference other types with [type-name] syntax:

# types/api-server.md
A REST API server that includes:
- Route handlers with input validation
- Error handling middleware
- [test-suite] with integration tests
- [dockerfile] for containerized deployment

References are resolved recursively (up to depth 3). Each referenced type is inlined once; subsequent references become "(see above)".

Registering types

wrap_project(project_dir: "/home/me/myproject")

This registers myproject/types/, myproject/sandboxes/, and myproject/pipelines/ as search paths. Types in the project take priority over global types in ~/.claude/types/.


Resource Pools & GPU

GPU access

run(
  prompt: "Fine-tune the sentiment classifier on the new dataset",
  gpu: true,
  mounts: '[{"host_path": "/data/models", "container_path": "/models", "readonly": false}]'
)

Setting gpu: true does two things:

  1. Passes --gpus all to the Docker container
  2. Acquires the "gpu" resource pool (capacity 1 by default)

If another agent is using the GPU, this one queues until the resource is free.

Named resource pools

Any string in the resources array becomes a semaphore. Configure capacity with environment variables:

# One GPU at a time (default)
export SWARM_RESOURCE_gpu=1

# Up to 3 concurrent database connections
export SWARM_RESOURCE_database=3

# Rate-limit external API access to 5 concurrent agents
export SWARM_RESOURCE_api=5

Use in a run call:

run(
  prompt: "Query the production database for user analytics",
  resources: '["database"]',
  mcps: '["database-mcp"]'
)

Queue semantics

Resource acquisition uses a separate timeout (SWARM_QUEUE_TIMEOUT, default 1 hour) from execution timeout. An agent waiting 10 minutes for a GPU still gets its full execution time once the GPU is available. The global concurrency limit (SWARM_MAX_CONCURRENT) is acquired first, then named resources.


Observability

unwrap — extract text to file

unwrap(ref: "a1b2c3/agent-0")

Writes the agent's full text output to /tmp/swarm-mcp/a1b2c3/agent-0/output.md and returns the path and size. Use Read() to view it. This is how you go from a lazy ref to actual content.

inspect — post-mortem debug

inspect(ref: "a1b2c3/agent-0")

Generates a debug report at inspect.md containing:

  • Result metadata (exit code, error, duration, cost)
  • Output text (first 2000 chars)
  • Stream log summary (tool calls made, thinking steps)
  • Artifacts logged by the PostToolUse hook
  • Files in the output directory

Artifact tracing

Every agent container runs a PostToolUse hook (hooks/log-artifacts.sh) that captures MCP tool calls and file writes to /output/artifacts.jsonl. Each entry records:

{
  "timestamp": "2025-01-15T10:30:00Z",
  "tool": "Write",
  "tool_use_id": "toolu_abc123",
  "input": {"file_path": "/workspace/src/main.py", "content": "..."},
  "response": {"success": true}
}

This gives you a complete audit trail of what every agent did inside its container.

Encrypted refs

The encrypt / decrypt flow protects sensitive outputs at rest:

run(prompt: "Extract PII from the uploaded documents")
  │
  ▼
encrypt(ref: "a1b2c3/agent-0")
  │
  ▼
{ "ref": "a1b2c3/agent-0",       ◄── metadata visible
  "key_id": "f9e8d7c6b5a4",      ◄── needed to decrypt
  "encrypted": {                   ◄── text is Fernet-encrypted on disk
    "key_id": "f9e8d7c6b5a4",
    "algorithm": "fernet"
  }}
  │
  ├── unwrap() ──► ERROR: "Ref is encrypted. Use decrypt tool."
  │
  └── decrypt(ref: "a1b2c3/agent-0", key_id: "f9e8d7c6b5a4") ──► output.md

The key is stored in /tmp/swarm-mcp/.keys/ with 0600 permissions. Only processes with the key_id can access the plaintext. The ciphertext stays in result.jsondecrypt writes the plaintext to a separate output.md without replacing the encrypted copy.

Classification flow

classify(ref: "a1b2c3/agent-0", level: "confidential", denied_mcps: '["whatsapp", "slack"]')
  │
  ▼
guard(ref: '<classified ref>', check: "classification", value: '["slack"]')
  │
  ▼
ERROR: "MCP 'slack' denied for classification 'confidential'"

Classification levels: public (0) → internal (1) → confidential (2) → restricted (3). Use guard with the "classification" check to enforce data flow policies before passing refs to downstream agents with MCP access.


On-Disk Layout

Every agent execution produces a directory under /tmp/swarm-mcp/:

/tmp/swarm-mcp/
└── a1b2c3d4e5f6/              ← run_id
    ├── agent-0/                ← agent_id
    │   ├── result.json         ← full output + metadata
    │   ├── stream.jsonl        ← raw stream-json from claude
    │   ├── artifacts.jsonl     ← PostToolUse hook log
    │   ├── output.md           ← created by unwrap()
    │   ├── inspect.md          ← created by inspect()
    │   ├── prompt.txt          ← the prompt sent to the agent
    │   ├── home/               ← staged HOME dir mounted into container
    │   │   ├── .claude/        ← claude config + settings + hooks
    │   │   └── .claude.json    ← oauth + mcp config
    │   └── workspace/          ← mounted as /workspace in container
    │       └── CLAUDE.md       ← injected from sandbox spec
    ├── agent-1/
    │   └── ...
    └── shared/                 ← pipeline shared directory (/shared/ in containers)
        ├── data.json
        └── report.md

Environment Variables

Variable Default Description
SWARM_MAX_CONCURRENT 10 Maximum agents running simultaneously across all combinators.
SWARM_QUEUE_TIMEOUT 3600 Seconds an agent will wait in the queue for an execution slot or resource pool.
SWARM_RESOURCE_<name> 1 Capacity of a named resource pool. e.g. SWARM_RESOURCE_gpu=1, SWARM_RESOURCE_database=3.
SWARM_PROJECT_DIR unset Project root containing pipelines/, sandboxes/, types/ directories. Added to search paths on startup.

Contributing

See CONTRIBUTING.md.


License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_swarm-0.1.0.tar.gz (113.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_swarm-0.1.0-py3-none-any.whl (54.4 kB view details)

Uploaded Python 3

File details

Details for the file mcp_swarm-0.1.0.tar.gz.

File metadata

  • Download URL: mcp_swarm-0.1.0.tar.gz
  • Upload date:
  • Size: 113.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17 {"installer":{"name":"uv","version":"0.9.17","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Linux Mint","version":"22.3","id":"zena","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mcp_swarm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8600a4e85081e128d58b5a3f8444d39fa672a491d0a0d5eba0f0bc627e3e8038
MD5 d20a18991d780bb8f8a6e1a63b5f2667
BLAKE2b-256 fecc12dd5cda72334d939966e2db12eace4ec2df3a2ce01ed592c54386643d76

See more details on using hashes here.

File details

Details for the file mcp_swarm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mcp_swarm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 54.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17 {"installer":{"name":"uv","version":"0.9.17","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Linux Mint","version":"22.3","id":"zena","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mcp_swarm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e39325cc8771f97a83fa859153b494451f7da49c9538a38316e5a76b00245644
MD5 d1957cecd7d96fc0206b4fdada375ec4
BLAKE2b-256 b42ba4a17ce4069d60dd4f93690deb3753c72f5a1434fae795317bc45f0c033d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page