Multi-Agent CLI Orchestration Research Platform

Project description

zwarm

Multi-agent CLI orchestration research platform. Coordinate multiple coding agents (Codex, Claude Code) with delegation, conversation, trajectory alignment, and automatic context management.

Key Features

Multi-adapter support: Codex MCP, Claude Code adapters with unified interface
Sync & async modes: Conversational (iterative refinement) or fire-and-forget
Token tracking: Per-session token usage tracked and persisted for cost analysis
Context compaction: Automatic LRU-style pruning when approaching context limits
Trajectory watchers: Composable guardrails (progress, budget, scope, pattern, quality, delegation)
State persistence: Resume sessions, track history, replay events
Weave integration: Full tracing and observability

Installation

# From the workspace (recommended during development)
cd /path/to/labs
uv sync

# Or install directly
uv pip install -e ./zwarm

Requirements:

Python 3.13+
codex CLI installed (for Codex adapter)
claude CLI installed (for Claude Code adapter)

Quick Start

# 1. Initialize zwarm in your project
zwarm init

# 2. Test an executor directly
zwarm exec --task "What is 2+2?"

# 3. Run the orchestrator with a task
zwarm orchestrate --task "Create a hello world Python function"

# 4. Check state after running
zwarm status

# 5. View event history
zwarm history

Task Input Options

# Direct task
zwarm orchestrate --task "Build a REST API"

# From file
zwarm orchestrate --task-file task.md

# From stdin
echo "Fix the bug in auth.py" | zwarm orchestrate
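The three input paths above can be read as a simple fallback chain. The sketch below is illustrative only: `resolve_task` and its argument names are hypothetical, and the precedence shown (flag, then file, then stdin) is an assumption rather than confirmed zwarm behavior.

```python
import io
from pathlib import Path
from typing import Optional, TextIO


def resolve_task(
    task: Optional[str] = None,
    task_file: Optional[Path] = None,
    stdin: Optional[TextIO] = None,
) -> str:
    """Return the task text from the first source that provides one."""
    if task:
        return task.strip()
    if task_file is not None:
        return Path(task_file).read_text(encoding="utf-8").strip()
    if stdin is not None:
        text = stdin.read().strip()
        if text:
            return text
    raise ValueError("no task given: pass --task, --task-file, or pipe text on stdin")


# Simulates: echo "Fix the bug in auth.py" | zwarm orchestrate
piped = resolve_task(stdin=io.StringIO("Fix the bug in auth.py\n"))
```

Failing loudly when no source yields text avoids starting an orchestrator run with an empty task.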

Configuration

zwarm looks for configuration in this order:

1. --config flag (YAML file)
2. config.toml in working directory
3. Default settings
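As a rough illustration of that lookup order, the following sketch picks a config source. `find_config` is a hypothetical helper, not part of zwarm; returning `None` stands in for "use default settings".

```python
from pathlib import Path
from typing import Optional


def find_config(cli_config: Optional[str], working_dir: str = ".") -> Optional[Path]:
    """Pick the config source using the lookup order described above."""
    if cli_config:
        # An explicit --config flag wins.
        return Path(cli_config)
    candidate = Path(working_dir) / "config.toml"
    if candidate.is_file():
        # Otherwise use config.toml in the working directory, if present.
        return candidate
    # Nothing found: the caller falls back to default settings.
    return None
```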

Minimal config.toml

[weave]
enabled = true
project = "your-wandb-entity/zwarm"

[executor]
adapter = "codex_mcp"  # or "claude_code"

Environment Variables

# Weave tracing (optional but recommended)
export WEAVE_PROJECT="your-entity/zwarm"

# Executor authentication (required - set based on which adapter you use)
export OPENAI_API_KEY="sk-..."        # Required for codex_mcp adapter
export ANTHROPIC_API_KEY="sk-ant-..." # Required for claude_code adapter

Important: The orchestrator agent runs with your credentials, but the executor adapters (Codex, Claude Code) need their own authentication. If executors fail with auth errors, check that the appropriate API key is set in your environment.

You can also put these in a .env file in your project root - zwarm will load it automatically.
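If executors fail with auth errors, a quick pre-flight check can tell you which variable is missing. This is a minimal sketch: `missing_credential` is a hypothetical helper, and it only inspects environment variables, so it will not detect a `claude` CLI that is already logged in by other means.

```python
import os
from typing import Mapping, Optional

# Adapter name -> environment variable, as listed above.
REQUIRED_KEY = {
    "codex_mcp": "OPENAI_API_KEY",
    "claude_code": "ANTHROPIC_API_KEY",
}


def missing_credential(
    adapter: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the name of the unset variable for this adapter, or None if it is set."""
    env = os.environ if env is None else env
    var = REQUIRED_KEY[adapter]
    return None if env.get(var) else var
```

For example, `missing_credential("codex_mcp")` returns `"OPENAI_API_KEY"` when that variable is unset or empty in the current environment.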

Full Configuration Reference

# config.yaml
orchestrator:
  lm: gpt-5-mini              # Model for the orchestrator itself
  max_steps: 100              # Maximum orchestrator steps
  compaction:                 # Context window management
    enabled: true
    max_tokens: 100000        # Token budget; the percentages below are fractions of this
    threshold_pct: 0.85       # Compact at 85% of max
    target_pct: 0.7           # Target 70% after compaction
    keep_first_n: 2           # Always keep system + task
    keep_last_n: 10           # Always keep recent context

executor:
  adapter: codex_mcp          # Default adapter: codex_mcp | claude_code
  model: null                 # Model override (null = use adapter default)
                              # codex_mcp default: gpt-5.1-codex-mini
                              # claude_code default: claude-sonnet-4-5-20250514
  sandbox: workspace-write    # Codex sandbox mode

weave:
  enabled: true
  project: your-entity/zwarm

state_dir: .zwarm             # State directory for sessions/events

watchers:
  enabled: true
  watchers:
    - name: progress
    - name: budget
      config:
        max_steps: 50
        max_sessions: 10
    - name: scope
      config:
        keywords: []
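To make the compaction settings concrete, here is a minimal sketch of how `threshold_pct`, `target_pct`, `keep_first_n`, and `keep_last_n` could interact. It is not zwarm's implementation: the `Msg` type, the pre-computed token counts, and the oldest-first eviction order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Msg:
    text: str
    tokens: int  # hypothetical pre-computed token count


def compact(
    messages: List[Msg],
    max_tokens: int = 100_000,
    threshold_pct: float = 0.85,
    target_pct: float = 0.7,
    keep_first_n: int = 2,
    keep_last_n: int = 10,
) -> List[Msg]:
    """Drop the oldest unprotected messages once usage crosses the threshold."""
    total = sum(m.tokens for m in messages)
    if total <= threshold_pct * max_tokens:
        return list(messages)  # under the trigger: nothing to do
    if len(messages) <= keep_first_n + keep_last_n:
        return list(messages)  # every message is protected

    split = len(messages) - keep_last_n
    head = messages[:keep_first_n]          # system prompt + task
    middle = messages[keep_first_n:split]   # the only prunable region
    tail = messages[split:]                 # recent context

    target = target_pct * max_tokens
    while middle and total > target:
        total -= middle.pop(0).tokens       # oldest prunable message goes first
    return head + middle + tail
```

With twenty messages of 5,000 tokens each (100,000 total), the defaults protect the first two and last ten, then drop six of the eight middle messages to land at 70,000 tokens.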

Adapters

zwarm supports multiple CLI coding agents through adapters. Each adapter wraps a different coding CLI and handles the mechanics of starting sessions, sending messages, and capturing responses.

Codex MCP (default)

Uses Codex via MCP server for true conversational sessions. This is the recommended adapter for iterative work where you need back-and-forth refinement.

# Sync mode (conversational)
zwarm exec --adapter codex_mcp --task "Add a login function"

# The orchestrator can have back-and-forth conversations
# using delegate() and converse() tools

Setting	Value
Default model	`gpt-5.1-codex-mini`
Requires	`codex` CLI installed
Auth	`OPENAI_API_KEY` environment variable

Claude Code

Uses the Claude Code CLI for execution. A good alternative when you prefer Claude models for a task.

zwarm exec --adapter claude_code --task "Fix the type errors"

Setting	Value
Default model	`claude-sonnet-4-5-20250514`
Requires	`claude` CLI installed and authenticated
Auth	`ANTHROPIC_API_KEY` or `claude` CLI auth

Model Selection

Models are selected with this precedence (highest to lowest):

1. Per-delegation override: delegate(task="...", model="o3")
2. Config file: executor.model in config.toml or zwarm.yaml
3. Adapter default: Each adapter has a sensible default

# config.toml - override the default model
[executor]
adapter = "codex_mcp"
model = "gpt-5.1-codex-max"  # Use the more capable model

# Or override per-execution
zwarm exec --model gpt-5.1-codex-max --task "Complex refactoring"
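The precedence rules can be summarized in a few lines. This is a sketch under assumptions: `resolve_model` is a hypothetical function name, and the defaults table simply repeats the adapter defaults listed in this README.

```python
from typing import Optional

# Adapter defaults as listed in this README.
ADAPTER_DEFAULTS = {
    "codex_mcp": "gpt-5.1-codex-mini",
    "claude_code": "claude-sonnet-4-5-20250514",
}


def resolve_model(
    adapter: str,
    config_model: Optional[str] = None,
    override: Optional[str] = None,
) -> str:
    """Highest precedence first: per-delegation override, config file, adapter default."""
    return override or config_model or ADAPTER_DEFAULTS[adapter]
```

So `resolve_model("codex_mcp", config_model="gpt-5.1-codex-max", override="o3")` yields `"o3"`, while omitting both optional arguments yields the adapter default.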

Watchers (Trajectory Alignment)

Watchers are composable guardrails that monitor agent behavior and can warn, pause, or stop the orchestrator when a run goes off track.

Available Watchers

Watcher	Description
`progress`	Detects stuck/spinning agents
`budget`	Monitors step/session limits (counts only active sessions)
`scope`	Detects scope creep from original task
`pattern`	Custom regex pattern matching
`quality`	Code quality checks
`delegation`	Ensures orchestrator delegates instead of writing code directly

Enabling Watchers

# config.yaml
watchers:
  enabled:
    - progress
    - budget
    - scope
  config:
    progress:
      stuck_threshold: 5      # Flag after 5 similar steps
    budget:
      max_steps: 50
      max_sessions: 10
    scope:
      keywords:
        - "refactor"
        - "rewrite"
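To illustrate what a setting like `stuck_threshold` might mean in practice, here is a small sketch of a stuck detector. It is an assumption-laden stand-in for the `progress` watcher: `StuckDetector` is a hypothetical class, and "similar" is simplified to "identical after lowercasing and whitespace normalization".

```python
from collections import deque
from typing import Deque


class StuckDetector:
    """Flags a run once the last N step summaries are all the same."""

    def __init__(self, stuck_threshold: int = 5) -> None:
        self.stuck_threshold = stuck_threshold
        # Only the most recent `stuck_threshold` steps are remembered.
        self._recent: Deque[str] = deque(maxlen=stuck_threshold)

    def observe(self, step_summary: str) -> bool:
        """Record a step; return True when the window is full and uniform."""
        normalized = " ".join(step_summary.lower().split())
        self._recent.append(normalized)
        return (
            len(self._recent) == self.stuck_threshold
            and len(set(self._recent)) == 1
        )
```

A real watcher would likely use a fuzzier similarity measure; exact matching keeps the sketch short and predictable.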

Watcher Actions

Watchers can return different actions:

continue - Keep going
warn - Log warning but continue
pause - Pause for human review
stop - Stop the orchestrator
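When several watchers run on the same step, their results have to be combined somehow. The sketch below assumes the most severe action wins and that the four actions are ordered as listed above; both are assumptions, as are the `Action` enum, the callable-based `Watcher` type, and the 80% warning level in `budget_watcher`.

```python
from enum import IntEnum
from typing import Callable, Iterable


class Action(IntEnum):
    CONTINUE = 0
    WARN = 1
    PAUSE = 2
    STOP = 3


# Hypothetical watcher shape: takes a state dict, returns an Action.
Watcher = Callable[[dict], Action]


def budget_watcher(max_steps: int) -> Watcher:
    """Warn near the step limit (assumed 80%) and stop once it is reached."""
    def check(state: dict) -> Action:
        steps = state.get("steps", 0)
        if steps >= max_steps:
            return Action.STOP
        if steps >= 0.8 * max_steps:
            return Action.WARN
        return Action.CONTINUE
    return check


def run_watchers(watchers: Iterable[Watcher], state: dict) -> Action:
    """Combine watcher results by taking the most severe action."""
    return max((w(state) for w in watchers), default=Action.CONTINUE)
```

Composing watchers this way means adding a new guardrail never weakens an existing one: a single `stop` overrides any number of `continue` results.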

Weave Integration

zwarm integrates with Weave for tracing and observability.

Enabling Weave

# Via environment variable
export WEAVE_PROJECT="your-entity/zwarm"

# Or via config.toml
[weave]
enabled = true
project = "your-entity/zwarm"

What Gets Traced

Orchestrator step() calls with tool inputs/outputs
Individual adapter calls (_call_codex, _call_claude)
Delegation tools (delegate, converse, end_session)
All tool executions

View traces at: https://wandb.ai/your-entity/zwarm/weave

CLI Reference

init

Initialize zwarm in a project directory.

zwarm init [OPTIONS]

Options:
  -w, --working-dir PATH    Working directory [default: .]
  -y, --yes                 Accept defaults, no prompts
  --with-project            Also create zwarm.yaml project config

What it creates:

config.toml - User settings (Weave project, adapter preferences, watchers)
.zwarm/ - State directory for sessions and events
zwarm.yaml (optional) - Project-specific task configuration

Examples:

# Interactive setup with prompts
zwarm init

# Non-interactive with defaults
zwarm init --yes

# Create project config too
zwarm init --with-project

# Initialize in a different directory
zwarm init --working-dir /path/to/project

orchestrate

Start an orchestrator session to delegate tasks.

zwarm orchestrate [OPTIONS]

Options:
  -t, --task TEXT           Task description
  -f, --task-file PATH      Read task from file
  -c, --config PATH         Config file (YAML)
  --adapter TEXT            Executor adapter override
  --resume                  Resume from previous state
  --set KEY=VALUE           Override config values

exec

Run a single executor directly (for testing). This bypasses the orchestrator entirely and hits the adapter (Codex/Claude) immediately with your task - useful for verifying adapters work before running full orchestration.

zwarm exec [OPTIONS]

Options:
  -t, --task TEXT           Task to execute
  --adapter TEXT            Adapter to use [default: codex_mcp]
  --model TEXT              Model override
  --mode [sync|async]       Execution mode [default: sync]

Note: Unlike orchestrate, this does NOT use watchers, compaction, state persistence, or multi-step planning. It's a single direct call to the executor.

status

Show current orchestrator state.

zwarm status [OPTIONS]

Options:
  --sessions                Show session details
  --tasks                   Show task details
  --json                    Output as JSON

history

Show event history.

zwarm history [OPTIONS]

Options:
  -n, --limit INTEGER       Number of events [default: 20]
  --session TEXT            Filter by session ID
  --json                    Output as JSON

configs

Manage configuration files.

zwarm configs list          # List available configs
zwarm configs show NAME     # Show config contents

clean

Clean up zwarm state (useful for starting fresh).

zwarm clean [OPTIONS]

Options:
  --all                     Remove everything (events, sessions, state)
  --events                  Remove only events
  --sessions                Remove only sessions
  -y, --yes                 Skip confirmation prompt

Examples:

# Clean everything and start fresh
zwarm clean --all --yes

# Clean only events log
zwarm clean --events

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Orchestrator                         │
│  (Plans, delegates, supervises - does NOT write code)   │
├─────────────────────────────────────────────────────────┤
│                    Delegation Tools                      │
│   delegate() | converse() | check_session() | bash()    │
└───────────────┬─────────────────────┬───────────────────┘
                │                     │
        ┌───────▼───────┐     ┌───────▼───────┐
        │  Codex MCP    │     │  Claude Code  │
        │   Adapter     │     │    Adapter    │
        └───────┬───────┘     └───────┬───────┘
                │                     │
        ┌───────▼───────┐     ┌───────▼───────┐
        │    codex      │     │    claude     │
        │  mcp-server   │     │     CLI       │
        └───────────────┘     └───────────────┘

Key Concepts

Orchestrator: Plans and delegates but never writes code directly
Executors: CLI agents (Codex, Claude) that do the actual coding
Sessions: Conversations with executors (sync or async)
Watchers: Trajectory aligners that monitor and intervene
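The relationship between orchestrator, executors, and sessions can be sketched with a toy interface. Everything here is hypothetical: `Executor`, `EchoExecutor`, and this `delegate` function are illustrative names and do not reflect zwarm's actual `ExecutorAdapter` protocol or tool signatures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Executor(Protocol):
    """Hypothetical minimal interface shared by all executors."""

    name: str

    def start(self, task: str) -> str: ...

    def send(self, session_id: str, message: str) -> str: ...


@dataclass
class EchoExecutor:
    """Stand-in executor that records messages instead of calling a CLI."""

    name: str = "echo"
    sessions: Dict[str, List[str]] = field(default_factory=dict)

    def start(self, task: str) -> str:
        session_id = f"{self.name}-{len(self.sessions) + 1}"
        self.sessions[session_id] = [task]
        return session_id

    def send(self, session_id: str, message: str) -> str:
        self.sessions[session_id].append(message)
        return f"[{self.name}] received: {message}"


def delegate(executor: Executor, task: str, follow_up: str) -> str:
    """Open a session for the task, then continue the conversation once."""
    session_id = executor.start(task)
    return executor.send(session_id, follow_up)
```

Because the orchestrator side depends only on the `Executor` shape, swapping one executor for another requires no change to the delegation logic.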

State Management

All state is stored in flat files under .zwarm/:

.zwarm/
├── state.json              # Current state
├── events.jsonl            # Append-only event log
├── sessions/
│   └── <session-id>/
│       ├── messages.json   # Conversation history
│       └── metadata.json   # Session info
└── orchestrator/
    └── messages.json       # Orchestrator history (for resume)
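An append-only JSON Lines log like `events.jsonl` is straightforward to write and read with the standard library. The sketch below is illustrative: the helper names and the event fields used in the example are assumptions, not zwarm's actual event schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def append_event(state_dir: Path, event: Dict[str, Any]) -> None:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    state_dir.mkdir(parents=True, exist_ok=True)
    with (state_dir / "events.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def read_events(state_dir: Path, limit: int = 20) -> List[Dict[str, Any]]:
    """Return up to `limit` of the most recent events, oldest first."""
    path = state_dir / "events.jsonl"
    if limit <= 0 or not path.exists():
        return []
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return [json.loads(ln) for ln in lines[-limit:]]
```

One JSON object per line keeps writes cheap and makes the log easy to tail or replay line by line.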

Development

Running Tests

# Run all zwarm tests (68 tests)
uv run pytest src/zwarm/ -v

# Run specific test modules
uv run pytest src/zwarm/core/test_compact.py -v      # Context compaction
uv run pytest src/zwarm/watchers/test_watchers.py -v # Watchers
uv run pytest src/zwarm/adapters/test_codex_mcp.py -v # Codex adapter

# Run integration tests (requires codex CLI)
uv run pytest -m integration

Project Structure

zwarm/
├── src/zwarm/
│   ├── adapters/           # Executor adapters
│   │   ├── base.py         # ExecutorAdapter protocol
│   │   ├── codex_mcp.py    # Codex MCP adapter (with token tracking)
│   │   └── claude_code.py  # Claude Code adapter (with token tracking)
│   ├── cli/
│   │   └── main.py         # Typer CLI
│   ├── core/
│   │   ├── compact.py      # Context window compaction (LRU pruning)
│   │   ├── config.py       # Configuration loading
│   │   ├── environment.py  # OrchestratorEnv (progress display)
│   │   ├── models.py       # ConversationSession, Message, Event, etc.
│   │   └── state.py        # Flat-file state management
│   ├── tools/
│   │   └── delegation.py   # delegate, converse, check_session, etc.
│   ├── watchers/
│   │   ├── base.py         # Watcher protocol
│   │   ├── builtin.py      # Built-in watchers (progress, budget, scope, etc.)
│   │   ├── registry.py     # Watcher registration
│   │   └── manager.py      # WatcherManager
│   ├── prompts/
│   │   └── orchestrator.py # Orchestrator system prompt
│   └── orchestrator.py     # Main Orchestrator class
├── configs/                # Example configurations
├── README.md
└── pyproject.toml

Research Context

zwarm is a research platform exploring:

Agent reliability - Can orchestrators reliably delegate and verify work?
Agent meta-capability - Can agents effectively use other agents?
Long-running agents - Can agents run for days, not hours?

See ZWARM_PLAN.md for detailed design documentation.

License

Research project - see repository license.

Project details

Release history

3.10.6

Feb 6, 2026

3.10.5

Feb 4, 2026

3.10.3

Feb 3, 2026

3.10.2

Feb 3, 2026

3.10.1

Feb 2, 2026

3.9.0

Feb 2, 2026

3.8.0

Jan 29, 2026

3.7.0

Jan 28, 2026

3.6.0

Jan 28, 2026

3.4.0

Jan 27, 2026

3.3.0

Jan 27, 2026

3.2.1

Jan 24, 2026

3.2.0

Jan 24, 2026

3.0.1

Jan 24, 2026

3.0

Jan 24, 2026

2.3.5

Jan 23, 2026

2.3

Jan 22, 2026

2.0.2

Jan 22, 2026

2.0.1

Jan 22, 2026

2.0.0

Jan 22, 2026

1.3.11

Jan 21, 2026

1.3.10

Jan 21, 2026

1.3.9

Jan 21, 2026

1.3.8

Jan 21, 2026

1.3.5

Jan 17, 2026

1.3.3

Jan 17, 2026

1.3.2

Jan 17, 2026

1.2.1

Jan 17, 2026

1.1.1

Jan 16, 2026

1.0.1

Jan 16, 2026

This version

1.0.0

Jan 16, 2026

0.1.0

Jan 15, 2026

Download files

Source Distribution

zwarm-1.0.0.tar.gz (59.8 kB)

Uploaded Jan 16, 2026 Source

Built Distribution

zwarm-1.0.0-py3-none-any.whl (72.9 kB)

Uploaded Jan 16, 2026 Python 3

File details

Details for the file zwarm-1.0.0.tar.gz.

File metadata

Download URL: zwarm-1.0.0.tar.gz
Upload date: Jan 16, 2026
Size: 59.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.18

File hashes

Hashes for zwarm-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f06d891f67637d4509e0529f733ee116db4d70a1b8ba0dc6b65eb79d557f465c`
MD5	`0c7754e23368f142fa96d46928b66c74`
BLAKE2b-256	`bbb2ce7c6ef45041adfaf0fbf0efe6319f534631ae2c9f1619a48c62c636d277`

File details

Details for the file zwarm-1.0.0-py3-none-any.whl.

File metadata

Download URL: zwarm-1.0.0-py3-none-any.whl
Upload date: Jan 16, 2026
Size: 72.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.18

File hashes

Hashes for zwarm-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`670a0ea377d0eafff71e653f04a965cc1da221fdc331e9bd0368cd0ed21d620c`
MD5	`4a9275b9bb38ffc248cd71f9e5065e4f`
BLAKE2b-256	`68c6c1ff6dbc3852217372410e071b199dbe7388e18fe2945e7ddacee734a827`
