
Coordination layer for multi-agent AI systems. Shared memory, experiment tracking, and session replay.


tracecraft


pip install tracecraft-ai

When you run 10 AI agents in parallel, they need to coordinate: share results, claim tasks, wait for each other, and pass context forward. Today, that infrastructure doesn't exist. Tracecraft provides it.

# Agent A finishes a step and shares the result
tracecraft memory set s1a.status "complete"
tracecraft complete S1.A --note "SDK core built. Watch out for race in queue.py"

# Agent B reads it and starts its work
tracecraft wait-for S1.A
tracecraft memory get s1a.status
tracecraft claim S1.B
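The handoff above works like a barrier: agent B blocks in wait-for until agent A calls complete, then reads A's note. A minimal in-process sketch of those semantics, assuming a single shared coordinator object stands in for the tracecraft server. Coordinator, complete and wait_for are hypothetical names, not the SDK's API.

```python
import threading
from typing import Dict, List, Optional


class Coordinator:
    """Hypothetical in-process model of complete / wait-for (not tracecraft's code)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._notes: Dict[str, str] = {}  # completed step id -> handoff note

    def complete(self, step_id: str, note: str = "") -> None:
        with self._cond:
            self._notes[step_id] = note
            self._cond.notify_all()  # wake every agent blocked in wait_for

    def wait_for(self, *step_ids: str, timeout: Optional[float] = None) -> Dict[str, str]:
        with self._cond:
            # Re-checks the predicate on every notify; returns False if the timeout expires first.
            done = self._cond.wait_for(
                lambda: all(s in self._notes for s in step_ids), timeout=timeout
            )
            if not done:
                raise TimeoutError(f"still waiting on {step_ids}")
            return {s: self._notes[s] for s in step_ids}


coord = Coordinator()
received: List[Dict[str, str]] = []


def agent_b() -> None:
    # Blocks until S1.A is complete, then keeps A's handoff note.
    received.append(coord.wait_for("S1.A", timeout=5.0))


b = threading.Thread(target=agent_b)
b.start()
coord.complete("S1.A", note="SDK core built. Watch out for race in queue.py")  # agent A
b.join()
print(received[0]["S1.A"])
```

Passing several step ids (wait_for("S1.A", "S1.B")) waits until all of them are complete, which matches the CLI's wait-for <step-ids...>.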

Works with Claude Code, Codex, CrewAI, LangGraph, Hermes Agent, autoresearch, or any process that can call a CLI.


Why

Andrej Karpathy ran 700 experiments in 2 days with autoresearch. Then his labs got wiped out in an outage. No persistence, no replay, no failover.

He said the next step is "asynchronously massively collaborative agents" — 7,500 people agreed.

claude-peers proved 400K people want agent coordination. But it's ephemeral messaging only — no shared memory, no persistence, no experiment tracking.

Tracecraft provides production infrastructure for both use cases: persistent, replayable experiment tracking and agent coordination with shared memory.


What it does

| Capability | Command | What happens |
|---|---|---|
| Shared memory | tracecraft memory set key value | Agents read/write persistent key-value state |
| Messaging | tracecraft send agent-b "done" | Direct or broadcast messages between agents |
| Task claiming | tracecraft claim S1.A | Atomic step claiming so agents don't collide |
| Barriers | tracecraft wait-for S1.A S1.B | Block until dependencies complete |
| Handoffs | tracecraft complete S1.A --note "..." | Structured context for the next agent |
| Experiment tracking | tracecraft run start "exp-v2" | Track runs, metrics, artifacts, cost |
| Session replay | tracecraft replay <run-id> | Step-by-step replay of any past run |
| Agent registry | tracecraft agents | See who's online and what they're working on |

Quick start

1. Install

pip install tracecraft-ai

2. Start the server

tracecraft serve

This starts PostgreSQL + SeaweedFS + the tracecraft server locally via Docker.

3. Use from any agent

# From a Claude Code session, a Python script, a bash script — anything
tracecraft memory set research.findings '{"papers": 47, "relevant": 12}'
tracecraft send agent-writer "Research phase complete, 12 relevant papers found"
tracecraft run log-metric quality_score 0.92
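From a Python process that prefers the CLI over the SDK, passing an argument list to subprocess avoids the shell quoting that the JSON value above otherwise needs. A minimal sketch; memory_set_argv, send_argv and run_cli are hypothetical helpers, and it assumes the CLI accepts the JSON value as a single argument, as "Write (JSON or string)" in the CLI reference suggests.

```python
import json
import subprocess
from typing import Any, List


def memory_set_argv(key: str, value: Any) -> List[str]:
    """Argument list for `tracecraft memory set`; non-string values are JSON-encoded."""
    payload = value if isinstance(value, str) else json.dumps(value)
    return ["tracecraft", "memory", "set", key, payload]


def send_argv(agent_id: str, message: str) -> List[str]:
    """Argument list for `tracecraft send`."""
    return ["tracecraft", "send", agent_id, message]


def run_cli(argv: List[str]) -> str:
    # No shell is involved, so quotes and braces in the payload need no escaping.
    # check=True raises CalledProcessError if the command exits non-zero.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


argv = memory_set_argv("research.findings", {"papers": 47, "relevant": 12})
print(argv)
# run_cli(argv)  # run this on a machine with the tracecraft CLI and a running server
```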

4. Or use the Python SDK

import tracecraft

tracecraft.init(project="my-research")

with tracecraft.run("prompt-comparison-v2") as run:
    with run.agent(name="researcher", model="claude-sonnet-4-20250514") as agent:
        with agent.step("search", kind="tool_call") as step:
            step.log_input({"query": "multi-agent coordination"})
            results = {"papers": 47, "relevant": 12}  # placeholder for your search tool's output
            step.log_output(results)

        # Share findings with other agents, then notify the writer
        agent.shared_memory.set("findings", {"papers": 47})
        agent.send("writer", "Research complete")

    run.log_metrics({"quality_score": 0.92, "cost": 0.034})

Integrations

Tracecraft works with any agent framework. One-line integrations for the major ones:

# CrewAI
from crewai import Crew
from tracecraft.integrations.crewai import TracecraftCallback
crew = Crew(agents=[...], callbacks=[TracecraftCallback()])

# Claude Agent SDK
from tracecraft.integrations.claude_sdk import tracecraft_hooks
agent = Agent(model="claude-sonnet-4-20250514", hooks=tracecraft_hooks())

# LangGraph (app is a compiled graph, e.g. the result of StateGraph(...).compile())
from tracecraft.integrations.langgraph import TracecraftTracer
result = app.invoke(inputs, config={"callbacks": [TracecraftTracer()]})

# Hermes Agent — coming soon
# AutoGen — coming soon

Architecture

Agents (Claude Code, Codex, CrewAI, scripts, anything)
    |
    |  tracecraft CLI  or  Python SDK
    |
    v
Tracecraft Server (FastAPI)
    |
    +--- PostgreSQL (metadata, coordination state, experiment tracking)
    +--- SeaweedFS (artifacts, memory snapshots, replay files)
    +--- Redis (pub/sub, real-time notifications, locks)

Everything is self-hosted, with no cloud dependency. A single docker compose up starts the stack.
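Keeping coordination state in the database lets the database itself enforce atomicity: a primary key on the step id means a second claim simply fails, no matter which agent or server process issues it. A minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL; the claims table and its columns are hypothetical, not tracecraft's schema. In PostgreSQL the same insert is often written with ON CONFLICT (step_id) DO NOTHING.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE claims (
           step_id  TEXT PRIMARY KEY NOT NULL,
           agent_id TEXT NOT NULL,
           done     INTEGER NOT NULL DEFAULT 0,
           note     TEXT
       )"""
)


def claim(conn: sqlite3.Connection, step_id: str, agent_id: str) -> bool:
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO claims (step_id, agent_id) VALUES (?, ?)", (step_id, agent_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the PRIMARY KEY already holds a claim for this step


def complete(conn: sqlite3.Connection, step_id: str, agent_id: str, note: str = "") -> bool:
    # Only the owning agent can complete a step, and only once.
    with conn:
        cur = conn.execute(
            "UPDATE claims SET done = 1, note = ? "
            "WHERE step_id = ? AND agent_id = ? AND done = 0",
            (note, step_id, agent_id),
        )
    return cur.rowcount == 1


print(claim(conn, "S1.A", "agent-a"))                            # True
print(claim(conn, "S1.A", "agent-b"))                            # False: already claimed
print(complete(conn, "S1.A", "agent-b"))                         # False: not the owner
print(complete(conn, "S1.A", "agent-a", note="SDK core built"))  # True
```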


CLI reference

# Server
tracecraft serve                          # Start local server
tracecraft status                         # Check connection

# Shared memory
tracecraft memory set <key> <value>       # Write (JSON or string)
tracecraft memory get <key>               # Read
tracecraft memory list [--prefix X]       # List keys
tracecraft memory watch <pattern>         # Stream changes in real-time

# Messaging
tracecraft send <agent-id> <message>      # Direct message
tracecraft broadcast <message>            # Message all agents
tracecraft inbox                          # Check messages
tracecraft inbox --watch                  # Stream incoming messages

# Coordination
tracecraft claim <step-id>                # Claim a task (atomic)
tracecraft complete <step-id> [--note X]  # Mark done + handoff note
tracecraft wait-for <step-ids...>         # Block until all complete

# Experiment tracking
tracecraft run start <name>               # Start a tracked run
tracecraft run log-metric <name> <value>  # Log a metric
tracecraft run log-artifact <name> <path> # Upload an artifact
tracecraft run end [--status X]           # End the run

# Inspection
tracecraft agents                         # Who's online?
tracecraft runs                           # List all runs
tracecraft runs inspect <id>              # Step-by-step timeline
tracecraft replay <id>                    # Replay a past run
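The shared-memory commands form a small key-value API: set stores JSON or a plain string, list filters by key prefix, and watch streams changes for keys matching a pattern. A minimal in-memory model of those semantics; SharedMemory is a hypothetical class, and treating watch patterns as shell-style globs (fnmatch) is an assumption, since the pattern syntax is not documented above.

```python
import fnmatch
import json
from typing import Any, Callable, Dict, List, Tuple


class SharedMemory:
    """Hypothetical in-memory model of memory set/get/list/watch (not tracecraft's code)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._watchers: List[Tuple[str, Callable[[str, Any], None]]] = []

    def set(self, key: str, raw: str) -> None:
        # "JSON or string": keep parsed JSON when the text is valid JSON, else the raw string.
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            value = raw
        self._data[key] = value
        # Notify every watcher whose glob pattern matches the key.
        for pattern, callback in self._watchers:
            if fnmatch.fnmatchcase(key, pattern):
                callback(key, value)

    def get(self, key: str) -> Any:
        return self._data[key]  # raises KeyError for a missing key

    def list_keys(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self._data if k.startswith(prefix))

    def watch(self, pattern: str, callback: Callable[[str, Any], None]) -> None:
        self._watchers.append((pattern, callback))


mem = SharedMemory()
seen = []
mem.watch("s1*.status", lambda key, value: seen.append((key, value)))
mem.set("s1a.status", "complete")                                # plain string
mem.set("research.findings", '{"papers": 47, "relevant": 12}')   # JSON object
print(mem.list_keys(prefix="research."))          # ['research.findings']
print(mem.get("research.findings")["relevant"])   # 12
print(seen)                                       # [('s1a.status', 'complete')]
```

One consequence of "JSON or string" parsing: a value such as 0.92 or true comes back as a number or boolean, not as text.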

Use cases

Parallel Claude Code sessions

Run 4 Claude Code agents in git worktrees, each building a different module. They claim steps, share artifacts, wait at barriers, and hand off context — all through tracecraft.

Karpathy-style autoresearch

Run hundreds of experiments overnight. Every run is tracked with metrics, artifacts, and cost. Replay any run to understand what the agent tried. Compare runs to find what worked.
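Replaying a run amounts to reading its recorded events back in timeline order, optionally stopping at a chosen step. A minimal sketch over a hypothetical event log; the Event fields and the replay function are assumptions for illustration, not tracecraft's storage format.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int      # position in the run's timeline
    agent: str
    kind: str     # e.g. "tool_call" or "message"
    detail: str


def replay(events: List[Event], until: Optional[int] = None) -> Iterator[str]:
    """Yield one readable line per event in timeline order, stopping after `until`."""
    for ev in sorted(events, key=lambda e: e.seq):
        if until is not None and ev.seq > until:
            break
        yield f"[{ev.seq:03d}] {ev.agent:<10} {ev.kind:<9} {ev.detail}"


# Hypothetical log, stored out of order as it might arrive from parallel agents.
log = [
    Event(2, "writer", "message", "received: Research complete"),
    Event(0, "researcher", "tool_call", "search('multi-agent coordination')"),
    Event(1, "researcher", "message", "send writer: Research complete"),
]
for line in replay(log):
    print(line)
```

Sorting by a sequence number rather than arrival time matters when several agents write events concurrently.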

CrewAI/LangGraph production monitoring

Track every agent decision, tool call, and handoff in production multi-agent workflows. Debug failures with session replay. Attribute costs to specific agents.
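Attributing cost to agents is a group-by over step records. A minimal sketch, assuming each recorded step carries an agent name and an optional cost_usd field; those field names and the sample records are hypothetical.

```python
from collections import defaultdict


def cost_by_agent(steps):
    """Sum per-step cost into per-agent totals, most expensive agent first."""
    totals = defaultdict(float)
    for step in steps:
        totals[step["agent"]] += step.get("cost_usd", 0.0)  # steps without a cost count as free
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical step records from one run.
steps = [
    {"agent": "researcher", "kind": "llm_call", "cost_usd": 0.012},
    {"agent": "researcher", "kind": "tool_call"},
    {"agent": "writer", "kind": "llm_call", "cost_usd": 0.022},
]
totals = cost_by_agent(steps)
print(totals)
```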

Benchmarking multi-agent systems

Compare different agent configurations (3 agents vs 5 agents, GPT vs Claude, different prompts) with controlled experiment tracking and standardized metrics.
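A controlled comparison reports spread as well as the average for each configuration, since a single run per configuration says little. A minimal sketch using Python's statistics module over hypothetical (configuration, quality_score) pairs; the labels and numbers are illustrative only.

```python
import statistics
from collections import defaultdict

# Hypothetical results: one (configuration label, quality_score) pair per tracked run.
runs = [
    ("3-agents", 0.81), ("3-agents", 0.85), ("3-agents", 0.83),
    ("5-agents", 0.88), ("5-agents", 0.80), ("5-agents", 0.90),
]


def summarize(runs):
    """Map each configuration to (mean, sample stdev, number of runs)."""
    by_config = defaultdict(list)
    for config, score in runs:
        by_config[config].append(score)
    # statistics.stdev needs at least two runs per configuration.
    return {
        config: (statistics.mean(scores), statistics.stdev(scores), len(scores))
        for config, scores in by_config.items()
    }


summary = summarize(runs)
for config, (mean, stdev, n) in summary.items():
    print(f"{config}: mean={mean:.3f} stdev={stdev:.3f} n={n}")
```

Here the 5-agent configuration has the higher mean but also a much wider spread, so three runs each would not be enough to call it better.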


How it compares

| Feature | tracecraft | Langfuse | LangSmith | claude-peers | AgentOps |
|---|---|---|---|---|---|
| Open source | MIT | MIT (ClickHouse) | No | MIT | Partial |
| Shared memory | Yes | No | No | No | No |
| Agent coordination | Yes | No | No | Messaging only | No |
| Experiment tracking | Yes | Tracing only | Tracing only | No | Monitoring |
| Session replay | Yes | Trace waterfall | Trace waterfall | No | No |
| Artifact storage | Yes (SeaweedFS) | No | No | No | No |
| CLI-first | Yes | No | No | No | No |
| Self-hosted | Yes | Yes | Enterprise only | Localhost | No |
| Works with any framework | Yes | Yes | LangChain-centric | Claude Code only | Yes |

Project structure

tracecraft/
  sdk/
    tracecraft/              Python SDK + CLI
      cli/                   CLI commands (click)
      integrations/          CrewAI, Claude SDK, LangGraph adapters
      transport/             Batching, retry, offline buffer
  server/
    tracecraft_server/       FastAPI server
      api/v1/                REST endpoints
      core/                  Config, auth, database
      storage/               SeaweedFS + artifact management
      services/              Shared memory, mailbox, coordination, registry
      models/                SQLAlchemy models
      ws/                    WebSocket handlers
  dashboard/                 Web UI (Phase 3)
  examples/                  Integration examples
  benchmarks/                MAExBench benchmark suite
  plans/                     Construction blueprints for contributors
  docs/                      Documentation

Roadmap

  • Architecture design and blueprints
  • Core SDK (init, run, agent, step)
  • Server (PostgreSQL + SeaweedFS + Redis)
  • Shared memory + messaging + coordination primitives
  • CLI tool
  • CrewAI integration
  • Claude Agent SDK integration
  • LangGraph integration
  • Dashboard with session replay
  • MAExBench (multi-agent experiment benchmark)
  • arXiv paper

Contributing

Tracecraft is built for parallel development. The plans/ directory contains detailed construction blueprints where each step is self-contained — a fresh contributor (human or AI) can pick up any step and execute it independently.

git clone https://github.com/Arrmlet/tracecraft
cd tracecraft
cat plans/tracecraft-blueprint-v2.md    # Read the blueprint

See CONTRIBUTING.md for setup instructions.


Research

Tracecraft is also a research instrument. We're working toward:

  • arXiv preprint: "Tracecraft: Shared Memory and Coordination Primitives for Multi-Agent LLM Systems"
  • MAExBench: A standardized benchmark for evaluating multi-agent coordination, cost efficiency, and reliability
  • NeurIPS/ICLR submission: Empirical findings from real-world multi-agent experiment data

If you're a researcher interested in multi-agent systems, we'd love to collaborate. Open an issue or reach out.


License

MIT



Download files


Source Distribution

tracecraft_ai-0.1.1.tar.gz (15.5 kB)

Uploaded Source

Built Distribution


tracecraft_ai-0.1.1-py3-none-any.whl (13.8 kB)

Uploaded Python 3

File details

Details for the file tracecraft_ai-0.1.1.tar.gz.

File metadata

  • Download URL: tracecraft_ai-0.1.1.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for tracecraft_ai-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4692ec95863fb7597314eb56d294284a103f449507ef5d9c15076d71f4d1c0a |
| MD5 | b4231582b83b9850e0a7197a60b663a6 |
| BLAKE2b-256 | 2ce0401f7ffb267a7bbc2527b836494fffef323b3e64526490e74268d4961cf3 |


File details

Details for the file tracecraft_ai-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tracecraft_ai-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for tracecraft_ai-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc28590410c233d13a12183777267ab896b42e0a718bc4dd317327fe01c5a672 |
| MD5 | d369c2ce0102431f3827bc1b7b441b86 |
| BLAKE2b-256 | d03a6cc901021f9150228a8610c49fa985de347f963ac0a54cf2228a23418850 |

