tracecraft

Coordination layer for multi-agent AI systems. Shared memory, experiment tracking, and session replay.

pip install tracecraft-ai

When you run 10 AI agents in parallel, they need to coordinate: share results, claim tasks, wait for each other, and pass context forward. Today that infrastructure doesn't exist. Tracecraft provides it.

# Agent A finishes a step and shares the result
tracecraft memory set s1a.status "complete"
tracecraft complete S1.A --note "SDK core built. Watch out for race in queue.py"

# Agent B reads it and starts its work
tracecraft wait-for S1.A
tracecraft memory get s1a.status
tracecraft claim S1.B

Works with Claude Code, Codex, CrewAI, LangGraph, Hermes Agent, autoresearch, or any process that can call a CLI.

Why

Andrej Karpathy ran 700 experiments in 2 days with autoresearch. Then his labs got wiped out in an outage. No persistence, no replay, no failover.

He said the next step is "asynchronously massively collaborative agents" — 7,500 people agreed.

claude-peers proved 400K people want agent coordination. But it's ephemeral messaging only — no shared memory, no persistence, no experiment tracking.

Tracecraft is the production infrastructure for both use cases.

What it does

Capability	Command	What happens
Shared memory	`tracecraft memory set key value`	Agents read/write persistent key-value state
Messaging	`tracecraft send agent-b "done"`	Direct or broadcast messages between agents
Task claiming	`tracecraft claim S1.A`	Atomic step claiming so agents don't collide
Barriers	`tracecraft wait-for S1.A S1.B`	Block until dependencies complete
Handoffs	`tracecraft complete S1.A --note "..."`	Structured context for the next agent
Experiment tracking	`tracecraft run start "exp-v2"`	Track runs, metrics, artifacts, cost
Session replay	`tracecraft replay <run-id>`	Step-by-step replay of any past run
Agent registry	`tracecraft agents`	See who's online and what they're working on

Quick start

1. Install

pip install tracecraft-ai

2. Start the server

tracecraft serve

This starts PostgreSQL + SeaweedFS + the tracecraft server locally via Docker.

3. Use from any agent

# From a Claude Code session, a Python script, a bash script — anything
tracecraft memory set research.findings '{"papers": 47, "relevant": 12}'
tracecraft send agent-writer "Research phase complete, 12 relevant papers found"
tracecraft run log-metric quality_score 0.92
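Since the CLI is the common interface, a Python script can also drive it through the standard `subprocess` module. This sketch only builds argument lists for the commands shown above and JSON-encodes structured values (the CLI reference says `memory set` takes JSON or a string). The helper functions are hypothetical, and actually executing them assumes `tracecraft` is on `PATH` and a server is running.

```python
import json
import subprocess
from typing import Any


def memory_set_argv(key: str, value: Any) -> list[str]:
    # Strings pass through unchanged; anything else is sent as JSON text.
    text = value if isinstance(value, str) else json.dumps(value)
    return ["tracecraft", "memory", "set", key, text]


def send_argv(agent_id: str, message: str) -> list[str]:
    return ["tracecraft", "send", agent_id, message]


def log_metric_argv(name: str, value: float) -> list[str]:
    return ["tracecraft", "run", "log-metric", name, str(value)]


def run(argv: list[str]) -> str:
    # check=True raises CalledProcessError on a non-zero exit instead of failing silently.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


cmd = memory_set_argv("research.findings", {"papers": 47, "relevant": 12})
print(cmd)
# run(cmd)  # uncomment once `tracecraft serve` is running
```

Passing an argument list instead of a shell string avoids the single-quote escaping the bash example needs around JSON.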

4. Or use the Python SDK

import tracecraft

tracecraft.init(project="my-research")

with tracecraft.run("prompt-comparison-v2") as run:
    with run.agent(name="researcher", model="claude-sonnet-4-20250514") as agent:
        with agent.step("search", kind="tool_call") as step:
            step.log_input({"query": "multi-agent coordination"})
            results = {"papers": 47}  # placeholder for your search tool's output
            step.log_output(results)

        agent.shared_memory.set("findings", {"papers": 47})
        agent.send("writer", "Research complete")

    run.log_metrics({"quality_score": 0.92, "cost": 0.034})
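The SDK example nests three scopes: a run contains agents, and an agent contains steps. One common way to build such an API is with nested context managers that append timestamped start and end events to a single log, which is also the raw material a replay needs. The toy `Tracer` below shows that pattern with `contextlib.contextmanager`; it is not tracecraft's code, and its event fields are assumptions.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class Tracer:
    """Toy tracer: nested scopes append start/end events to one flat log."""

    def __init__(self) -> None:
        self.events: list[dict] = []
        self._path: list[str] = []  # names of the currently open scopes, outermost first

    @contextmanager
    def scope(self, kind: str, name: str) -> Iterator[None]:
        self._path.append(name)
        path = "/".join(self._path)
        self.events.append({"t": time.time(), "event": "start", "kind": kind, "path": path})
        status = "ok"
        try:
            yield
        except Exception:
            status = "error"  # record the failure, then let it propagate
            raise
        finally:
            self.events.append({"t": time.time(), "event": "end", "kind": kind,
                                "path": path, "status": status})
            self._path.pop()


tracer = Tracer()
with tracer.scope("run", "prompt-comparison-v2"):
    with tracer.scope("agent", "researcher"):
        with tracer.scope("step", "search"):
            pass  # the tool call would happen here

for e in tracer.events:
    print(e["event"], e["kind"], e["path"])
```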

Integrations

Tracecraft works with any agent framework. One-line integrations for the major ones:

# CrewAI
from crewai import Crew
from tracecraft.integrations.crewai import TracecraftCallback
crew = Crew(agents=[...], callbacks=[TracecraftCallback()])

# Claude Agent SDK
from tracecraft.integrations.claude_sdk import tracecraft_hooks
agent = Agent(model="claude-sonnet-4-20250514", hooks=tracecraft_hooks())

# LangGraph
from tracecraft.integrations.langgraph import TracecraftTracer
result = app.invoke(input, config={"callbacks": [TracecraftTracer()]})

# Hermes Agent — coming soon
# AutoGen — coming soon

Architecture

Agents (Claude Code, Codex, CrewAI, scripts, anything)
    |
    |  tracecraft CLI  or  Python SDK
    |
    v
Tracecraft Server (FastAPI)
    |
    +--- PostgreSQL (metadata, coordination state, experiment tracking)
    +--- SeaweedFS (artifacts, memory snapshots, replay files)
    +--- Redis (pub/sub, real-time notifications, locks)

Everything is self-hosted, with no cloud dependency. A single `docker compose up` starts the stack.
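Shared memory needs both halves of this architecture: durable key-value state and change notifications, which is what the `tracecraft memory` commands (`set`, `get`, `list --prefix`, `watch`) expose. The in-process sketch below models only that interface. It assumes dotted keys such as `s1a.status` and glob-style watch patterns matched with `fnmatch`; the real pattern syntax is not specified, and persistence and cross-process delivery are left out. `SharedMemory` is a hypothetical name.

```python
import fnmatch
from typing import Any, Callable


class SharedMemory:
    """Hypothetical in-process key-value store with prefix listing and watchers."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._watchers: list[tuple[str, Callable[[str, Any], None]]] = []

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        # Notify every watcher whose glob pattern matches the key that changed.
        for pattern, callback in self._watchers:
            if fnmatch.fnmatchcase(key, pattern):
                callback(key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def list_keys(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self._data if k.startswith(prefix))

    def watch(self, pattern: str, callback: Callable[[str, Any], None]) -> None:
        self._watchers.append((pattern, callback))


mem = SharedMemory()
seen: list[str] = []
mem.watch("s1*.status", lambda k, v: seen.append(f"{k}={v}"))

mem.set("s1a.status", "complete")
mem.set("s1b.status", "running")
mem.set("research.findings", {"papers": 47, "relevant": 12})

print(mem.list_keys(prefix="s1"))  # ['s1a.status', 's1b.status']
print(seen)                        # ['s1a.status=complete', 's1b.status=running']
```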

CLI reference

# Server
tracecraft serve                          # Start local server
tracecraft status                         # Check connection

# Shared memory
tracecraft memory set <key> <value>       # Write (JSON or string)
tracecraft memory get <key>               # Read
tracecraft memory list [--prefix X]       # List keys
tracecraft memory watch <pattern>         # Stream changes in real-time

# Messaging
tracecraft send <agent-id> <message>      # Direct message
tracecraft broadcast <message>            # Message all agents
tracecraft inbox                          # Check messages
tracecraft inbox --watch                  # Stream incoming messages

# Coordination
tracecraft claim <step-id>                # Claim a task (atomic)
tracecraft complete <step-id> [--note X]  # Mark done + handoff note
tracecraft wait-for <step-ids...>         # Block until all complete

# Experiment tracking
tracecraft run start <name>               # Start a tracked run
tracecraft run log-metric <name> <value>  # Log a metric
tracecraft run log-artifact <name> <path> # Upload an artifact
tracecraft run end [--status X]           # End the run

# Inspection
tracecraft agents                         # Who's online?
tracecraft runs                           # List all runs
tracecraft runs inspect <id>              # Step-by-step timeline
tracecraft replay <id>                    # Replay a past run
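The messaging commands distinguish direct sends from broadcasts, and each agent reads its own inbox. A minimal model uses one `queue.Queue` per agent. It assumes that a broadcast reaches every registered agent except the sender and that reading the inbox removes the messages; neither behavior is specified in the reference, and `Mailbox` is a hypothetical name.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    body: str
    broadcast: bool = False


class Mailbox:
    """Hypothetical per-agent inboxes supporting direct and broadcast delivery."""

    def __init__(self, agent_ids: list[str]) -> None:
        self._inboxes: dict[str, queue.Queue] = {a: queue.Queue() for a in agent_ids}

    def send(self, sender: str, recipient: str, body: str) -> None:
        self._inboxes[recipient].put(Message(sender, body))

    def broadcast(self, sender: str, body: str) -> None:
        # Deliver to every registered agent except the sender.
        for agent_id, box in self._inboxes.items():
            if agent_id != sender:
                box.put(Message(sender, body, broadcast=True))

    def inbox(self, agent_id: str) -> list[Message]:
        # Drain everything currently queued without blocking.
        box, out = self._inboxes[agent_id], []
        while True:
            try:
                out.append(box.get_nowait())
            except queue.Empty:
                return out


mail = Mailbox(["agent-a", "agent-b", "agent-writer"])
mail.send("agent-a", "agent-writer", "Research phase complete")
mail.broadcast("agent-a", "S1.A done")

for m in mail.inbox("agent-writer"):
    print(m.sender, m.broadcast, m.body)
```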

Use cases

Parallel Claude Code sessions

Run 4 Claude Code agents in git worktrees, each building a different module. They claim steps, share artifacts, wait at barriers, and hand off context — all through tracecraft.

Karpathy-style autoresearch

Run hundreds of experiments overnight. Every run is tracked with metrics, artifacts, and cost. Replay any run to understand what the agent tried. Compare runs to find what worked.

CrewAI/LangGraph production monitoring

Track every agent decision, tool call, and handoff in production multi-agent workflows. Debug failures with session replay. Attribute costs to specific agents.
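Replaying a multi-agent session means turning each agent's own event stream into one timeline. If every stream is already ordered by time, a k-way merge with `heapq.merge` produces that timeline, and summing a per-event cost field over it attributes spend to agents. The `Event` layout below is hypothetical, not tracecraft's replay file format.

```python
import heapq
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    t: float            # seconds since the run started
    agent: str
    action: str
    cost: float = 0.0   # spend attributed to this event, e.g. one LLM call


def replay(streams: Iterable[list[Event]]) -> list[Event]:
    # Each per-agent stream is time-ordered, so a k-way merge yields one global timeline.
    return list(heapq.merge(*streams, key=lambda e: e.t))


def cost_by_agent(timeline: Iterable[Event]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for e in timeline:
        totals[e.agent] += e.cost
    return dict(totals)


researcher = [Event(0.0, "researcher", "claim S1.A"),
              Event(3.9, "researcher", "tool_call search", cost=0.02),
              Event(4.2, "researcher", "complete S1.A")]
writer = [Event(0.5, "writer", "wait-for S1.A"),
          Event(4.3, "writer", "claim S1.B"),
          Event(6.0, "writer", "llm call draft", cost=0.01)]

timeline = replay([researcher, writer])
for e in timeline:
    print(f"{e.t:5.1f}s  {e.agent:<10}  {e.action}")
print(cost_by_agent(timeline))
```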

Benchmarking multi-agent systems

Compare different agent configurations (3 agents vs 5 agents, GPT vs Claude, different prompts) with controlled experiment tracking and standardized metrics.
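Comparing configurations amounts to grouping runs by configuration and summarizing each metric. The sketch below computes the mean, sample standard deviation, and run count of a quality score per configuration with the standard `statistics` module. The scores are illustrative placeholders, not measured results.

```python
import statistics
from collections import defaultdict

# Illustrative placeholder runs, not real measurements: (configuration, quality_score).
runs = [
    ("3-agents", 0.81), ("3-agents", 0.85), ("3-agents", 0.83),
    ("5-agents", 0.88), ("5-agents", 0.90), ("5-agents", 0.86),
]


def summarize(records: list[tuple[str, float]]) -> dict[str, tuple[float, float, int]]:
    by_config: dict[str, list[float]] = defaultdict(list)
    for config, score in records:
        by_config[config].append(score)
    # Mean, sample standard deviation (needs at least two runs), and run count.
    return {
        c: (statistics.mean(s), statistics.stdev(s) if len(s) > 1 else 0.0, len(s))
        for c, s in by_config.items()
    }


summary = summarize(runs)
best = max(summary, key=lambda c: summary[c][0])
for config, (mean, sd, n) in sorted(summary.items()):
    print(f"{config}: mean={mean:.3f} sd={sd:.3f} n={n}")
print("best:", best)
```

With only a few runs per configuration, check whether the gap between means is large relative to the spread before calling a winner.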

How it compares

	tracecraft	Langfuse	LangSmith	claude-peers	AgentOps
Open source	MIT	MIT (ClickHouse)	No	MIT	Partial
Shared memory	Yes	No	No	No	No
Agent coordination	Yes	No	No	Messaging only	No
Experiment tracking	Yes	Tracing only	Tracing only	No	Monitoring
Session replay	Yes	Trace waterfall	Trace waterfall	No	No
Artifact storage	Yes (SeaweedFS)	No	No	No	No
CLI-first	Yes	No	No	No	No
Self-hosted	Yes	Yes	Enterprise only	Localhost	No
Works with any framework	Yes	Yes	LangChain-centric	Claude Code only	Yes

Project structure

tracecraft/
  sdk/
    tracecraft/              Python SDK + CLI
      cli/                   CLI commands (click)
      integrations/          CrewAI, Claude SDK, LangGraph adapters
      transport/             Batching, retry, offline buffer
  server/
    tracecraft_server/       FastAPI server
      api/v1/                REST endpoints
      core/                  Config, auth, database
      storage/               SeaweedFS + artifact management
      services/              Shared memory, mailbox, coordination, registry
      models/                SQLAlchemy models
      ws/                    WebSocket handlers
  dashboard/                 Web UI (Phase 3)
  examples/                  Integration examples
  benchmarks/                MAExBench benchmark suite
  plans/                     Construction blueprints for contributors
  docs/                      Documentation
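The `transport/` package is described as handling batching, retry, and an offline buffer. The sketch below shows one common way to combine them: events accumulate in a local buffer, flush in fixed-size batches, retry with exponential backoff, and stay buffered if the server remains unreachable. `BatchingTransport`, its parameters, and the injected `send` callable are hypothetical; tracecraft's transport may work differently.

```python
import itertools
import time
from collections import deque
from typing import Callable


class BatchingTransport:
    """Hypothetical client-side transport: batch events, retry with backoff, buffer while offline."""

    def __init__(self, send: Callable[[list[dict]], None], batch_size: int = 50,
                 max_retries: int = 3, base_delay: float = 0.1) -> None:
        self._send = send  # expected to raise an exception when delivery fails
        self._buffer: deque[dict] = deque()
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.base_delay = base_delay

    def emit(self, event: dict) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Send buffered events in batches; return False if a batch could not be delivered."""
        while self._buffer:
            batch = list(itertools.islice(self._buffer, self.batch_size))
            for attempt in range(self.max_retries):
                try:
                    self._send(batch)
                    break
                except Exception:
                    if attempt + 1 < self.max_retries:
                        time.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
            else:
                return False  # still offline: keep the batch buffered for a later flush
            for _ in batch:
                self._buffer.popleft()  # drop events only after a confirmed send
        return True

    def pending(self) -> int:
        return len(self._buffer)


# Usage: simulate a server that fails the first two send attempts.
sent: list[list[dict]] = []
failures = iter([True, True])


def flaky_send(batch: list[dict]) -> None:
    if next(failures, False):
        raise ConnectionError("server unreachable")
    sent.append(batch)


transport = BatchingTransport(flaky_send, batch_size=2, max_retries=3, base_delay=0.0)
for i in range(5):
    transport.emit({"seq": i})
transport.flush()
print([len(b) for b in sent], transport.pending())  # [2, 2, 1] 0
```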

Roadmap

Architecture design and blueprints
Core SDK (init, run, agent, step)
Server (PostgreSQL + SeaweedFS + Redis)
Shared memory + messaging + coordination primitives
CLI tool
CrewAI integration
Claude Agent SDK integration
LangGraph integration
Dashboard with session replay
MAExBench (multi-agent experiment benchmark)
arXiv paper

Contributing

Tracecraft is built for parallel development. The plans/ directory contains detailed construction blueprints where each step is self-contained — a fresh contributor (human or AI) can pick up any step and execute it independently.

git clone https://github.com/Arrmlet/tracecraft
cd tracecraft
cat plans/tracecraft-blueprint-v2.md    # Read the blueprint

See CONTRIBUTING.md for setup instructions.

Research

Tracecraft is also a research instrument. We're working toward:

arXiv preprint: "Tracecraft: Shared Memory and Coordination Primitives for Multi-Agent LLM Systems"
MAExBench: A standardized benchmark for evaluating multi-agent coordination, cost efficiency, and reliability
NeurIPS/ICLR submission: Empirical findings from real-world multi-agent experiment data

If you're a researcher interested in multi-agent systems, we'd love to collaborate. Open an issue or reach out.

License

MIT

Release history

0.2.0: May 29, 2026
0.1.6: May 19, 2026
0.1.5: May 19, 2026
0.1.4: Mar 25, 2026
0.1.3: Mar 25, 2026
0.1.2: Mar 24, 2026
0.1.1 (this version): Mar 22, 2026
0.1.0: Mar 22, 2026

Download files

Source Distribution

tracecraft_ai-0.1.1.tar.gz (15.5 kB)

Uploaded Mar 22, 2026 Source

Built Distribution

tracecraft_ai-0.1.1-py3-none-any.whl (13.8 kB)

Uploaded Mar 22, 2026 Python 3

File details

Details for the file tracecraft_ai-0.1.1.tar.gz.

File metadata

Download URL: tracecraft_ai-0.1.1.tar.gz
Upload date: Mar 22, 2026
Size: 15.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for tracecraft_ai-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`d4692ec95863fb7597314eb56d294284a103f449507ef5d9c15076d71f4d1c0a`
MD5	`b4231582b83b9850e0a7197a60b663a6`
BLAKE2b-256	`2ce0401f7ffb267a7bbc2527b836494fffef323b3e64526490e74268d4961cf3`

File details

Details for the file tracecraft_ai-0.1.1-py3-none-any.whl.

File metadata

Download URL: tracecraft_ai-0.1.1-py3-none-any.whl
Upload date: Mar 22, 2026
Size: 13.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for tracecraft_ai-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc28590410c233d13a12183777267ab896b42e0a718bc4dd317327fe01c5a672`
MD5	`d369c2ce0102431f3827bc1b7b441b86`
BLAKE2b-256	`d03a6cc901021f9150228a8610c49fa985de347f963ac0a54cf2228a23418850`

tracecraft-ai 0.1.1
