Record, replay, and debug AI agent execution traces

agent-replay

New here? Start with the Getting Started Guide.

AI agents are black boxes. agent-replay makes them transparent.

Record every LLM call, tool use, decision point, and state change during agent execution. Replay them step-by-step. Diff two runs to find exactly where behavior diverged.

Features

  • 🎬 Record agent runs with a simple context manager or decorator
  • ⏯️ Replay traces step-by-step in the terminal
  • 🔍 Diff two traces to find divergence points
  • 🌳 Tree view of nested spans and events
  • 📊 HTML export with a self-contained dark-mode timeline
  • 🧩 Structured traces with spans, events, and metadata
  • ⌨️ CLI for quick inspection without writing code
  • 🐍 Typed Python 3.10+ with zero heavy dependencies

Architecture

Agent Run ──> Recorder ──> Trace File (.jsonl) ──> Replay Viewer
                                               ──> Diff Tool
                                               ──> HTML Export

┌─────────────────────────────────────────────────────────────┐
│  Your Agent Code                                            │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  with Recorder("my-agent") as rec:                    │  │
│  │      with rec.span("planning"):                       │  │
│  │          rec.llm_request(model="gpt-4", ...)          │  │
│  │          rec.llm_response(content="...", tokens=42)   │  │
│  │      with rec.span("tool-use"):                       │  │
│  │          rec.tool_call("search", {"q": "..."})        │  │
│  │          rec.tool_result("search", {...})             │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
                      trace.jsonl
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         agent-replay  agent-replay  agent-replay
           show          replay        diff

Quick Start

pip install agent-replay

from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("planning"):
        rec.llm_request(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])
        rec.llm_response(content="Hi there!", tokens=5)
    with rec.span("tool-use"):
        rec.tool_call("search", {"query": "python docs"})
        rec.tool_result("search", {"url": "https://docs.python.org"})

Then inspect it:

agent-replay show trace.jsonl
agent-replay show trace.jsonl --tree
agent-replay replay trace.jsonl

Terminal Viewer

╭──────────── Agent Trace ────────────╮
│ my-agent                            │
│ ID: a1b2c3d4e5f67890                │
│ Spans: 2 | Events: 4                │
│ Duration: 1.234s                    │
╰─────────────────────────────────────╯

>>> planning (0.523s)
  🧠 LLM REQUEST  model=gpt-4 messages=1
  💬 LLM RESPONSE "Hi there!" (5 tokens)

>>> tool-use (0.711s)
  🔧 TOOL CALL search({"query": "python docs"})
  📦 TOOL RESULT search -> {"url": "https://docs.python.org"}

Recording

Context Manager

from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("step-1"):
        rec.llm_request(model="gpt-4", messages=[...])
        rec.llm_response(content="...", tokens=10)
        rec.decision("next action", choice="search")
        rec.tool_call("search", {"q": "test"})
        rec.tool_result("search", {"results": [...]})
        rec.state_change("status", old="planning", new="executing")

Decorator

from agent_replay import record_trace, Recorder

@record_trace("my-agent", output_path="trace.jsonl")
def run_agent(task: str, recorder: Recorder | None = None):
    with recorder.span("work"):
        recorder.llm_request(model="gpt-4")
        recorder.llm_response(content="done")
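The `recorder` parameter in the example defaults to `None`, which suggests the decorator supplies it at call time. The sketch below shows one way such injection can work in plain Python. It is not agent-replay's implementation: `with_recorder` and `ListRecorder` are hypothetical names invented for this illustration, and the in-memory event list stands in for a real trace file.

```python
import functools


class ListRecorder:
    """Hypothetical stand-in recorder that keeps events in memory."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def log(self, message):
        self.events.append(("log", message))


def with_recorder(name):
    """Decorator factory: pass a recorder to the wrapped function as `recorder`."""

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            # Use the caller's recorder if one was passed, else create one.
            recorder = kwargs.setdefault("recorder", ListRecorder(name))
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Note the failure, then let the exception propagate.
                recorder.events.append(("error", repr(exc)))
                raise

        return wrapper

    return decorator


@with_recorder("my-agent")
def run_agent(task, recorder=None):
    recorder.log(f"working on {task}")
    return recorder.events


print(run_agent("demo"))
```

Passing `recorder=` explicitly bypasses the default, which is convenient in tests that want to inspect the recorded events afterwards.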

Event Types

Event          Method               Description
llm_request    rec.llm_request()    LLM API call with model and messages
llm_response   rec.llm_response()   LLM response with content and token count
tool_call      rec.tool_call()      Tool invocation with name and arguments
tool_result    rec.tool_result()    Tool return value
decision       rec.decision()       Agent decision point with chosen action
state_change   rec.state_change()   State mutation with old/new values
error          rec.error()          Error with message and exception info
log            rec.log()            General log message

Replay

Step through traces interactively in the terminal:

agent-replay replay trace.jsonl

Commands during replay:

  • n / next - advance one step
  • p / prev - go back one step
  • j N / jump N - jump to step N
  • q / quit - exit
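
The commands above amount to moving a cursor over an ordered list of steps. A minimal sketch of that idea follows. `StepCursor` and `handle` are hypothetical names, step numbers are assumed to be 0-based, and clamping at either end is a design choice made here, not documented agent-replay behavior.

```python
class StepCursor:
    """Cursor over a non-empty list of steps."""

    def __init__(self, steps):
        self.steps = list(steps)
        if not self.steps:
            raise ValueError("need at least one step")
        self.index = 0  # position of the step currently shown

    def current(self):
        return self.steps[self.index]

    def next(self):
        # Clamp at the last step instead of running off the end.
        self.index = min(self.index + 1, len(self.steps) - 1)
        return self.current()

    def prev(self):
        # Clamp at the first step.
        self.index = max(self.index - 1, 0)
        return self.current()

    def jump(self, n):
        if not 0 <= n < len(self.steps):
            raise IndexError(f"step {n} is out of range")
        self.index = n
        return self.current()


def handle(cursor, command):
    """Apply one command line; return the step to show, or None on quit."""
    parts = command.split()
    if not parts:
        return cursor.current()
    verb, args = parts[0], parts[1:]
    if verb in ("n", "next"):
        return cursor.next()
    if verb in ("p", "prev"):
        return cursor.prev()
    if verb in ("j", "jump"):
        if len(args) != 1:
            raise ValueError("usage: jump N")
        return cursor.jump(int(args[0]))
    if verb in ("q", "quit"):
        return None
    raise ValueError(f"unknown command: {command!r}")


cursor = StepCursor(["llm_request", "llm_response", "tool_call", "tool_result"])
for command in ("n", "j 3", "p", "q"):
    print(command, "->", handle(cursor, command))
```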

Programmatic replay:

from agent_replay import ReplayEngine

engine = ReplayEngine.from_file("trace.jsonl")
while engine.has_next():
    span, event = engine.step()
    print(f"[{span.name}] {event.event_type.value}")

Diffing

Compare two traces to find where agent behavior diverged:

agent-replay diff trace_a.jsonl trace_b.jsonl

╭───────────── Trace Diff ─────────────╮
│ Trace A: a1b2c3d4                    │
│ Trace B: e5f6a7b8                    │
│ Found 2 divergence(s): 1 critical,   │
│ 1 informational.                     │
╰──────────────────────────────────────╯
┌──────────────── Divergences ────────────────┐
│ # │ Severity │ Pos │ Description            │
├───┼──────────┼─────┼────────────────────────┤
│ 1 │ CRITICAL │ 3   │ Different tool called: │
│   │          │     │ search vs browse       │
│ 2 │ INFO     │ 5   │ LLM response content   │
│   │          │     │ differs                │
└───┴──────────┴─────┴────────────────────────┘

Programmatic diffing:

from agent_replay import Trace, diff_traces

a = Trace.load("trace_a.jsonl")
b = Trace.load("trace_b.jsonl")
result = diff_traces(a, b)

for div in result.divergences:
    print(f"[{div.severity}] Position {div.position}: {div.description}")
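
To make the idea of a divergence concrete, here is a small standalone sketch that compares two flat event lists position by position. It does not reproduce `diff_traces`. The dictionary layout of the events, the 0-based positions, and the severity rules (a different event type or tool is critical, any other payload difference is informational) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Divergence:
    position: int
    severity: str
    description: str


def diff_events(events_a, events_b):
    """Walk two flat event lists in lockstep and report every mismatch."""
    found = []
    for pos, (a, b) in enumerate(zip_longest(events_a, events_b)):
        if a is None or b is None:
            # One run produced more events than the other.
            found.append(Divergence(pos, "critical", "only one trace has an event here"))
        elif a["type"] != b["type"]:
            found.append(Divergence(
                pos, "critical", f"Different event type: {a['type']} vs {b['type']}"))
        elif a["type"] == "tool_call" and a.get("name") != b.get("name"):
            found.append(Divergence(
                pos, "critical", f"Different tool called: {a.get('name')} vs {b.get('name')}"))
        elif a != b:
            # Same kind of event, different payload.
            found.append(Divergence(pos, "info", f"{a['type']} content differs"))
    return found


run_a = [
    {"type": "llm_request", "model": "gpt-4"},
    {"type": "tool_call", "name": "search"},
    {"type": "llm_response", "content": "Found it."},
]
run_b = [
    {"type": "llm_request", "model": "gpt-4"},
    {"type": "tool_call", "name": "browse"},
    {"type": "llm_response", "content": "Opened the page."},
]

for div in diff_events(run_a, run_b):
    print(f"[{div.severity}] Position {div.position}: {div.description}")
```

A lockstep comparison like this reports every later event as different once one run inserts an extra step. A sequence-alignment approach would handle that case better, at the cost of more code.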

HTML Export

Generate a self-contained HTML timeline:

agent-replay export trace.jsonl --format html -o timeline.html

The HTML file uses a dark theme with color-coded event types and expandable data sections. No external dependencies needed to view it.
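
As a rough illustration of what "self-contained" means here, the sketch below builds a single HTML string with inline CSS and native `<details>` elements for the expandable sections, so the page needs no scripts or external assets. It is not the exporter shipped with agent-replay. The function name `render_timeline`, the event dictionary layout, and the color values are assumptions.

```python
import json
from html import escape

# Assumed palette; unknown event types fall back to a neutral color.
COLORS = {"llm_request": "#7aa2f7", "llm_response": "#9ece6a", "tool_call": "#e0af68"}
STYLE = "<style>body{background:#1a1b26;color:#c0caf5;font-family:monospace}</style>"


def render_timeline(title, events):
    """Return one HTML document with inline styles and no external references."""
    rows = []
    for event in events:
        color = COLORS.get(event["type"], "#a9b1d6")
        # Escape the payload so recorded text cannot inject markup.
        payload = escape(json.dumps(event.get("data", {}), indent=2))
        rows.append(
            f'<details><summary style="color:{color}">{escape(event["type"])}</summary>'
            f"<pre>{payload}</pre></details>"
        )
    head = f"<head><meta charset='utf-8'><title>{escape(title)}</title>{STYLE}</head>"
    body = f"<body><h1>{escape(title)}</h1>{''.join(rows)}</body>"
    return f"<!doctype html><html>{head}{body}</html>"


page = render_timeline(
    "my-agent",
    [
        {"type": "llm_request", "data": {"model": "gpt-4"}},
        {"type": "tool_call", "data": {"query": "<python docs>"}},
    ],
)
print(len(page), "characters of HTML")
```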

Configuration

Trace Format

Traces are stored as JSONL files. Each line is a JSON object:

  • Line 1: Trace header (metadata, trace ID, name)
  • Lines 2+: Span records with nested events

{"type": "trace_header", "trace_id": "abc123", "name": "my-agent", ...}
{"type": "span", "name": "planning", "events": [...], ...}
{"type": "span", "name": "tool-use", "events": [...], ...}
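
Because every line is an independent JSON object, a trace can also be inspected with the standard library alone. The sketch below parses the layout described above. Only the `type`, `trace_id`, `name`, and `events` keys come from the format description. The fields inside each event and the helper name `parse_trace` are assumptions for illustration.

```python
import json

# Sample trace text; the event fields are made up for this example.
SAMPLE = "\n".join([
    json.dumps({"type": "trace_header", "trace_id": "abc123", "name": "my-agent"}),
    json.dumps({"type": "span", "name": "planning",
                "events": [{"event_type": "llm_request"}, {"event_type": "llm_response"}]}),
    json.dumps({"type": "span", "name": "tool-use",
                "events": [{"event_type": "tool_call"}, {"event_type": "tool_result"}]}),
])


def parse_trace(text):
    """Split JSONL text into (header, spans) using each record's "type" field."""
    header, spans = None, []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        kind = record.get("type")
        if kind == "trace_header":
            header = record
        elif kind == "span":
            spans.append(record)
        else:
            raise ValueError(f"line {line_no}: unknown record type {kind!r}")
    if header is None:
        raise ValueError("no trace_header record found")
    return header, spans


header, spans = parse_trace(SAMPLE)
event_total = sum(len(span["events"]) for span in spans)
print(header["name"], len(spans), event_total)
```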

Programmatic Access

from agent_replay import Trace

trace = Trace.load("trace.jsonl")
print(f"Spans: {len(trace.spans)}")
print(f"Events: {trace.event_count}")
print(f"Duration: {trace.duration:.3f}s")

for span in trace.spans:
    for event in span.events:
        print(event.event_type, event.data)

Development

git clone https://github.com/manasvardhan/agent-replay.git
cd agent-replay
pip install -e ".[dev]"
pytest

License

MIT License. See LICENSE.
