Record, replay, and debug AI agent execution traces

agent-replay

New here? Start with the Getting Started Guide.

AI agents are black boxes. agent-replay makes them transparent.

Record every LLM call, tool use, decision point, and state change during agent execution. Replay them step-by-step. Diff two runs to find exactly where behavior diverged.

Features

  • 🎬 Record agent runs with a simple context manager or decorator
  • ⏯️ Replay traces step-by-step in the terminal
  • 🔍 Diff two traces to find divergence points
  • 🌳 Tree view of nested spans and events
  • 📊 HTML export with a self-contained dark-mode timeline
  • 🧩 Structured traces with spans, events, and metadata
  • ⌨️ CLI for quick inspection without writing code
  • 🐍 Typed Python 3.10+ with zero heavy dependencies

Architecture

Agent Run ──> Recorder ──> Trace File (.jsonl) ──> Replay Viewer
                                               ──> Diff Tool
                                               ──> HTML Export

┌─────────────────────────────────────────────────────────────┐
│  Your Agent Code                                            │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  with Recorder("my-agent") as rec:                    │  │
│  │      with rec.span("planning"):                       │  │
│  │          rec.llm_request(model="gpt-4", ...)          │  │
│  │          rec.llm_response(content="...", tokens=42)   │  │
│  │      with rec.span("tool-use"):                       │  │
│  │          rec.tool_call("search", {"q": "..."})        │  │
│  │          rec.tool_result("search", {...})             │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
                      trace.jsonl
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         agent-replay  agent-replay  agent-replay
           show          replay        diff

Quick Start

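Install from PyPI. The distribution is named agent-trace-replay; the import package is agent_replay and the CLI command is agent-replay.

pip install agent-trace-replay

Record a trace:
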
from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("planning"):
        rec.llm_request(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])
        rec.llm_response(content="Hi there!", tokens=5)
    with rec.span("tool-use"):
        rec.tool_call("search", {"query": "python docs"})
        rec.tool_result("search", {"url": "https://docs.python.org"})

Then inspect it:

agent-replay show trace.jsonl
agent-replay show trace.jsonl --tree
agent-replay replay trace.jsonl

Terminal Viewer

╭──────────── Agent Trace ────────────╮
│ my-agent                            │
│ ID: a1b2c3d4e5f67890                │
│ Spans: 2 | Events: 4                │
│ Duration: 1.234s                    │
╰─────────────────────────────────────╯

>>> planning (0.523s)
  🧠 LLM REQUEST  model=gpt-4 messages=1
  💬 LLM RESPONSE "Hi there!" (5 tokens)

>>> tool-use (0.711s)
  🔧 TOOL CALL search({"query": "python docs"})
  📦 TOOL RESULT search -> {"url": "https://docs.python.org"}

Recording

Context Manager

from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("step-1"):
        rec.llm_request(model="gpt-4", messages=[...])
        rec.llm_response(content="...", tokens=10)
        rec.decision("next action", choice="search")
        rec.tool_call("search", {"q": "test"})
        rec.tool_result("search", {"results": [...]})
        rec.state_change("status", old="planning", new="executing")
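
To make the span-and-event structure concrete, here is a minimal sketch of how a recorder with a span context manager could be put together using only the standard library. It is an illustration, not agent-replay's implementation: the MiniRecorder class, its event() method, and the start, end, event_type, and data field names are all assumptions made for this example.

```python
import json
import time
from contextlib import contextmanager


class MiniRecorder:
    """Hypothetical stand-in for a recorder; not agent-replay's implementation."""

    def __init__(self, name):
        self.name = name
        self.spans = []        # closed spans, in the order they finished
        self._current = None   # span that events are currently attached to

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions raised by the agent

    @contextmanager
    def span(self, name):
        # Open a span, collect the events emitted inside it, then close it.
        span = {"type": "span", "name": name, "events": [], "start": time.time()}
        previous, self._current = self._current, span
        try:
            yield span
        finally:
            span["end"] = time.time()
            self._current = previous
            self.spans.append(span)

    def event(self, event_type, **data):
        if self._current is None:
            raise RuntimeError("events must be recorded inside a span")
        self._current["events"].append({"event_type": event_type, "data": data})

    def to_jsonl(self):
        # One header line followed by one line per span.
        header = {"type": "trace_header", "name": self.name}
        return "\n".join(json.dumps(obj) for obj in [header, *self.spans])


with MiniRecorder("my-agent") as rec:
    with rec.span("step-1"):
        rec.event("tool_call", name="search", arguments={"q": "test"})
        rec.event("state_change", key="status", old="planning", new="executing")

lines = rec.to_jsonl().splitlines()
```

The try/finally in span() means a span is still closed and kept when the agent code inside it raises, which is usually the run you most want to inspect.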

Decorator

from agent_replay import record_trace, Recorder

@record_trace("my-agent", output_path="trace.jsonl")
def run_agent(task: str, recorder: Recorder = None):
    with recorder.span("work"):
        recorder.llm_request(model="gpt-4")
        recorder.llm_response(content="done")
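
The decorator form hands the function a recorder through its recorder keyword argument. A rough sketch of that injection pattern follows. The StubRecorder class and the with_recorder name are hypothetical and exist only for this example; how record_trace actually builds and saves its recorder is not shown in this README.

```python
import functools
from typing import Optional


class StubRecorder:
    """Hypothetical placeholder for a real recorder object."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def log(self, message):
        self.events.append(message)


def with_recorder(name):
    """Decorator factory: create a recorder per call and pass it as `recorder=`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Respect a recorder the caller passed explicitly; otherwise create one.
            if kwargs.get("recorder") is None:
                kwargs["recorder"] = StubRecorder(name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@with_recorder("my-agent")
def run_agent(task: str, recorder: Optional[StubRecorder] = None):
    recorder.log(f"working on {task}")
    return recorder.events


result = run_agent("demo")
```

Keeping the parameter optional lets callers invoke run_agent("demo") without knowing about the recorder, while tests can still pass their own.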

Event Types

Event          Method               Description
llm_request    rec.llm_request()    LLM API call with model and messages
llm_response   rec.llm_response()   LLM response with content and token count
tool_call      rec.tool_call()      Tool invocation with name and arguments
tool_result    rec.tool_result()    Tool return value
decision       rec.decision()       Agent decision point with chosen action
state_change   rec.state_change()   State mutation with old/new values
error          rec.error()          Error with message and exception info
log            rec.log()            General log message
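
When filtering or grouping events in your own tooling, it can help to mirror these eight names in an enum. The sketch below is an assumption-laden illustration: the README only shows that events expose event_type.value, so the EventType class name, its member names, and the RESPONSE_FOR mapping are hypothetical and may not match the library's own definitions.

```python
from enum import Enum


class EventType(str, Enum):
    """Hypothetical enum mirroring the event names listed above."""

    LLM_REQUEST = "llm_request"
    LLM_RESPONSE = "llm_response"
    TOOL_CALL = "tool_call"
    TOOL_RESULT = "tool_result"
    DECISION = "decision"
    STATE_CHANGE = "state_change"
    ERROR = "error"
    LOG = "log"


# Request-style events and the event type that normally answers them.
RESPONSE_FOR = {
    EventType.LLM_REQUEST: EventType.LLM_RESPONSE,
    EventType.TOOL_CALL: EventType.TOOL_RESULT,
}

parsed = EventType("tool_call")  # look a member up by its string value
```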

Replay

Step through traces interactively in the terminal:

agent-replay replay trace.jsonl

Commands during replay:

  • n / next - advance one step
  • p / prev - go back one step
  • j N / jump N - jump to step N
  • q / quit - exit
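
The four commands above amount to moving a cursor over the list of steps. The following sketch shows one way such a command could be interpreted. The apply_command function is hypothetical, and so are its choices: zero-based step numbers, clamping out-of-range jumps, and ignoring unknown input. The real CLI may behave differently in each of these respects.

```python
from typing import Optional


def apply_command(command: str, position: int, total: int) -> Optional[int]:
    """Return the new step index, or None when the user asks to quit."""
    parts = command.strip().split()
    if not parts:
        return position
    verb = parts[0].lower()
    if verb in ("q", "quit"):
        return None
    if verb in ("n", "next"):
        target = position + 1
    elif verb in ("p", "prev"):
        target = position - 1
    elif verb in ("j", "jump") and len(parts) == 2 and parts[1].isdigit():
        target = int(parts[1])
    else:
        return position  # unknown or malformed command: stay where we are
    # Clamp so the cursor never leaves the valid range of steps.
    return max(0, min(target, total - 1))
```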

Programmatic replay:

from agent_replay import ReplayEngine

engine = ReplayEngine.from_file("trace.jsonl")
while engine.has_next():
    span, event = engine.step()
    print(f"[{span.name}] {event.event_type.value}")

Diffing

Compare two traces to find where agent behavior diverged:

agent-replay diff trace_a.jsonl trace_b.jsonl

╭───────────── Trace Diff ─────────────╮
│ Trace A: a1b2c3d4                    │
│ Trace B: e5f6a7b8                    │
│ Found 2 divergence(s): 1 critical,   │
│ 1 informational.                     │
╰──────────────────────────────────────╯
┌──────────────── Divergences ────────────────┐
│ # │ Severity │ Pos │ Description            │
├───┼──────────┼─────┼────────────────────────┤
│ 1 │ CRITICAL │ 3   │ Different tool called: │
│   │          │     │ search vs browse       │
│ 2 │ INFO     │ 5   │ LLM response content   │
│   │          │     │ differs                │
└───┴──────────┴─────┴────────────────────────┘

Programmatic diffing:

from agent_replay import Trace, diff_traces

a = Trace.load("trace_a.jsonl")
b = Trace.load("trace_b.jsonl")
result = diff_traces(a, b)

for div in result.divergences:
    print(f"[{div.severity}] Position {div.position}: {div.description}")
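
For intuition about what a divergence is, here is a naive position-by-position comparison over two flattened event lists. It is not the algorithm behind diff_traces, which this README does not describe. The naive_diff function, the event_type and data dictionary keys, the zero-based positions, and the rule that a different tool is critical while differing data is informational are assumptions modelled on the sample output above.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Divergence:
    severity: str     # "critical" or "info" (labels assumed for this sketch)
    position: int     # zero-based index into the flattened event list
    description: str


def naive_diff(events_a, events_b):
    """Compare two event lists position by position."""
    found = []
    for pos, (a, b) in enumerate(zip_longest(events_a, events_b)):
        if a is None or b is None:
            found.append(Divergence("critical", pos, "one trace ended early"))
            break
        kind_a, kind_b = a["event_type"], b["event_type"]
        if kind_a != kind_b:
            found.append(Divergence("critical", pos, f"different event type: {kind_a} vs {kind_b}"))
        elif kind_a == "tool_call" and a["data"].get("name") != b["data"].get("name"):
            names = f"{a['data'].get('name')} vs {b['data'].get('name')}"
            found.append(Divergence("critical", pos, f"different tool called: {names}"))
        elif a["data"] != b["data"]:
            found.append(Divergence("info", pos, f"{kind_a} data differs"))
    return found


run_a = [
    {"event_type": "llm_request", "data": {"model": "gpt-4"}},
    {"event_type": "tool_call", "data": {"name": "search"}},
    {"event_type": "llm_response", "data": {"content": "Found it."}},
]
run_b = [
    {"event_type": "llm_request", "data": {"model": "gpt-4"}},
    {"event_type": "tool_call", "data": {"name": "browse"}},
    {"event_type": "llm_response", "data": {"content": "Here you go."}},
]
divergences = naive_diff(run_a, run_b)
```

A positional comparison like this reports every later event as different once one run inserts an extra step, so a practical diff would likely need some form of sequence alignment.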

HTML Export

Generate a self-contained HTML timeline:

agent-replay export trace.jsonl --format html -o timeline.html

The HTML file uses a dark theme with color-coded event types and expandable data sections. No external dependencies needed to view it.
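
As a rough picture of what "self-contained" means here, the sketch below renders spans and events into a single HTML string with inline CSS and native details elements for the expandable sections. It is an illustration only and does not reproduce the exporter's real markup: render_timeline, the inline style, and the event_type and data keys are assumptions.

```python
import html
import json


def render_timeline(trace_name, spans):
    """Build one HTML page with inline CSS and no external assets."""
    rows = []
    for span in spans:
        rows.append(f"<h2>{html.escape(span['name'])}</h2>")
        for event in span["events"]:
            # Escape everything that comes from the trace before embedding it.
            kind = html.escape(event["event_type"])
            payload = html.escape(json.dumps(event["data"], indent=2))
            rows.append(
                f'<details class="{kind}"><summary>{kind}</summary>'
                f"<pre>{payload}</pre></details>"
            )
    style = "body{background:#111;color:#eee;font-family:monospace}"
    title = html.escape(trace_name)
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{title}</title><style>{style}</style></head>"
        f"<body><h1>{title}</h1>{''.join(rows)}</body></html>"
    )


page = render_timeline(
    "my-agent",
    [{"name": "tool-use", "events": [{"event_type": "tool_call", "data": {"q": "<b>"}}]}],
)
```

Escaping matters because LLM output and tool results can contain arbitrary markup that would otherwise be interpreted by the browser.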

Configuration

Trace Format

Traces are stored as JSONL files. Each line is a JSON object:

  • Line 1: Trace header (metadata, trace ID, name)
  • Lines 2+: Span records with nested events

{"type": "trace_header", "trace_id": "abc123", "name": "my-agent", ...}
{"type": "span", "name": "planning", "events": [...], ...}
{"type": "span", "name": "tool-use", "events": [...], ...}
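
Because the format is plain JSONL, a trace can also be inspected with nothing but the standard library. The sketch below parses the layout described above; it is not agent-replay's loader. The parse_trace helper is hypothetical, and since the sample elides the per-event fields, events are represented here by empty placeholder objects.

```python
import json

# Three lines in the documented layout: one header, then one line per span.
SAMPLE = "\n".join([
    json.dumps({"type": "trace_header", "trace_id": "abc123", "name": "my-agent"}),
    json.dumps({"type": "span", "name": "planning", "events": [{}, {}]}),
    json.dumps({"type": "span", "name": "tool-use", "events": [{}, {}]}),
])


def parse_trace(text):
    """Split JSONL text into (header, spans), validating the record types."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not records or records[0].get("type") != "trace_header":
        raise ValueError("first line must be a trace header")
    header, spans = records[0], records[1:]
    for record in spans:
        if record.get("type") != "span":
            raise ValueError(f"unexpected record type: {record.get('type')!r}")
    return header, spans


header, spans = parse_trace(SAMPLE)
event_count = sum(len(span["events"]) for span in spans)
```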

Programmatic Access

from agent_replay import Trace

trace = Trace.load("trace.jsonl")
print(f"Spans: {len(trace.spans)}")
print(f"Events: {trace.event_count}")
print(f"Duration: {trace.duration:.3f}s")

for span in trace.spans:
    for event in span.events:
        print(event.event_type, event.data)

Development

git clone https://github.com/manasvardhan/agent-replay.git
cd agent-replay
pip install -e ".[dev]"
pytest

License

MIT License. See LICENSE.
