Record, replay, and debug AI agent execution traces

agent-replay

New here? Start with the Getting Started Guide.

AI agents are black boxes. agent-replay makes them transparent.

Record every LLM call, tool use, decision point, and state change during agent execution. Replay them step-by-step. Diff two runs to find exactly where behavior diverged.

Features

🎬 Record agent runs with a simple context manager or decorator
⏯️ Replay traces step-by-step in the terminal
🔍 Diff two traces to find divergence points
🌳 Tree view of nested spans and events
📊 HTML export with a self-contained dark-mode timeline
🧩 Structured traces with spans, events, and metadata
⌨️ CLI for quick inspection without writing code
🐍 Typed Python 3.10+ with zero heavy dependencies

Architecture

Agent Run ──> Recorder ──> Trace File (.jsonl) ──> Replay Viewer
                                                 ──> Diff Tool
                                                 ──> HTML Export

┌─────────────────────────────────────────────────────────────┐
│  Your Agent Code                                            │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  with Recorder("my-agent") as rec:                    │  │
│  │      with rec.span("planning"):                       │  │
│  │          rec.llm_request(model="gpt-4", ...)          │  │
│  │          rec.llm_response(content="...", tokens=42)   │  │
│  │      with rec.span("tool-use"):                       │  │
│  │          rec.tool_call("search", {"q": "..."})        │  │
│  │          rec.tool_result("search", {...})             │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
                    trace.jsonl
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         agent-replay  agent-replay  agent-replay
           show          replay        diff

Quick Start

pip install agent-trace-replay

from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("planning"):
        rec.llm_request(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])
        rec.llm_response(content="Hi there!", tokens=5)
    with rec.span("tool-use"):
        rec.tool_call("search", {"query": "python docs"})
        rec.tool_result("search", {"url": "https://docs.python.org"})

Then inspect it:

agent-replay show trace.jsonl
agent-replay show trace.jsonl --tree
agent-replay replay trace.jsonl

Terminal Viewer

╭──────────── Agent Trace ────────────╮
│ my-agent                            │
│ ID: a1b2c3d4e5f67890                │
│ Spans: 2 | Events: 4                │
│ Duration: 1.234s                    │
╰─────────────────────────────────────╯

>>> planning (0.523s)
  🧠 LLM REQUEST  model=gpt-4 messages=1
  💬 LLM RESPONSE "Hi there!" (5 tokens)

>>> tool-use (0.711s)
  🔧 TOOL CALL search({"query": "python docs"})
  📦 TOOL RESULT search -> {"url": "https://docs.python.org"}

Recording

Context Manager

from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("step-1"):
        rec.llm_request(model="gpt-4", messages=[...])
        rec.llm_response(content="...", tokens=10)
        rec.decision("next action", choice="search")
        rec.tool_call("search", {"q": "test"})
        rec.tool_result("search", {"results": [...]})
        rec.state_change("status", old="planning", new="executing")

Decorator

from agent_replay import record_trace, Recorder

@record_trace("my-agent", output_path="trace.jsonl")
def run_agent(task: str, recorder: Recorder | None = None):
    with recorder.span("work"):
        recorder.llm_request(model="gpt-4")
        recorder.llm_response(content="done")
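
For readers curious how a decorator can hand a recorder to the wrapped function, here is a minimal sketch of the general pattern. It uses a stand-in `FakeRecorder` class and a hypothetical `record_with` decorator, both invented for illustration; it is not agent-replay's implementation of `record_trace`.

```python
import functools


class FakeRecorder:
    """Stand-in recorder that keeps events in memory (not the real Recorder)."""

    def __init__(self, name):
        self.name = name
        self.events = []
        self.closed = False

    def log(self, message):
        self.events.append(("log", message))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True
        return False  # never swallow exceptions from the agent


def record_with(name):
    """Hypothetical decorator: opens a recorder and passes it as `recorder=`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with FakeRecorder(name) as rec:
                wrapper.last_recorder = rec  # kept only so the demo can inspect it
                return func(*args, recorder=rec, **kwargs)
        return wrapper
    return decorator


@record_with("my-agent")
def run_agent(task, recorder=None):
    recorder.log(f"starting {task}")
    return "done"


result = run_agent("demo")
print(result, run_agent.last_recorder.events)
```

The `with` block guarantees the recorder is closed even if the agent raises, which is the main reason to prefer this pattern over creating the recorder by hand.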

Event Types

Event	Method	Description
`llm_request`	`rec.llm_request()`	LLM API call with model and messages
`llm_response`	`rec.llm_response()`	LLM response with content and token count
`tool_call`	`rec.tool_call()`	Tool invocation with name and arguments
`tool_result`	`rec.tool_result()`	Tool return value
`decision`	`rec.decision()`	Agent decision point with chosen action
`state_change`	`rec.state_change()`	State mutation with old/new values
`error`	`rec.error()`	Error with message and exception info
`log`	`rec.log()`	General log message

Replay

Step through traces interactively in the terminal:

agent-replay replay trace.jsonl

Commands during replay:

n / next - advance one step
p / prev - go back one step
j N / jump N - jump to step N
q / quit - exit

Programmatic replay:

from agent_replay import ReplayEngine

engine = ReplayEngine.from_file("trace.jsonl")
while engine.has_next():
    span, event = engine.step()
    print(f"[{span.name}] {event.event_type.value}")
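
The interactive commands (next, prev, jump) map naturally onto a cursor over a flat list of steps. The following is a sketch of that idea only, using a hypothetical `StepCursor` class; it does not reflect how `ReplayEngine` is implemented, and the clamping behaviour at either end is an assumption.

```python
class StepCursor:
    """Hypothetical cursor over a flat list of (span_name, event_type) steps."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.index = -1  # nothing has been shown yet

    def has_next(self):
        return self.index + 1 < len(self.steps)

    def current(self):
        return self.steps[self.index] if self.index >= 0 else None

    def next(self):
        """Advance one step; stay on the last step if already at the end."""
        if self.has_next():
            self.index += 1
        return self.current()

    def prev(self):
        """Go back one step; stay on the first step if already there."""
        if self.index > 0:
            self.index -= 1
        return self.current()

    def jump(self, n):
        """Jump to 1-based step n, clamped to the valid range."""
        if self.steps:
            self.index = max(0, min(n - 1, len(self.steps) - 1))
        return self.current()


cursor = StepCursor([
    ("planning", "llm_request"),
    ("planning", "llm_response"),
    ("tool-use", "tool_call"),
    ("tool-use", "tool_result"),
])
first = cursor.next()
jumped = cursor.jump(4)
at_end = not cursor.has_next()
back = cursor.prev()
print(first, jumped, back)
```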

Diffing

Compare two traces to find where agent behavior diverged:

agent-replay diff trace_a.jsonl trace_b.jsonl

╭───────────── Trace Diff ─────────────╮
│ Trace A: a1b2c3d4                    │
│ Trace B: e5f6a7b8                    │
│ Found 2 divergence(s): 1 critical,   │
│ 1 informational.                     │
╰──────────────────────────────────────╯
┌──────────────── Divergences ────────────────┐
│ # │ Severity │ Pos │ Description            │
├───┼──────────┼─────┼────────────────────────┤
│ 1 │ CRITICAL │ 3   │ Different tool called: │
│   │          │     │ search vs browse       │
│ 2 │ INFO     │ 5   │ LLM response content   │
│   │          │     │ differs                │
└───┴──────────┴─────┴────────────────────────┘

Programmatic diffing:

from agent_replay import Trace, diff_traces

a = Trace.load("trace_a.jsonl")
b = Trace.load("trace_b.jsonl")
result = diff_traces(a, b)

for div in result.divergences:
    print(f"[{div.severity}] Position {div.position}: {div.description}")
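
To make the idea of a divergence point concrete, here is a minimal sketch of a position-by-position comparison over two flattened event lists. It is not agent-replay's actual algorithm: the `Divergence` tuple, the `simple_diff` function, the `name` key on tool calls, and the severity rule (a different event type or tool name is critical, a differing payload is informational) are all assumptions made for illustration.

```python
from itertools import zip_longest
from typing import NamedTuple


class Divergence(NamedTuple):
    position: int
    severity: str
    description: str


def simple_diff(events_a, events_b):
    """Compare two lists of (event_type, data) pairs position by position."""
    divergences = []
    for pos, (a, b) in enumerate(zip_longest(events_a, events_b), start=1):
        if a is None or b is None:
            # One trace ended before the other.
            divergences.append(Divergence(pos, "critical", "event present in only one trace"))
        elif a[0] != b[0]:
            divergences.append(Divergence(pos, "critical", f"different event type: {a[0]} vs {b[0]}"))
        elif a[0] == "tool_call" and a[1].get("name") != b[1].get("name"):
            divergences.append(
                Divergence(pos, "critical", f"different tool called: {a[1].get('name')} vs {b[1].get('name')}")
            )
        elif a[1] != b[1]:
            divergences.append(Divergence(pos, "info", f"{a[0]} data differs"))
    return divergences


run_a = [
    ("llm_request", {"model": "gpt-4"}),
    ("tool_call", {"name": "search"}),
    ("llm_response", {"content": "A"}),
]
run_b = [
    ("llm_request", {"model": "gpt-4"}),
    ("tool_call", {"name": "browse"}),
    ("llm_response", {"content": "B"}),
]
found = simple_diff(run_a, run_b)
for d in found:
    print(f"[{d.severity}] Position {d.position}: {d.description}")
```

A purely positional comparison like this reports every later event as divergent once one run inserts an extra step, so a real tool may well align the sequences first.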

HTML Export

Generate a self-contained HTML timeline:

agent-replay export trace.jsonl --format html -o timeline.html

The HTML file uses a dark theme with color-coded event types and expandable data sections. No external dependencies needed to view it.
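
As a rough illustration of what "self-contained" means here, the sketch below builds a single HTML string with inline CSS and native `<details>` elements for the expandable sections, so the page needs no scripts or external assets. The `render_timeline` function and its input shape are hypothetical and unrelated to the exporter's real code or markup.

```python
import html
import json


def render_timeline(trace_name, spans):
    """Build one HTML page; `spans` is a list of (span_name, events) pairs,
    where each event is an (event_type, data) pair."""
    parts = [
        "<!DOCTYPE html>",
        f"<html><head><meta charset='utf-8'><title>{html.escape(trace_name)}</title>",
        "<style>body{background:#111;color:#eee;font-family:monospace}</style>",
        "</head><body>",
        f"<h1>{html.escape(trace_name)}</h1>",
    ]
    for span_name, events in spans:
        parts.append(f"<h2>{html.escape(span_name)}</h2>")
        for event_type, data in events:
            # Escape the payload so trace data cannot inject markup.
            payload = html.escape(json.dumps(data, indent=2))
            parts.append(
                f"<details><summary>{html.escape(event_type)}</summary>"
                f"<pre>{payload}</pre></details>"
            )
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_timeline("my-agent", [("tool-use", [("tool_call", {"q": "<b>"})])])
print(len(page))
```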

Configuration

Trace Format

Traces are stored as JSONL files. Each line is a JSON object:

Line 1: Trace header (metadata, trace ID, name)
Lines 2+: Span records with nested events

{"type": "trace_header", "trace_id": "abc123", "name": "my-agent", ...}
{"type": "span", "name": "planning", "events": [...], ...}
{"type": "span", "name": "tool-use", "events": [...], ...}
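
Because the format is plain JSONL, a trace can be inspected with nothing but the standard library. The sketch below is a minimal reader that relies only on the `type`, `name`, and `events` keys shown above; the contents of each event in the sample (`event_type`) are illustrative assumptions, not the library's documented schema.

```python
import io
import json

# Illustrative sample; the keys inside each event are assumptions.
SAMPLE = """\
{"type": "trace_header", "trace_id": "abc123", "name": "my-agent"}
{"type": "span", "name": "planning", "events": [{"event_type": "llm_request"}, {"event_type": "llm_response"}]}
{"type": "span", "name": "tool-use", "events": [{"event_type": "tool_call"}]}
"""


def read_trace(stream):
    """Return (header, spans) parsed from a JSONL stream."""
    header, spans = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") == "trace_header":
            header = record
        elif record.get("type") == "span":
            spans.append(record)
    return header, spans


header, spans = read_trace(io.StringIO(SAMPLE))
event_total = sum(len(span.get("events", [])) for span in spans)
print(header["name"], len(spans), event_total)
```

Swapping `io.StringIO(SAMPLE)` for `open("trace.jsonl", encoding="utf-8")` would read a file on disk the same way.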

Programmatic Access

from agent_replay import Trace

trace = Trace.load("trace.jsonl")
print(f"Spans: {len(trace.spans)}")
print(f"Events: {trace.event_count}")
print(f"Duration: {trace.duration:.3f}s")

for span in trace.spans:
    for event in span.events:
        print(event.event_type, event.data)

Development

git clone https://github.com/manasvardhan/agent-replay.git
cd agent-replay
pip install -e ".[dev]"
pytest

License

MIT License. See LICENSE.

Release history

0.1.1 (Feb 17, 2026)
0.1.0 (Feb 17, 2026), this version

Download files

Source Distribution

agent_trace_replay-0.1.0.tar.gz (17.6 kB)

Uploaded Feb 17, 2026 Source

Built Distribution

agent_trace_replay-0.1.0-py3-none-any.whl (17.1 kB)

Uploaded Feb 17, 2026 Python 3

File details

Details for the file agent_trace_replay-0.1.0.tar.gz.

File metadata

Download URL: agent_trace_replay-0.1.0.tar.gz
Upload date: Feb 17, 2026
Size: 17.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for agent_trace_replay-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`65b6e92a6a69aaabac0c83b7257f89c5b9067f1f567344819341c16b184a61f0`
MD5	`f8e92a392576c8fe071625bb02ad1cae`
BLAKE2b-256	`a4833309b5227124262009dc3506a2d36e95d12956cf79fc807945f351e7d59f`

File details

Details for the file agent_trace_replay-0.1.0-py3-none-any.whl.

File metadata

Download URL: agent_trace_replay-0.1.0-py3-none-any.whl
Upload date: Feb 17, 2026
Size: 17.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for agent_trace_replay-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`393a63adc94097f167e26eb16c9d80845dd9bc579b05874d9e50fbf1ff852582`
MD5	`e6e90c1d527577190bb0c1ebc7b0d27f`
BLAKE2b-256	`1c81589da8adad91fd9371712654026cf57f8041a15d81a87ad4ad8caf0e05b3`

