Record, replay, and debug AI agent execution traces
agent-replay
New here? Start with the Getting Started Guide.
AI agents are black boxes. agent-replay makes them transparent.
Record every LLM call, tool use, decision point, and state change during agent execution. Replay them step-by-step. Diff two runs to find exactly where behavior diverged.
Features
- Record agent runs with a simple context manager or decorator
- Replay traces step-by-step in the terminal
- Diff two traces to find divergence points
- Tree view of nested spans and events
- HTML export with a self-contained dark-mode timeline
- Structured traces with spans, events, and metadata
- CLI for quick inspection without writing code
- Typed Python 3.10+ with zero heavy dependencies
Architecture
Agent Run ──> Recorder ──> Trace File (.jsonl) ──> Replay Viewer
                                               ├─> Diff Tool
                                               └─> HTML Export
┌─────────────────────────────────────────────────────────────┐
│                       Your Agent Code                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  with Recorder("my-agent") as rec:                    │  │
│  │      with rec.span("planning"):                       │  │
│  │          rec.llm_request(model="gpt-4", ...)          │  │
│  │          rec.llm_response(content="...", tokens=42)   │  │
│  │      with rec.span("tool-use"):                       │  │
│  │          rec.tool_call("search", {"q": "..."})        │  │
│  │          rec.tool_result("search", {...})             │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
                          trace.jsonl
                               │
                  ┌────────────┼────────────┐
                  ▼            ▼            ▼
            agent-replay agent-replay agent-replay
                show        replay        diff
Quick Start
pip install agent-trace-replay
from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("planning"):
        rec.llm_request(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])
        rec.llm_response(content="Hi there!", tokens=5)
    with rec.span("tool-use"):
        rec.tool_call("search", {"query": "python docs"})
        rec.tool_result("search", {"url": "https://docs.python.org"})
Then inspect it:
agent-replay show trace.jsonl
agent-replay show trace.jsonl --tree
agent-replay replay trace.jsonl
Terminal Viewer
╭──────────── Agent Trace ────────────╮
│ my-agent                            │
│ ID: a1b2c3d4e5f67890                │
│ Spans: 2 | Events: 4                │
│ Duration: 1.234s                    │
╰─────────────────────────────────────╯
>>> planning (0.523s)
  LLM REQUEST model=gpt-4 messages=1
  LLM RESPONSE "Hi there!" (5 tokens)
>>> tool-use (0.711s)
  TOOL CALL search({"query": "python docs"})
  TOOL RESULT search -> {"url": "https://docs.python.org"}
Recording
Context Manager
from agent_replay import Recorder

with Recorder("my-agent", output_path="trace.jsonl") as rec:
    with rec.span("step-1"):
        rec.llm_request(model="gpt-4", messages=[...])
        rec.llm_response(content="...", tokens=10)
        rec.decision("next action", choice="search")
        rec.tool_call("search", {"q": "test"})
        rec.tool_result("search", {"results": [...]})
        rec.state_change("status", old="planning", new="executing")
Decorator
from agent_replay import record_trace, Recorder

@record_trace("my-agent", output_path="trace.jsonl")
def run_agent(task: str, recorder: Recorder = None):
    with recorder.span("work"):
        recorder.llm_request(model="gpt-4")
        recorder.llm_response(content="done")
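The decorator form relies on a recorder being handed to the wrapped function as a keyword argument. The sketch below shows one way such injection can work in plain Python. `FakeRecorder` and `record_with` are stand-ins invented for this illustration; they are not part of agent-replay, and `record_trace` may be implemented differently.

```python
import functools


class FakeRecorder:
    """Stand-in recorder (hypothetical) that keeps events in memory."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def log(self, message):
        self.events.append(("log", message))


def record_with(name):
    """Hypothetical decorator: build a recorder and pass it as `recorder=`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            recorder = FakeRecorder(name)
            result = func(*args, recorder=recorder, **kwargs)
            # Returning the recorder as well is only for this demo,
            # so the collected events can be inspected afterwards.
            return result, recorder

        return wrapper

    return decorator


@record_with("my-agent")
def run_agent(task, recorder=None):
    recorder.log(f"starting {task}")
    return task.upper()


result, recorder = run_agent("demo")
print(result, recorder.events)
```

The caller never constructs the recorder; the wrapper owns its lifetime, which is the same division of labor the decorator example above implies.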
Event Types
| Event | Method | Description |
|---|---|---|
| llm_request | rec.llm_request() | LLM API call with model and messages |
| llm_response | rec.llm_response() | LLM response with content and token count |
| tool_call | rec.tool_call() | Tool invocation with name and arguments |
| tool_result | rec.tool_result() | Tool return value |
| decision | rec.decision() | Agent decision point with chosen action |
| state_change | rec.state_change() | State mutation with old/new values |
| error | rec.error() | Error with message and exception info |
| log | rec.log() | General log message |
Replay
Step through traces interactively in the terminal:
agent-replay replay trace.jsonl
Commands during replay:
- n/next: advance one step
- p/prev: go back one step
- j N/jump N: jump to step N
- q/quit: exit
Programmatic replay:
from agent_replay import ReplayEngine

engine = ReplayEngine.from_file("trace.jsonl")
while engine.has_next():
    span, event = engine.step()
    print(f"[{span.name}] {event.event_type.value}")
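The interactive commands (next, prev, jump) amount to moving a cursor over an ordered list of steps. The sketch below illustrates that idea with a hypothetical `StepCursor` class. It is not the `ReplayEngine` implementation; the clamping at both ends and the 1-based step numbers are assumptions made for the example.

```python
class StepCursor:
    """Hypothetical cursor over recorded steps (not part of agent-replay)."""

    def __init__(self, steps):
        self.steps = list(steps)
        if not self.steps:
            raise ValueError("a cursor needs at least one step")
        self.index = 0  # 0-based position of the current step

    def current(self):
        return self.steps[self.index]

    def next(self):
        # Advance one step, staying on the last step at the end.
        self.index = min(self.index + 1, len(self.steps) - 1)
        return self.current()

    def prev(self):
        # Go back one step, staying on the first step at the start.
        self.index = max(self.index - 1, 0)
        return self.current()

    def jump(self, n):
        # Step numbers are 1-based here, as a user would type them.
        if not 1 <= n <= len(self.steps):
            raise IndexError(f"step {n} is out of range")
        self.index = n - 1
        return self.current()


cursor = StepCursor(["llm_request", "llm_response", "tool_call"])
print(cursor.next())
print(cursor.jump(3))
print(cursor.prev())
```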
Diffing
Compare two traces to find where agent behavior diverged:
agent-replay diff trace_a.jsonl trace_b.jsonl
╭───────────── Trace Diff ─────────────╮
│ Trace A: a1b2c3d4                    │
│ Trace B: e5f6a7b8                    │
│ Found 2 divergence(s): 1 critical,   │
│ 1 informational.                     │
╰──────────────────────────────────────╯
┌──────────────── Divergences ────────────────┐
│ # │ Severity │ Pos │ Description            │
├───┼──────────┼─────┼────────────────────────┤
│ 1 │ CRITICAL │ 3   │ Different tool called: │
│   │          │     │ search vs browse       │
│ 2 │ INFO     │ 5   │ LLM response content   │
│   │          │     │ differs                │
└───┴──────────┴─────┴────────────────────────┘
Programmatic diffing:
from agent_replay import Trace, diff_traces

a = Trace.load("trace_a.jsonl")
b = Trace.load("trace_b.jsonl")
result = diff_traces(a, b)
for div in result.divergences:
    print(f"[{div.severity}] Position {div.position}: {div.description}")
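The notion of a divergence point can be illustrated without the library. The sketch below is a simplified stand-in, not the algorithm `diff_traces` uses: it assumes each run has been flattened into hypothetical `(event_type, name)` pairs and reports every position at which the two sequences differ, including positions where one run has ended.

```python
from itertools import zip_longest


def find_divergences(events_a, events_b):
    """Return (position, a, b) for every index where the two runs differ."""
    divergences = []
    # zip_longest pads the shorter run with None, so a missing event
    # also shows up as a divergence.
    for pos, (a, b) in enumerate(zip_longest(events_a, events_b)):
        if a != b:
            divergences.append((pos, a, b))
    return divergences


# Made-up runs: the second one calls a different tool.
run_a = [("llm_request", "gpt-4"), ("tool_call", "search"), ("tool_result", "search")]
run_b = [("llm_request", "gpt-4"), ("tool_call", "browse"), ("tool_result", "browse")]

divergences = find_divergences(run_a, run_b)
first = divergences[0]
print(f"first divergence at position {first[0]}: {first[1]} vs {first[2]}")
```

A positional comparison like this is the simplest possible approach; it does not classify severity, which the CLI output above does.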
HTML Export
Generate a self-contained HTML timeline:
agent-replay export trace.jsonl --format html -o timeline.html
The HTML file uses a dark theme with color-coded event types and expandable data sections. No external dependencies needed to view it.
Configuration
Trace Format
Traces are stored as JSONL files. Each line is a JSON object:
- Line 1: Trace header (metadata, trace ID, name)
- Lines 2+: Span records with nested events
{"type": "trace_header", "trace_id": "abc123", "name": "my-agent", ...}
{"type": "span", "name": "planning", "events": [...], ...}
{"type": "span", "name": "tool-use", "events": [...], ...}
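Because the format is plain JSONL, a trace can also be read with the standard library alone. The sketch below assumes only the `type`, `name`, and `events` keys shown in the sample above; the sample records are made up, and real trace files contain more fields.

```python
import io
import json

# Made-up sample in the layout described above: one header line, then spans.
SAMPLE = "\n".join([
    json.dumps({"type": "trace_header", "trace_id": "abc123", "name": "my-agent"}),
    json.dumps({"type": "span", "name": "planning", "events": [{}, {}]}),
    json.dumps({"type": "span", "name": "tool-use", "events": [{}, {}]}),
])


def read_trace(stream):
    """Split a JSONL trace into its header record and its span records."""
    header, spans = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") == "trace_header":
            header = record
        elif record.get("type") == "span":
            spans.append(record)
    return header, spans


header, spans = read_trace(io.StringIO(SAMPLE))
span_names = [s["name"] for s in spans]
event_total = sum(len(s["events"]) for s in spans)
print(header["name"], span_names, event_total)
```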
Programmatic Access
from agent_replay import Trace

trace = Trace.load("trace.jsonl")
print(f"Spans: {len(trace.spans)}")
print(f"Events: {trace.event_count}")
print(f"Duration: {trace.duration:.3f}s")
for span in trace.spans:
    for event in span.events:
        print(event.event_type, event.data)
Development
git clone https://github.com/manasvardhan/agent-replay.git
cd agent-replay
pip install -e ".[dev]"
pytest
License
MIT License. See LICENSE.