
llmreplay

Deterministic replay debugger for LLM agents. Records LLM calls and tool executions to SQLite, then replays them from the log with no network calls.

from llmreplay import record, replay

# Record
with record("my_run", seed=42):
    response = client.chat.completions.create(...)

# Replay — zero network calls
session = replay("my_run")
for event in session.events():
    print(event.step, event.kind, event.payload)

Install

pip install llmreplay

Requirements: Python >= 3.10

What gets recorded

  • LLM requests/responses (OpenAI, Anthropic, Grok/xAI, Gemini)
  • Tool calls/results (via @record_tool decorator)
  • Random seeds (Python random, numpy)
  • Exceptions

Events are stored in ~/.llmreplay/<run_id>.db.
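The on-disk schema isn't documented here, but the core idea of an append-only event log in SQLite can be sketched with the standard library alone. Everything below (table layout, column names, the sample events) is illustrative, not llmreplay's actual schema:

```python
import json
import sqlite3

# Hypothetical single-table event log; the real llmreplay layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (step INTEGER PRIMARY KEY, kind TEXT, payload TEXT)"
)

# Record phase: append one row per event, payload serialized as JSON.
events = [
    (0, "llm_request", {"model": "gpt-4o", "messages": ["..."]}),
    (1, "llm_response", {"content": "hello", "cost_usd": 0.0003}),
    (2, "tool_call", {"name": "fetch_price", "args": {"ticker": "AAPL"}}),
]
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(step, kind, json.dumps(payload)) for step, kind, payload in events],
)

# Replay phase: read rows back in step order; no network involved.
for step, kind, payload in conn.execute("SELECT * FROM events ORDER BY step"):
    print(step, kind, json.loads(payload))
```

Because the log is ordinary SQLite, a recorded run can also be inspected directly with any SQLite client.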

CLI

llmreplay list                    # List recorded runs
llmreplay view my_run             # Show all events
llmreplay view my_run --step 42   # Jump to step
llmreplay cost my_run             # Cost breakdown
llmreplay export my_run --json    # Export bug report
llmreplay web my_run              # Launch timeline UI

Features

Auto-instrumentation — hooks for OpenAI, Anthropic, Grok, Gemini, and LangChain install automatically inside the record() context.
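Context-scoped instrumentation like this is typically done by monkey-patching: swap a client method for a recording wrapper on entry and restore the original on exit. A toy sketch of the pattern (FakeClient and record_calls are illustrative stand-ins, not llmreplay internals):

```python
from contextlib import contextmanager

class FakeClient:
    """Stand-in for a vendor SDK client (illustrative only)."""
    def create(self, prompt):
        return f"response to {prompt!r}"

log = []

@contextmanager
def record_calls(client):
    # Swap the method for a wrapper that logs every call,
    # then restore the original when the context exits.
    original = client.create
    def wrapper(prompt):
        result = original(prompt)
        log.append({"kind": "llm_call", "prompt": prompt, "result": result})
        return result
    client.create = wrapper
    try:
        yield
    finally:
        client.create = original

client = FakeClient()
with record_calls(client):
    client.create("hi")       # recorded
client.create("bye")          # patch removed, not recorded
```

The try/finally restore is what makes the hooks safe: instrumentation cannot leak past the context even if the body raises.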

Tool mocking — Record tool I/O with @record_tool, replay with ToolMocker:

from llmreplay import ToolMocker, EventStore

mocker = ToolMocker()
mocker.load(EventStore("my_run"))

@mocker.mock(name="fetch_price")
def fetch_price(ticker: str) -> dict: ...  # returns recorded result
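Mechanically, a replay mock only needs to look up the recorded result for a tool instead of running it. A minimal pure-Python sketch of that idea, with a plain dict standing in for EventStore (ToyMocker and the recorded data are hypothetical, not the library's implementation):

```python
import functools

# Hypothetical recorded tool results, keyed by tool name
# (a stand-in for what EventStore would load from the run's DB).
recorded = {"fetch_price": {"ticker": "AAPL", "price": 189.5}}

class ToyMocker:
    def __init__(self, recorded):
        self.recorded = recorded

    def mock(self, name):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Return the recorded result instead of running the real tool.
                return self.recorded[name]
            return wrapper
        return decorator

mocker = ToyMocker(recorded)

@mocker.mock(name="fetch_price")
def fetch_price(ticker: str) -> dict:
    raise RuntimeError("real tool should never run during replay")

print(fetch_price("AAPL"))  # recorded result; the body never executes
```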

Regression testing — Run recorded traces against updated code:

from llmreplay import RegressionSuite

suite = RegressionSuite()

@suite.case("run_001")
def check(original, session):
    return session.total_cost() <= original["total_cost_usd"] * 1.1

suite.run()

Fork/branch — Copy a trace up to a step for counterfactual debugging:

from llmreplay import fork
new_store = fork("broken_run", "fixed_run", at_step=50)
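Conceptually, a fork copies events up to and including the branch point into a new store, discarding everything after it. In sketch form, with plain lists standing in for the SQLite-backed stores (fork_events is an illustrative helper, not the library's implementation):

```python
def fork_events(events, at_step):
    """Copy events up to and including at_step (illustrative only)."""
    return [e for e in events if e["step"] <= at_step]

broken_run = [
    {"step": 49, "kind": "llm_response", "payload": "ok"},
    {"step": 50, "kind": "tool_call", "payload": "fetch"},
    {"step": 51, "kind": "llm_response", "payload": "bad answer"},
]

fixed_run = fork_events(broken_run, at_step=50)
print(len(fixed_run))  # → 2: the faulty step 51 is dropped
```

Re-running the agent against the forked prefix then lets you test a code fix from step 51 onward without replaying the whole trace.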

Fine-tuning export — Export prompt/response pairs:

from llmreplay import export_finetune_dataset
export_finetune_dataset(["run_001", "run_002"], "data.jsonl")
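Fine-tuning datasets are conventionally JSONL: one JSON object per line with prompt/response (or chat-message) fields. A sketch of the export step with hypothetical event pairs (the field names here are illustrative; the library's actual output format isn't specified on this page):

```python
import io
import json

# Hypothetical prompt/response pairs pulled from recorded traces.
pairs = [
    {"prompt": "What is 2+2?", "response": "4"},
    {"prompt": "Capital of France?", "response": "Paris"},
]

buf = io.StringIO()  # a real export would open "data.jsonl" instead
for pair in pairs:
    buf.write(json.dumps(pair) + "\n")  # one JSON object per line

lines = buf.getvalue().splitlines()
print(len(lines))  # → 2
```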

License

MIT

Download files

Download the file for your platform.

Source Distribution

llmreplay-0.1.1.tar.gz (41.9 kB)


Built Distribution


llmreplay-0.1.1-py3-none-any.whl (26.5 kB)


File details

Details for the file llmreplay-0.1.1.tar.gz.

File metadata

  • Download URL: llmreplay-0.1.1.tar.gz
  • Size: 41.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmreplay-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e8ab588d3e865dcc103a632465b8b457d0d3c1c1c8a6b8a15facadace9a6aeb3
MD5 2c95a7e1753b0e77422b6c4d6f9bacb7
BLAKE2b-256 ed4ab4d19b87fbf12ce9d7e5203eaa89aa59dee5cbcf73f76669e29b0c93fc12


File details

Details for the file llmreplay-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llmreplay-0.1.1-py3-none-any.whl
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmreplay-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2f59512538a52483c45845e97c5adef88cd8c7c83d919c6705feebc7e13575a3
MD5 690e486850b28c374f4095ddc617bc09
BLAKE2b-256 67b560fc46b853e89da1482f49aea71085f00a3db3dcd2e708535bcdce0686e7

