Deterministic replay debugger for LLM agents

Project description

llmreplay

A deterministic replay debugger for LLM agents. It records LLM calls and tool executions to SQLite, then replays them from the log with zero network calls.

from llmreplay import record, replay

# Record
with record("my_run", seed=42):
    response = client.chat.completions.create(...)

# Replay — zero network calls
session = replay("my_run")
for event in session.events():
    print(event.step, event.kind, event.payload)

Install

pip install llmreplay

Requirements: Python >= 3.10

What gets recorded

  • LLM requests/responses (OpenAI, Anthropic, Grok/xAI, Gemini)
  • Tool calls/results (via @record_tool decorator)
  • Random seeds (Python random, numpy)
  • Exceptions

Events are stored in ~/.llmreplay/<run_id>.db.
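
Since runs are plain SQLite databases, they can be inspected with standard tools. As a rough illustration of an append-only event log (the schema below is hypothetical, not llmreplay's actual one):

```python
import json
import sqlite3

# Hypothetical schema for illustration only; llmreplay's actual
# table layout may differ. Each event is a (step, kind, payload) row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (step INTEGER PRIMARY KEY, kind TEXT, payload TEXT)"
)
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?)",
    (0, "llm_call", json.dumps({"model": "gpt-4o", "prompt": "hi"})),
)

# Replay walks the rows in step order instead of touching the network.
for step, kind, payload in conn.execute("SELECT * FROM events ORDER BY step"):
    print(step, kind, json.loads(payload))
```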

CLI

llmreplay list                    # List recorded runs
llmreplay view my_run             # Show all events
llmreplay view my_run --step 42   # Jump to step
llmreplay cost my_run             # Cost breakdown
llmreplay export my_run --json    # Export bug report
llmreplay web my_run              # Launch timeline UI

Features

Auto-instrumentation — hooks for OpenAI, Anthropic, Grok, Gemini, and LangChain install automatically inside the record() context.
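
Auto-instrumentation of this kind generally works by wrapping the client's call method so every request/response pair lands in the log. A minimal sketch of the pattern, with illustrative names (not llmreplay's internals) and a stand-in client:

```python
import functools

events = []  # stand-in for the SQLite-backed event store

def instrument(obj, method_name):
    """Wrap obj.<method_name> so each call is appended to the event log."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        events.append({"method": method_name, "kwargs": kwargs, "result": result})
        return result

    setattr(obj, method_name, wrapper)

class FakeClient:
    """Stand-in for a provider SDK client."""
    def create(self, **kwargs):
        return {"text": "hello"}

client = FakeClient()
instrument(client, "create")
client.create(model="gpt-4o")  # call is now logged transparently
```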

Tool mocking — Record tool I/O with @record_tool, replay with ToolMocker:

from llmreplay import ToolMocker, EventStore

mocker = ToolMocker()
mocker.load(EventStore("my_run"))

@mocker.mock(name="fetch_price")
def fetch_price(ticker: str) -> dict: ...  # returns recorded result
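
The same idea without the library: a mocker that replays recorded outputs keyed by tool name and arguments. This is a simplified stand-in for ToolMocker, not its actual implementation:

```python
import functools
import json

class SimpleMocker:
    """Replays recorded tool results keyed by (name, serialized kwargs)."""
    def __init__(self):
        self.recorded = {}

    def load(self, records):
        # records maps (name, json-encoded kwargs) -> recorded result
        self.recorded.update(records)

    def mock(self, name):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(**kwargs):
                key = (name, json.dumps(kwargs, sort_keys=True))
                # KeyError here means the call was never recorded
                return self.recorded[key]
            return wrapper
        return decorator

mocker = SimpleMocker()
mocker.load({
    ("fetch_price", json.dumps({"ticker": "AAPL"}, sort_keys=True)): {"price": 182.5}
})

@mocker.mock(name="fetch_price")
def fetch_price(ticker: str) -> dict:
    raise RuntimeError("never reached during replay")

print(fetch_price(ticker="AAPL"))  # {'price': 182.5}
```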

Regression testing — Run recorded traces against updated code:

from llmreplay import RegressionSuite

suite = RegressionSuite()

@suite.case("run_001")
def check(original, session):
    return session.total_cost() <= original["total_cost_usd"] * 1.1

suite.run()
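
A toy version of this pattern, to show the decorator-registry shape (illustrative only, and simplified to a single argument rather than RegressionSuite's actual signature):

```python
class MiniSuite:
    """Toy regression runner: each case gets recorded data, returns pass/fail."""
    def __init__(self):
        self.cases = []

    def case(self, run_id):
        def decorator(func):
            self.cases.append((run_id, func))
            return func
        return decorator

    def run(self, recorded):
        # recorded maps run_id -> the stored metrics for that run
        return {run_id: func(recorded[run_id]) for run_id, func in self.cases}

suite = MiniSuite()

@suite.case("run_001")
def check(original):
    return original["total_cost_usd"] <= 0.50

results = suite.run({"run_001": {"total_cost_usd": 0.42}})
print(results)  # {'run_001': True}
```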

Fork/branch — Copy a trace up to a step for counterfactual debugging:

from llmreplay import fork
new_store = fork("broken_run", "fixed_run", at_step=50)
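
Conceptually, forking is a filtered copy of the event log. A self-contained sketch (my own toy function, which treats at_step as inclusive; llmreplay's fork may behave differently):

```python
def fork_events(events, at_step):
    """Copy events with step <= at_step into a new list (toy version of fork)."""
    return [e for e in events if e["step"] <= at_step]

trace = [{"step": s, "kind": "llm_call"} for s in range(100)]
branch = fork_events(trace, at_step=50)
print(len(branch))  # 51, steps 0..50 inclusive
```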

Fine-tuning export — Export prompt/response pairs:

from llmreplay import export_finetune_dataset
export_finetune_dataset(["run_001", "run_002"], "data.jsonl")
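
The JSONL output format is one JSON object per line. A minimal stand-in writer (hypothetical helper, not llmreplay's implementation, with made-up sample data):

```python
import json

def export_pairs(runs, path):
    """Write (prompt, response) pairs as one JSON object per line (JSONL)."""
    with open(path, "w") as f:
        for run in runs:
            for prompt, response in run:
                f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")

runs = [[("What is 2+2?", "4")], [("Capital of France?", "Paris")]]
export_pairs(runs, "data.jsonl")
```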

License

MIT

Download files

Download the file for your platform.

Source Distribution

llmreplay-0.1.2.tar.gz (41.9 kB)

Built Distribution

llmreplay-0.1.2-py3-none-any.whl (26.5 kB)

File details

Details for the file llmreplay-0.1.2.tar.gz.

File metadata

  • Download URL: llmreplay-0.1.2.tar.gz
  • Upload date:
  • Size: 41.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmreplay-0.1.2.tar.gz
Algorithm Hash digest
SHA256 d4a8c6bafbaeb2aa431d911c3787e423b42e8cf9911cdac78a089b654ee3d366
MD5 34ee0ae85e19097d51a7089fa280242b
BLAKE2b-256 7a6fbd79286e2a5e479c36920519f1ec61b2dede5aa9258c568bf94748760c21

File details

Details for the file llmreplay-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: llmreplay-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmreplay-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 bf16eea1ca094b49e61422cb35a36494fb6f124e9464e0eff2d4061a95a41005
MD5 af1d93c1ba7e95dc2db8dcefcb21a889
BLAKE2b-256 edbb46c23d948fdd46182d838d7f651521a8245c06adb10dc98d80d8ddba3eec
