Deterministic replay debugger for LLM agents
Project description
llmreplay
Deterministic replay layer for LLM-driven systems.
Overview
LLMReplay is a lightweight framework for capturing, replaying, and testing LLM interactions.
It converts non-deterministic LLM behavior into reproducible system behavior, enabling reliable debugging and testing.
Problem
LLM applications are difficult to test because they are:
- Non-deterministic by design
- Dependent on external APIs
- Hard to reproduce across runs
- Fragile in CI environments
- Difficult to debug historically
This leads to unreliable regression testing and unstable evaluation pipelines.
Solution
LLMReplay introduces a replay abstraction layer for LLM systems.
It enables you to:
- Capture real LLM executions
- Store structured interaction traces
- Replay executions deterministically
- Remove dependency on live model calls during tests
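The capture-and-replay idea can be sketched in a few lines of plain Python. This is a minimal illustration of the technique, not the llmreplay API: `ReplayCache`, its `mode` attribute, and `call` are hypothetical names. In record mode, each response is stored under a hash of the request; in replay mode, the stored response is returned and the live call never happens.

```python
import hashlib
import json

# Hypothetical sketch of record/replay caching (not the llmreplay API).
class ReplayCache:
    def __init__(self):
        self.traces = {}
        self.mode = "record"

    def _key(self, request):
        # Stable key: hash the canonical JSON form of the request
        return hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()

    def call(self, request, live_fn):
        key = self._key(request)
        if self.mode == "replay":
            return self.traces[key]      # deterministic, no API call
        response = live_fn(request)      # live model call
        self.traces[key] = response      # persist the trace for replay
        return response

cache = ReplayCache()
req = {"model": "example", "prompt": "hello"}
first = cache.call(req, lambda r: "live response")

cache.mode = "replay"
replayed = cache.call(req, lambda r: "a different live response")
# replayed equals first: the stored trace is returned, the live
# function is never invoked in replay mode
```

Keying traces on a canonical hash of the request is one simple way to make replay lookups deterministic across runs.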
Features
- Request/response capture layer
- Deterministic replay engine
- Tool-call mocking support
- Snapshot-based testing workflow
- CI-safe execution mode
- Minimal integration overhead
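Tool-call mocking follows the same record/replay pattern: during replay, a recorded tool result stands in for the real tool. The sketch below uses hypothetical names and is not the llmreplay plugin API.

```python
# Illustrative tool-call mocking during replay (hypothetical names).
# Recorded tool results captured during a live run:
recorded_tool_results = {"get_weather": {"temp_c": 21}}

def call_tool(name, real_tool, replay=True):
    if replay and name in recorded_tool_results:
        return recorded_tool_results[name]  # mocked from the recorded trace
    return real_tool()                      # fall through to live execution

# In replay mode the recorded result is returned, not the live one:
result = call_tool("get_weather", real_tool=lambda: {"temp_c": 19})
```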
Architecture
LLMReplay operates in two primary modes:
Record Mode
Captures live execution traces from your LLM application, including:
- Inputs
- Outputs
- Tool calls (if applicable)
- Execution metadata
These traces are persisted for later reuse.
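For illustration, a persisted trace covering the four captured fields might look like the following. This is a hypothetical shape that mirrors the list above; the actual llmreplay trace schema may differ.

```python
# Hypothetical trace record; field names mirror the list above,
# not a documented llmreplay schema.
trace = {
    "inputs": {"model": "example-model", "prompt": "Summarize the report"},
    "outputs": {"text": "The report covers Q3 results."},
    "tool_calls": [
        {"name": "search", "args": {"query": "Q3 results"}, "result": "3 hits"}
    ],
    "metadata": {"recorded_at": "2024-01-01T00:00:00Z", "latency_ms": 812},
}
```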
Replay Mode
Replays stored traces without invoking external LLM APIs.
This ensures:
- Deterministic outputs
- Fast execution
- No network dependency
- Stable CI behavior
Core Workflow
1. Run your application in record mode
2. Generate and store interaction traces
3. Run the same application in replay mode
4. Validate outputs against recorded snapshots
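The four steps above can be sketched end to end in plain Python (hypothetical helpers, not the llmreplay API):

```python
# Step 1-2: record mode runs the app against the live model and
# stores the response as a trace (here, a simple dict).
traces = {}

def run_app(llm):
    return llm("What is 2 + 2?")

live_llm = lambda prompt: "4"           # stand-in for a real API call
traces["What is 2 + 2?"] = run_app(live_llm)

# Step 3: replay mode serves responses from the stored traces,
# so the same app runs with no live model behind it.
replay_llm = lambda prompt: traces[prompt]
replayed = run_app(replay_llm)

# Step 4: validate the replayed output against the recorded snapshot.
assert replayed == traces["What is 2 + 2?"]
```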
Use Cases
- LLM application testing
- Agent workflow debugging
- Prompt regression testing
- Evaluation pipelines
- CI/CD validation for LLM systems
- Tool-using agent simulation
Installation
pip install llmreplay
Quick Start
from llmreplay import ReplayClient

client = ReplayClient()

# Record mode: capture live LLM interactions as traces
client.record()
run_your_llm_app()  # placeholder for your application's entry point

# Replay mode: serve the recorded traces without live API calls
client.replay()
run_your_llm_app()
Design Principle
If it cannot be replayed, it cannot be tested.
Roadmap
- Structured trace DAG visualization
- Multi-model replay support
- Latency and stochasticity simulation layer
- Distributed trace collection
- Web-based replay inspector
- Plugin system for tool mocking
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file llmreplay-0.1.0.tar.gz.
File metadata
- Download URL: llmreplay-0.1.0.tar.gz
- Upload date:
- Size: 41.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e7d660ce25d09bb108d2c1b6276b321fde0b06819e937ee075fbeb5255dd431 |
| MD5 | 1c63fa82ccc29d2033bcb021f57eca81 |
| BLAKE2b-256 | 99795cd3cb113f494ead446415d5b50d17734beb3e6e1889f2a7e7dd05c0d895 |
File details
Details for the file llmreplay-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llmreplay-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3adae71d8f4338c70230ceecf3ae3b3190e719d12e1bdb1e851b01e5c6c2189f |
| MD5 | 9095bde997d5d79002901ca7ec1c5420 |
| BLAKE2b-256 | 49470980276d34dff3d65dd532b2db6dcf9c18d37937c667d7a944f6d5bf4ed8 |