Deterministic replay debugger for LLM agents

Project description

llmreplay

Deterministic replay layer for LLM-driven systems.


Overview

LLMReplay is a lightweight framework for capturing, replaying, and testing LLM interactions.

It records live LLM interactions and serves them back on later runs, turning non-deterministic model behavior into reproducible behavior that can be debugged and tested reliably.


Problem

LLM applications are difficult to test because they are:

  • Non-deterministic by design
  • Dependent on external APIs
  • Hard to reproduce across runs
  • Fragile in CI environments
  • Difficult to debug after the fact

This leads to unreliable regression testing and unstable evaluation pipelines.


Solution

LLMReplay introduces a replay abstraction layer for LLM systems.

It enables you to:

  • Capture real LLM executions
  • Store structured interaction traces
  • Replay executions deterministically
  • Remove dependency on live model calls during tests

Features

  • Request/response capture layer
  • Deterministic replay engine
  • Tool-call mocking support
  • Snapshot-based testing workflow
  • CI-safe execution mode
  • Minimal integration overhead

Architecture

LLMReplay operates in two primary modes:

Record Mode

Captures live execution traces from your LLM application, including:

  • Inputs
  • Outputs
  • Tool calls (if applicable)
  • Execution metadata

These traces are persisted for later reuse.
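As an illustration of what a persisted trace might contain, here is a minimal sketch of a record with the four parts listed above. The names Trace and ToolCall, the field layout, and the JSON format are assumptions made for this example; they are not taken from llmreplay's actual storage format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class ToolCall:
    # Hypothetical shape of one recorded tool invocation.
    name: str
    arguments: dict[str, Any]
    result: Any = None


@dataclass
class Trace:
    # Hypothetical trace record: inputs, outputs, tool calls, metadata.
    inputs: dict[str, Any]
    outputs: dict[str, Any]
    tool_calls: list[ToolCall] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Trace":
        data = json.loads(text)
        # Rebuild nested ToolCall objects from their plain-dict form.
        data["tool_calls"] = [ToolCall(**tc) for tc in data["tool_calls"]]
        return cls(**data)


trace = Trace(
    inputs={"prompt": "What is 2 + 2?"},
    outputs={"text": "4"},
    tool_calls=[ToolCall(name="calculator", arguments={"expr": "2 + 2"}, result=4)],
    metadata={"model": "example-model"},
)

# Round-trip through JSON, as a stand-in for persisting and reloading.
restored = Trace.from_json(trace.to_json())
```

Serializing with sorted keys means two recordings of the same interaction produce byte-identical files, which keeps diffs in version control readable.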


Replay Mode

Replays stored traces without invoking external LLM APIs.

This ensures:

  • Deterministic outputs
  • Fast execution
  • No network dependency
  • Stable CI behavior
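The record/replay split can be sketched in a few lines. This is not llmreplay's implementation: ReplayStore, request_key, and fake_model are hypothetical names, and keying traces by a hash of the request is one plausible lookup strategy assumed for this example.

```python
import hashlib
import json
from typing import Callable


def request_key(request: dict) -> str:
    # Canonical JSON so that equal requests always map to the same key.
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayStore:
    """Hypothetical store: records live responses, then replays them."""

    def __init__(self, live_call: Callable[[dict], dict]):
        self.live_call = live_call
        self.traces: dict[str, dict] = {}
        self.mode = "record"

    def call(self, request: dict) -> dict:
        key = request_key(request)
        if self.mode == "record":
            # Record mode: hit the live model and keep the response.
            response = self.live_call(request)
            self.traces[key] = response
            return response
        # Replay mode: never call the live model; fail loudly on a miss.
        if key not in self.traces:
            raise KeyError(f"no recorded trace for request {key[:12]}")
        return self.traces[key]


calls = []


def fake_model(request: dict) -> dict:
    # Stand-in for a live LLM API; the reply changes on every call.
    calls.append(request)
    return {"text": f"reply #{len(calls)}"}


store = ReplayStore(fake_model)
first = store.call({"prompt": "hello"})   # recorded from the "live" model
store.mode = "replay"
second = store.call({"prompt": "hello"})  # served from the stored trace
```

Raising on an unrecorded request, rather than falling back to a live call, is what keeps a CI run from silently reaching the network.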

Core Workflow

  1. Run your application in record mode
  2. Generate and store interaction traces
  3. Run the same application in replay mode
  4. Validate outputs against recorded snapshots
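Step 4 can be sketched as a plain snapshot comparison. The helper check_snapshot, its update flag, and the JSON file layout are assumptions made for this example, not part of llmreplay's documented API.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(path: Path, actual: dict, update: bool = False) -> bool:
    """Return True when `actual` matches the snapshot stored at `path`.

    If the snapshot does not exist yet, or `update` is set, the snapshot
    is (re)written from `actual` and the check passes.
    """
    if update or not path.exists():
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return True
    expected = json.loads(path.read_text())
    return expected == actual


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "greeting.json"
    # First run writes the snapshot.
    recorded = check_snapshot(snapshot, {"text": "Hello!"})
    # A later run with the same output passes.
    unchanged = check_snapshot(snapshot, {"text": "Hello!"})
    # A run whose output drifted is reported as a mismatch.
    regressed = check_snapshot(snapshot, {"text": "Hi there"})
```

In a test suite, the boolean would typically feed an assertion, with the update flag reserved for intentional re-recording.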

Use Cases

  • LLM application testing
  • Agent workflow debugging
  • Prompt regression testing
  • Evaluation pipelines
  • CI/CD validation for LLM systems
  • Tool-using agent simulation
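For the tool-using agent case, one way to simulate tools is to serve previously recorded results in order and fail when the agent deviates from the recording. The class MockedTools and the recorded-call layout below are hypothetical, sketched for illustration only; llmreplay's own tool-mocking interface may differ.

```python
from typing import Any


class MockedTools:
    """Hypothetical mock: returns recorded tool results instead of running tools."""

    def __init__(self, recorded_calls: list[dict[str, Any]]):
        # Copy so the caller's recording is not mutated.
        self._pending = list(recorded_calls)

    def invoke(self, name: str, arguments: dict[str, Any]) -> Any:
        if not self._pending:
            raise AssertionError(f"unexpected tool call: {name}")
        expected = self._pending.pop(0)
        # The agent must request the same tool with the same arguments.
        if (expected["name"], expected["arguments"]) != (name, arguments):
            raise AssertionError(
                f"tool call diverged from recording: "
                f"expected {expected['name']}, got {name}"
            )
        return expected["result"]


recorded_calls = [
    {
        "name": "search",
        "arguments": {"query": "weather in Paris"},
        "result": "18 C, cloudy",
    },
]

tools = MockedTools(recorded_calls)
result = tools.invoke("search", {"query": "weather in Paris"})
```

Strict ordering is the simplest policy; an agent that issues tool calls in a different order on each run would need a lookup keyed by name and arguments instead.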

Installation

pip install llmreplay

Quick Start

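from llmreplay import ReplayClient


def run_your_llm_app():
    # Placeholder: replace with your application's entry point.
    ...


client = ReplayClient()

# Record mode: run against the live model and capture traces
client.record()
run_your_llm_app()

# Replay mode: run the same code again, served from the recorded traces
client.replay()
run_your_llm_app()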

Design Principle

If it cannot be replayed, it cannot be tested.


Roadmap

  • Structured trace DAG visualization
  • Multi-model replay support
  • Latency and stochasticity simulation layer
  • Distributed trace collection
  • Web-based replay inspector
  • Plugin system for tool mocking

License

MIT

Project details


Download files

Source Distribution

llmreplay-0.1.0.tar.gz (41.6 kB)

Built Distribution

llmreplay-0.1.0-py3-none-any.whl (25.8 kB)

File details

Details for the file llmreplay-0.1.0.tar.gz.

File metadata

  • Download URL: llmreplay-0.1.0.tar.gz
  • Upload date:
  • Size: 41.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmreplay-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5e7d660ce25d09bb108d2c1b6276b321fde0b06819e937ee075fbeb5255dd431
MD5 1c63fa82ccc29d2033bcb021f57eca81
BLAKE2b-256 99795cd3cb113f494ead446415d5b50d17734beb3e6e1889f2a7e7dd05c0d895

File details

Details for the file llmreplay-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llmreplay-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmreplay-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3adae71d8f4338c70230ceecf3ae3b3190e719d12e1bdb1e851b01e5c6c2189f
MD5 9095bde997d5d79002901ca7ec1c5420
BLAKE2b-256 49470980276d34dff3d65dd532b2db6dcf9c18d37937c667d7a944f6d5bf4ed8
