
Record and replay LLM API calls for deterministic testing

Project description

llm-mock

Record real LLM responses once, replay them in tests forever — no API key required, no cost, no non-determinism.

import pytest
from llm_mock import llm_mock

# Record once against the real API (run locally with your API key)
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = my_pipeline("Summarize this document...")

# Replay in tests — no API key, no cost, deterministic
@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = my_pipeline("Summarize this document...")
    assert "key points" in result

Why

  • API calls during tests are expensive. A CI run hitting real LLM APIs can cost dollars per run at scale.
  • LLM outputs are non-deterministic. Even at temperature=0, responses can vary between calls and across model versions.
  • Your production code stays untouched. llm-mock intercepts at the HTTP transport layer — no changes to application code required.

llm-mock records and replays at the structured request level (model + messages + temperature), stores human-readable JSON fixtures, and integrates natively with pytest.


Installation

pip install llm-mock

Or install from source:

git clone https://github.com/autopost/llm-mock.git
cd llm-mock
pip install -e .

Runtime dependencies: httpx, respx, pydantic


How to use

Your production code — untouched

# my_app/pipeline.py
import anthropic

client = anthropic.Anthropic()

def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text

pipeline.py has zero knowledge of llm-mock. No imports, no changes needed.

Step 1 — Record (run once, locally)

Create a small script or a dedicated test that runs with mode="record". You need a real API key for this step.

# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize

with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)  # real response from the API

Run it once with a real API key:

ANTHROPIC_API_KEY=sk-... python record_fixtures.py

This creates tests/fixtures/summarize.json. Commit this file to git.

Step 2 — Replay (in tests, forever)

Use the pytest decorator — no with block needed inside the test:

# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize

@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result

Then run the suite:

pytest  # no API key needed, runs offline, instant

The decorator auto-discovers the fixture path relative to the test file — fixture="summarize" looks for tests/fixtures/summarize.json when the test lives in tests/.
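The discovery rule above can be pictured as a small path helper. This is a minimal sketch, assuming the marker looks in a fixtures/ directory next to the test file and appends .json only when the name lacks that extension (mirroring the fixture parameter in the API reference); resolve_fixture_path is a hypothetical name, not part of llm-mock.

```python
from pathlib import Path


def resolve_fixture_path(test_file: str, fixture: str) -> Path:
    """Hypothetical helper: map a marker's fixture name to a JSON file.

    Assumes fixtures live in a "fixtures" directory next to the test file.
    """
    path = Path(test_file).parent / "fixtures" / fixture
    # Append .json only when it is missing; with_suffix() would mangle "v1.2".
    if path.suffix != ".json":
        path = path.with_name(path.name + ".json")
    return path


print(resolve_fixture_path("tests/test_pipeline.py", "summarize"))
# tests/fixtures/summarize.json (on POSIX systems)
```

Appending to the name rather than calling with_suffix keeps dotted fixture names such as "v1.2" intact.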

llm-mock intercepts the httpx call the Anthropic SDK makes internally and returns the saved response — your test code calls summarize() exactly as it would in production.

Alternative: use the context manager directly if you need more control:

from llm_mock import llm_mock

def test_summarize():
    with llm_mock(mode="replay", fixture="tests/fixtures/summarize"):
        result = summarize("Long article about climate change...")
        assert "climate" in result

Step 3 — Re-record when things change

If you change the prompt, update the model, or want to refresh fixtures:

ANTHROPIC_API_KEY=sk-... python record_fixtures.py  # overwrites old fixture
git add tests/fixtures/summarize.json
git commit -m "refresh summarize fixture"

Quick start (direct API usage)

A complete working example from scratch.

1. Install

python -m venv .venv && source .venv/bin/activate
pip install llm-mock anthropic

2. Save your API key

echo 'export ANTHROPIC_API_KEY=sk-ant-api03-...' > .env
echo '.env' >> .gitignore

3. Create a record script

Create try_record.py:

import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic()

with llm_mock(mode="record", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Response:", message.content[0].text)
    print("Fixture saved to fixtures/hello.json")

4. Run it

source .env && .venv/bin/python try_record.py

You should see the real response printed and fixtures/hello.json created.

5. Verify the fixture

llm-mock list fixtures/hello

6. Replay without an API key

Create try_replay.py:

import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

with llm_mock(mode="replay", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Replayed:", message.content[0].text)

Run it:

.venv/bin/python try_replay.py

The exact same response is returned instantly — no network call made.

7. Write a test with the pytest decorator

# tests/test_hello.py
import anthropic
import pytest

client = anthropic.Anthropic(api_key="fake-key")

@pytest.mark.llm_replay(fixture="hello")
def test_hello():
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
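    assert message.content[0].text  # replayed from tests/fixtures/hello.json

The marker resolves fixture="hello" relative to the test file, so it looks for tests/fixtures/hello.json. Move the fixture recorded in step 4 there before running the test:

mkdir -p tests/fixtures && mv fixtures/hello.json tests/fixtures/
.venv/bin/pytest tests/test_hello.py -v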

CLI

Inspect and manage fixture files from the terminal.

Note: activate your virtual environment first so llm-mock is on your PATH:

source .venv/bin/activate

Or run it directly with .venv/bin/llm-mock <command>.

llm-mock list <fixture>

Show all recorded interactions in a fixture file:

$ llm-mock list tests/fixtures/summarize

Fixture : tests/fixtures/summarize.json
Provider: anthropic
Interactions: 2

  1. a3f2c1d4e5b6…  claude-sonnet-4-6          2026-04-23T10:00:00
       "Summarize this document about climate change..."
  2. b4f3d2e5f6c7…  claude-haiku-4-5-20251001  2026-04-24T11:00:00
       "What is the capital of France?"
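To make the listing format concrete, here is a sketch that builds the same kind of summary from a fixture dict shaped like the one in "Fixture file format". The 12-character hash prefix, the trimmed timestamp, and the assumption that message content is a plain string are all inferred from the sample output above; summarize_fixture is a hypothetical helper, not the CLI's actual code.

```python
from typing import List


def summarize_fixture(fixture: dict, prefix_len: int = 12) -> List[str]:
    """Hypothetical helper: two summary lines per recorded interaction."""
    lines = []
    for number, item in enumerate(fixture["interactions"], start=1):
        request = item["request"]
        # Use the first user message as the prompt preview (assumes string content).
        prompt = next(
            (m["content"] for m in request["messages"] if m["role"] == "user"), ""
        )
        recorded = item["recorded_at"][:19]  # keep date and time, drop the UTC offset
        short_hash = item["hash"][:prefix_len]
        lines.append(f"{number}. {short_hash}…  {request['model']}  {recorded}")
        lines.append(f'     "{prompt}"')
    return lines
```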

llm-mock clear <fixture>

Delete an entire fixture file:

llm-mock clear tests/fixtures/summarize

Delete a single interaction by hash:

llm-mock clear tests/fixtures/summarize --hash a3f2c1d4e5b6
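Deleting by hash can be sketched as filtering the interactions list and rewriting the file. This assumes the --hash value is matched as a prefix (the example passes the same 12-character prefix that llm-mock list prints); clear_interaction is a hypothetical helper, and the real CLI may match differently.

```python
import json
from pathlib import Path


def clear_interaction(path: Path, hash_prefix: str) -> int:
    """Hypothetical helper: drop interactions whose hash starts with hash_prefix.

    Rewrites the fixture file in place and returns the number of removed entries.
    """
    if not hash_prefix:
        # An empty prefix would match (and delete) every interaction.
        raise ValueError("hash_prefix must not be empty")
    data = json.loads(path.read_text(encoding="utf-8"))
    kept = [
        item for item in data["interactions"] if not item["hash"].startswith(hash_prefix)
    ]
    removed = len(data["interactions"]) - len(kept)
    data["interactions"] = kept
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return removed
```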

How it works

Record mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → forwards to real API
    → saves response to fixture JSON
    → returns response to your code

Replay mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → looks up fixture by SHA256(model + messages + temperature)
    → returns saved response — no network call made

Request matching uses SHA256 of (model, messages, temperature). Same request always hits the same fixture entry. Different temperature or different message content → different fixture entry.
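A matching key of this kind can be sketched with hashlib and a canonical JSON encoding. The exact serialization llm-mock uses is not documented here, so this sketch (request_hash is a hypothetical name) will not necessarily reproduce the hashes stored in real fixture files; it only shows why identical requests land on the same entry and different ones do not.

```python
import hashlib
import json
from typing import List, Optional


def request_hash(model: str, messages: List[dict], temperature: Optional[float] = None) -> str:
    """Hypothetical helper: SHA-256 over a canonical encoding of the matched fields."""
    key = {"model": model, "messages": messages, "temperature": temperature}
    # sort_keys (applied recursively) and fixed separators make the encoding
    # independent of dict key order and whitespace.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Say hello."}]
same = request_hash("claude-sonnet-4-6", messages, 0.0) == request_hash(
    "claude-sonnet-4-6", [{"content": "Say hello.", "role": "user"}], 0.0
)
different = request_hash("claude-sonnet-4-6", messages, 0.0) != request_hash(
    "claude-sonnet-4-6", messages, 1.0
)
print(same, different)  # True True
```

If the key really covers only these three fields, two requests that differ only in max_tokens would share one fixture entry.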


API reference

llm_mock(mode, fixture, provider="all")

Context manager that activates record or replay mode.

| Parameter | Type | Description |
|---|---|---|
| mode | "record" or "replay" | Whether to hit the real API and save, or return from fixture |
| fixture | str | Path to the fixture file. .json extension added automatically if omitted |
| provider | "anthropic", "openai", or "all" | Which provider(s) to intercept. Default: "all" |

from llm_mock import llm_mock

with llm_mock(mode="replay", fixture="tests/fixtures/my_test", provider="anthropic"):
    ...

Exceptions

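| Exception | When raised |
|---|---|
| FixtureNotFoundError | Replay mode: fixture file missing, or no matching hash in file |
| FixtureParseError | Fixture file exists but contains invalid JSON |

import anthropic
from llm_mock import llm_mock, FixtureNotFoundError

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

try:
    with llm_mock(mode="replay", fixture="tests/fixtures/missing"):
        client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=64,
            messages=[{"role": "user", "content": "Say hello in one sentence."}],
        )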
except FixtureNotFoundError as e:
    print(e)  # includes hint to run in record mode first
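The two exceptions correspond to distinct failure points in a replay lookup. The sketch below defines local stand-in classes with the same names instead of importing llm-mock's, and lookup_response is a hypothetical helper working on the JSON layout described in "Fixture file format"; the message wording is illustrative only.

```python
import json
from pathlib import Path


class FixtureNotFoundError(Exception):
    """Local stand-in for llm_mock.FixtureNotFoundError (hypothetical)."""


class FixtureParseError(Exception):
    """Local stand-in for llm_mock.FixtureParseError (hypothetical)."""


def lookup_response(path: Path, request_hash: str) -> dict:
    """Hypothetical replay lookup that raises the two documented errors."""
    if not path.exists():
        raise FixtureNotFoundError(
            f"{path} not found; run once with mode='record' to create it"
        )
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise FixtureParseError(f"{path} is not valid JSON: {exc}") from exc
    for item in data.get("interactions", []):
        if item["hash"] == request_hash:
            return item["response"]
    # The file exists but this request was never recorded.
    raise FixtureNotFoundError(
        f"no interaction with hash {request_hash[:12]} in {path}; re-record this request"
    )
```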

Fixture file format

Fixture files are plain JSON — readable, diffable, committable.

{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      },
      "response": {
        "id": "msg_01XYZ",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
        "model": "claude-sonnet-4-6",
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 10, "output_tokens": 9}
      },
      "recorded_at": "2026-04-23T10:00:00+00:00"
    }
  ]
}

Multiple interactions (from different requests) are stored in the same file. Re-recording an existing hash overwrites only that entry.
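The "overwrite only that entry" behaviour in record mode can be sketched as an upsert keyed on hash. This is a minimal sketch, assuming the file layout shown above; upsert_interaction is a hypothetical name, and llm-mock may order entries or format timestamps differently.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def upsert_interaction(path: Path, provider: str, request_hash: str,
                       request: dict, response: dict) -> None:
    """Hypothetical record-mode write: insert or overwrite one entry by hash."""
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"version": "1.0", "provider": provider, "interactions": []}
    entry = {
        "hash": request_hash,
        "request": request,
        "response": response,
        # e.g. "2026-04-23T10:00:00+00:00", matching the sample fixture
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    # Replace a matching entry in place; otherwise append a new one.
    for index, item in enumerate(data["interactions"]):
        if item["hash"] == request_hash:
            data["interactions"][index] = entry
            break
    else:
        data["interactions"].append(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
```

Replacing the entry in place keeps its position in the list, so a re-record shows up as a small, reviewable diff.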


Supported providers

| Provider | Intercepted endpoint | Status |
|---|---|---|
| Anthropic | api.anthropic.com/v1/messages | Supported |
| OpenAI | api.openai.com/v1/chat/completions | Supported |
| Streaming (stream=True) | | Planned for v1.1 |
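The endpoint table also reads as a routing rule: an intercepted request is matched on host and path, and the provider argument narrows which matches are handled. This is a sketch, assuming exact host and path matching; provider_for and ENDPOINTS are hypothetical names, and llm-mock's own routing may differ.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical routing table built from the "Supported providers" rows.
ENDPOINTS = {
    ("api.anthropic.com", "/v1/messages"): "anthropic",
    ("api.openai.com", "/v1/chat/completions"): "openai",
}


def provider_for(url: str, enabled: str = "all") -> Optional[str]:
    """Return the provider whose fixture should handle url, or None to pass through."""
    parts = urlsplit(url)
    provider = ENDPOINTS.get((parts.hostname, parts.path))
    if provider is None:
        return None  # not an endpoint llm-mock intercepts
    if enabled not in ("all", provider):
        return None  # endpoint known, but this provider was not selected
    return provider
```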

Comparison

| Tool | Record mode | Native SDK support | In-process |
|---|---|---|---|
| llm-mock | yes | yes (Anthropic + OpenAI) | yes |
| llm_recorder | yes | no (LiteLLM only) | yes |
| AIMock | no | yes | no (HTTP server) |
| vcr-langchain | yes | no (LangChain only) | yes |

Development

git clone https://github.com/autopost/llm-mock.git
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Roadmap

  • v0.2: auto mode, disable via env var (LLM_MOCK_DISABLED)
  • v1.1: streaming support
  • v2: shared fixtures for teams, semantic matching, web dashboard



Download files


Source Distribution

llm_mock-0.1.1.tar.gz (16.2 kB)

Uploaded Source

Built Distribution


llm_mock-0.1.1-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file llm_mock-0.1.1.tar.gz.

File metadata

  • Download URL: llm_mock-0.1.1.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_mock-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52e2d505ff0e8fc9c778ba57a7d4d149b98ac3bf638ac6db5e74f28fb17b6b1b |
| MD5 | 5bf6c9a0d3e8fbaadc82ae166d887ea3 |
| BLAKE2b-256 | 4059db26f865e8713eee04c3a99099cde479987d6244469bac679e85cea169d6 |


File details

Details for the file llm_mock-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llm_mock-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_mock-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32cd07088d0a310bec709cfca99a3557a29d366ea5f477fe3dfc676a2ed3a893 |
| MD5 | b946bb68d30373113f166b54d3fc4820 |
| BLAKE2b-256 | 2f388959abfac369b7eb91e20b692318eac2983764c10f136b13e23bb10b9685 |

