Record and replay LLM API calls for deterministic testing

llm-mock

Record real LLM responses once, replay them in tests forever — no API key required, no cost, no non-determinism.

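import pytest
from llm_mock import llm_mock

# my_pipeline below stands for your own application function.
# Record once against the real API (run locally with your API key)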
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = my_pipeline("Summarize this document...")

# Replay in tests — no API key, no cost, deterministic
@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = my_pipeline("Summarize this document...")
    assert "key points" in result

Why

  • API calls during tests are expensive. A CI run hitting real LLM APIs can cost dollars per run at scale.
  • LLM outputs are non-deterministic. Even at temperature=0, responses can vary across model versions.
  • Your production code stays untouched. llm-mock intercepts at the HTTP transport layer — no changes to application code required.

llm-mock records and replays at the structured request level (model + messages + temperature), stores human-readable JSON fixtures, and integrates natively with pytest.


Installation

# PyPI release coming soon — install from source for now:
git clone https://github.com/autopost/llm-mock.git
cd llm-mock
pip install -e .

Runtime dependencies: httpx, respx, pydantic


How to use

Your production code — untouched

# my_app/pipeline.py
import anthropic

client = anthropic.Anthropic()

def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text

pipeline.py has zero knowledge of llm-mock. No imports, no changes needed.

Step 1 — Record (run once, locally)

Create a small script or a dedicated test that runs with mode="record". You need a real API key for this step.

# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize

with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)  # real response from the API

Run it with your API key set:

ANTHROPIC_API_KEY=sk-... python record_fixtures.py

This creates tests/fixtures/summarize.json. Commit this file to git.

Step 2 — Replay (in tests, forever)

Use the pytest decorator — no with block needed inside the test:

# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize

@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result

Then run the test suite:

pytest  # no API key needed, runs offline, instant

The decorator auto-discovers the fixture path relative to the test file — fixture="summarize" looks for tests/fixtures/summarize.json when the test lives in tests/.
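
The lookup rule can be sketched with pathlib. This is a minimal illustration, not llm-mock's actual implementation: resolve_fixture is a hypothetical helper, and it assumes fixtures always live in a fixtures/ directory next to the test file.

```python
from pathlib import Path


def resolve_fixture(test_file: str, name: str) -> Path:
    """Hypothetical helper: map a short fixture name to a JSON path.

    Assumes fixtures live in a `fixtures/` directory next to the test file.
    """
    # Add the .json extension only when the caller omitted it.
    filename = name if name.endswith(".json") else f"{name}.json"
    return Path(test_file).parent / "fixtures" / filename


print(resolve_fixture("tests/test_pipeline.py", "summarize").as_posix())
# tests/fixtures/summarize.json
```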

llm-mock intercepts the httpx call the Anthropic SDK makes internally and returns the saved response — your test code calls summarize() exactly as it would in production.

Alternative: use the context manager directly if you need more control:

from llm_mock import llm_mock
from my_app.pipeline import summarize

def test_summarize():
    with llm_mock(mode="replay", fixture="tests/fixtures/summarize"):
        result = summarize("Long article about climate change...")
        assert "climate" in result

Step 3 — Re-record when things change

If you change the prompt, update the model, or want to refresh fixtures:

ANTHROPIC_API_KEY=sk-... python record_fixtures.py  # overwrites old fixture
git add tests/fixtures/summarize.json
git commit -m "refresh summarize fixture"

Quick start (direct API usage)

A complete working example from scratch.

1. Install

git clone https://github.com/yourname/llm-mock
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

2. Save your API key

echo 'export ANTHROPIC_API_KEY=sk-ant-api03-...' > .env
echo '.env' >> .gitignore

3. Create a record script

Create try_record.py:

import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic()

with llm_mock(mode="record", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Response:", message.content[0].text)
    print("Fixture saved to fixtures/hello.json")

4. Run it

source .env && .venv/bin/python try_record.py

You should see the real response printed and fixtures/hello.json created.

5. Verify the fixture

llm-mock list fixtures/hello

6. Replay without an API key

Create try_replay.py:

import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

with llm_mock(mode="replay", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Replayed:", message.content[0].text)

Run it:

.venv/bin/python try_replay.py

The exact same response is returned instantly — no network call made.

7. Write a test with the pytest decorator

# tests/test_hello.py
import anthropic
import pytest

client = anthropic.Anthropic(api_key="fake-key")

@pytest.mark.llm_replay(fixture="hello")
def test_hello():
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    assert message.content[0].text  # replayed from fixtures/hello.json

Run the test:

.venv/bin/pytest tests/test_hello.py -v

CLI

Inspect and manage fixture files from the terminal.

Note: activate your virtual environment first so llm-mock is on your PATH:

source .venv/bin/activate

Or run it directly with .venv/bin/llm-mock <command>.

llm-mock list <fixture>

Show all recorded interactions in a fixture file:

$ llm-mock list tests/fixtures/summarize

Fixture : tests/fixtures/summarize.json
Provider: anthropic
Interactions: 2

  1. a3f2c1d4e5b6…  claude-sonnet-4-6        2026-04-23T10:00:00
       "Summarize this document about climate change..."
  2. b403d2e5f6c7…  claude-haiku-4-5-20251001  2026-04-24T11:00:00
       "What is the capital of France?"

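For scripted checks, a similar summary can be built from the JSON directly. The sketch below assumes the layout shown under "Fixture file format" (a top-level interactions list whose entries carry hash and request); summarize_fixture is a hypothetical helper, not part of the llm-mock API.

```python
import json


def summarize_fixture(raw: str) -> list[str]:
    """Hypothetical helper: one summary line per recorded interaction."""
    data = json.loads(raw)
    lines = []
    for i, item in enumerate(data.get("interactions", []), start=1):
        request = item["request"]
        first_message = request["messages"][0]["content"]
        # Show only the first 12 hash characters, like the CLI listing.
        lines.append(f"{i}. {item['hash'][:12]}  {request['model']}  {first_message!r}")
    return lines


example = json.dumps({
    "version": "1.0",
    "provider": "anthropic",
    "interactions": [{
        "hash": "a3f2c1d4e5b6" + "0" * 52,  # made-up 64-character hash
        "request": {
            "model": "claude-sonnet-4-6",
            "messages": [{"role": "user", "content": "Say hello."}],
        },
        "recorded_at": "2026-04-23T10:00:00+00:00",
    }],
})
for line in summarize_fixture(example):
    print(line)
```
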
llm-mock clear <fixture>

Delete an entire fixture file:

llm-mock clear tests/fixtures/summarize

Delete a single interaction by hash:

llm-mock clear tests/fixtures/summarize --hash a3f2c1d4e5b6
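
Conceptually, deleting by hash is a filter over the interactions list. A rough sketch, assuming the abbreviated hash is matched as a prefix of the stored hash (an assumption based on the 12-character form used in the command); remove_interaction is a hypothetical name.

```python
def remove_interaction(fixture: dict, hash_prefix: str) -> dict:
    """Return a copy of the fixture without interactions matching the prefix."""
    kept = [
        item for item in fixture["interactions"]
        if not item["hash"].startswith(hash_prefix)
    ]
    return {**fixture, "interactions": kept}


fixture = {
    "version": "1.0",
    "provider": "anthropic",
    "interactions": [
        {"hash": "a3f2c1d4e5b6aa", "request": {}, "response": {}},
        {"hash": "c7d8e9f0a1b2bb", "request": {}, "response": {}},
    ],
}
trimmed = remove_interaction(fixture, "a3f2c1d4e5b6")
print([item["hash"] for item in trimmed["interactions"]])
# ['c7d8e9f0a1b2bb']
```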

How it works

Record mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → forwards to real API
    → saves response to fixture JSON
    → returns response to your code

Replay mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → looks up fixture by SHA256(model + messages + temperature)
    → returns saved response — no network call made
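
The two flows can be modelled without any HTTP machinery. The example below is a simplified sketch, not llm-mock's code: send stands in for the real network call, the store is an in-memory dict rather than a JSON file, and the lookup key is a plain canonical JSON string instead of a hash.

```python
import json
from typing import Callable

Request = dict
Response = dict


def request_key(request: Request) -> str:
    # Stand-in key: canonical JSON of the fields used for matching.
    fields = {k: request.get(k) for k in ("model", "messages", "temperature")}
    return json.dumps(fields, sort_keys=True)


def handle(mode: str, request: Request, store: dict,
           send: Callable[[Request], Response]) -> Response:
    key = request_key(request)
    if mode == "record":
        response = send(request)   # forward to the real API
        store[key] = response      # save for later replay
        return response
    if mode == "replay":
        return store[key]          # no network call; KeyError if never recorded
    raise ValueError(f"unknown mode: {mode!r}")


calls = []


def fake_api(request: Request) -> Response:
    # Pretend network call that counts how often it is invoked.
    calls.append(request)
    return {"content": [{"type": "text", "text": "Hello!"}]}


store: dict = {}
req = {"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "Hi"}]}
recorded = handle("record", req, store, fake_api)
replayed = handle("replay", req, store, fake_api)
print(replayed == recorded, len(calls))  # True 1
```

The replay call returns the stored response and never reaches fake_api, which is why the call count stays at one.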

Request matching uses SHA256 of (model, messages, temperature). Same request always hits the same fixture entry. Different temperature or different message content → different fixture entry.
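
As an illustration of this matching rule, the sketch below hashes a canonical JSON encoding of the three fields. The exact serialization llm-mock uses is not specified here, so the encoding (sorted keys, compact separators) and the name request_hash are assumptions.

```python
import hashlib
import json


def request_hash(model: str, messages: list, temperature=None) -> str:
    """Hypothetical: SHA256 over a canonical JSON form of the request."""
    canonical = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Say hello."}]
same_a = request_hash("claude-sonnet-4-6", messages, 0.0)
same_b = request_hash("claude-sonnet-4-6", list(messages), 0.0)
warmer = request_hash("claude-sonnet-4-6", messages, 0.7)
print(same_a == same_b, same_a == warmer)  # True False
```

Under this scheme, fields outside the tuple (max_tokens, for example) would not affect which fixture entry is selected.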


API reference

llm_mock(mode, fixture, provider="all")

Context manager that activates record or replay mode.

Parameter  Type                            Description
mode       "record" | "replay"             Whether to hit the real API and save, or return from fixture
fixture    str                             Path to the fixture file. .json extension added automatically if omitted
provider   "anthropic" | "openai" | "all"  Which provider(s) to intercept. Default: "all"

from llm_mock import llm_mock

with llm_mock(mode="replay", fixture="tests/fixtures/my_test", provider="anthropic"):
    ...

Exceptions

Exception             When raised
FixtureNotFoundError  Replay mode: fixture file missing, or no matching hash in file
FixtureParseError     Fixture file exists but contains invalid JSON

from llm_mock import llm_mock, FixtureNotFoundError

try:
    with llm_mock(mode="replay", fixture="tests/fixtures/missing"):
        client.messages.create(...)
except FixtureNotFoundError as e:
    print(e)  # includes hint to run in record mode first
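
The failure modes can be pictured with a small loader. This is a sketch under assumptions: the exception classes are redefined locally so the example is self-contained (they are stand-ins for the llm-mock exceptions of the same name), and load_interaction is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


class FixtureNotFoundError(Exception):
    """Local stand-in for the llm-mock exception of the same name."""


class FixtureParseError(Exception):
    """Local stand-in for the llm-mock exception of the same name."""


def load_interaction(path: Path, request_hash: str) -> dict:
    if not path.exists():
        raise FixtureNotFoundError(f"{path} not found; run in record mode first")
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise FixtureParseError(f"{path} is not valid JSON: {exc}") from exc
    for item in data.get("interactions", []):
        if item["hash"] == request_hash:
            return item["response"]
    # The file exists and parses, but holds no entry for this request.
    raise FixtureNotFoundError(f"no interaction {request_hash} in {path}")


outcomes = []
with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "broken.json"
    broken.write_text("{not json", encoding="utf-8")
    for candidate in (Path(tmp) / "missing.json", broken):
        try:
            load_interaction(candidate, "a3f2c1")
        except (FixtureNotFoundError, FixtureParseError) as exc:
            outcomes.append(type(exc).__name__)
print(outcomes)  # ['FixtureNotFoundError', 'FixtureParseError']
```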

Fixture file format

Fixture files are plain JSON — readable, diffable, committable.

{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      },
      "response": {
        "id": "msg_01XYZ",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
        "model": "claude-sonnet-4-6",
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 10, "output_tokens": 9}
      },
      "recorded_at": "2026-04-23T10:00:00+00:00"
    }
  ]
}

Multiple interactions (from different requests) are stored in the same file. Re-recording an existing hash overwrites only that entry.
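
The overwrite rule is an upsert keyed on hash. A minimal sketch, assuming entries are plain dicts as in the format above; upsert_interaction is a hypothetical name and the short hashes are made up for the example.

```python
def upsert_interaction(fixture: dict, interaction: dict) -> None:
    """Replace the entry with the same hash in place, or append a new one."""
    entries = fixture.setdefault("interactions", [])
    for index, existing in enumerate(entries):
        if existing["hash"] == interaction["hash"]:
            entries[index] = interaction  # re-record: overwrite only this entry
            return
    entries.append(interaction)


fixture = {"version": "1.0", "provider": "anthropic", "interactions": []}
upsert_interaction(fixture, {"hash": "a3f2c1", "response": {"id": "msg_old"}})
upsert_interaction(fixture, {"hash": "b4d2e5", "response": {"id": "msg_other"}})
upsert_interaction(fixture, {"hash": "a3f2c1", "response": {"id": "msg_new"}})
print([(e["hash"], e["response"]["id"]) for e in fixture["interactions"]])
# [('a3f2c1', 'msg_new'), ('b4d2e5', 'msg_other')]
```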


Supported providers

Provider                 Intercepted endpoint                Status
Anthropic                api.anthropic.com/v1/messages       Supported
OpenAI                   api.openai.com/v1/chat/completions  Supported
Streaming (stream=True)  -                                   Planned for v1.1
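
Deciding whether a request belongs to a supported provider comes down to matching host and path. The sketch below uses only the endpoints listed in the table; provider_for is a hypothetical helper, and exact-match lookup is an assumption about how matching might be done.

```python
from typing import Optional
from urllib.parse import urlsplit

# Host and path pairs taken from the supported-providers table.
ENDPOINTS = {
    ("api.anthropic.com", "/v1/messages"): "anthropic",
    ("api.openai.com", "/v1/chat/completions"): "openai",
}


def provider_for(url: str) -> Optional[str]:
    """Hypothetical helper: which provider, if any, does this URL belong to?"""
    parts = urlsplit(url)
    return ENDPOINTS.get((parts.hostname, parts.path))


print(provider_for("https://api.anthropic.com/v1/messages"))       # anthropic
print(provider_for("https://api.openai.com/v1/chat/completions"))  # openai
print(provider_for("https://example.com/v1/messages"))             # None
```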

Comparison

Tool           Record mode  Native SDK support        In-process
llm-mock       yes          yes (Anthropic + OpenAI)  yes
llm_recorder   yes          no (LiteLLM only)         yes
AIMock         no           yes                       no (HTTP server)
vcr-langchain  yes          no (LangChain only)       yes

Development

git clone https://github.com/yourname/llm-mock
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Roadmap

  • v0.2: auto mode, disable via env var (LLM_MOCK_DISABLED)
  • v1.1: streaming support
  • v2: shared fixtures for teams, semantic matching, web dashboard
