Record and replay LLM API calls for deterministic testing

Project description

llm-mock

Record real LLM responses once, replay them in tests forever — no API key required, no cost, no non-determinism.

import pytest
from llm_mock import llm_mock

# my_pipeline stands for your own application code.

# Record once against the real API (run locally with your API key)
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = my_pipeline("Summarize this document...")

# Replay in tests: no API key, no cost, deterministic
@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = my_pipeline("Summarize this document...")
    assert "key points" in result

Why

API calls during tests are expensive. A CI run hitting real LLM APIs can cost dollars per run at scale.
LLM outputs are non-deterministic. Even at temperature=0, responses can vary across model versions.
Your production code stays untouched. llm-mock intercepts at the HTTP transport layer — no changes to application code required.

llm-mock records and replays at the structured request level (model + messages + temperature), stores human-readable JSON fixtures, and integrates natively with pytest.

Installation

pip install llm-mock

Or install from source:

git clone https://github.com/autopost/llm-mock.git
cd llm-mock
pip install -e .

Runtime dependencies: httpx, respx, pydantic

How to use

Your production code — untouched

# my_app/pipeline.py
import anthropic

client = anthropic.Anthropic()

def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text

pipeline.py has zero knowledge of llm-mock. No imports, no changes needed.

Step 1 — Record (run once, locally)

Create a small script or a dedicated test that runs with mode="record". You need a real API key for this step.

# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize

with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)  # real response from the API

ANTHROPIC_API_KEY=sk-... python record_fixtures.py

This creates tests/fixtures/summarize.json. Commit this file to git.

Step 2 — Replay (in tests, forever)

Use the pytest decorator — no with block needed inside the test:

# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize

@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result

pytest  # no API key needed, runs offline, instant

The decorator auto-discovers the fixture path relative to the test file — fixture="summarize" looks for tests/fixtures/summarize.json when the test lives in tests/.
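
The lookup rule described above can be sketched in a few lines. This is only an illustration of the stated behavior, assuming fixtures live in a fixtures directory next to the test file; resolve_fixture is a hypothetical name and not part of llm-mock's API.

```python
from pathlib import Path


def resolve_fixture(test_file: str, name: str) -> Path:
    """Hypothetical helper: map a short fixture name to a JSON file path.

    Assumes a "fixtures" directory that sits next to the test file.
    """
    # Append the .json extension only when the caller left it out.
    filename = name if name.endswith(".json") else f"{name}.json"
    return Path(test_file).parent / "fixtures" / filename


print(resolve_fixture("tests/test_pipeline.py", "summarize").as_posix())
# prints: tests/fixtures/summarize.json
```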

llm-mock intercepts the httpx call the Anthropic SDK makes internally and returns the saved response — your test code calls summarize() exactly as it would in production.

Alternative: use the context manager directly if you need more control:

from llm_mock import llm_mock
from my_app.pipeline import summarize

def test_summarize():
    with llm_mock(mode="replay", fixture="tests/fixtures/summarize"):
        result = summarize("Long article about climate change...")
        assert "climate" in result

Step 3 — Re-record when things change

If you change the prompt, update the model, or want to refresh fixtures:

ANTHROPIC_API_KEY=sk-... python record_fixtures.py  # overwrites old fixture
git add tests/fixtures/summarize.json
git commit -m "refresh summarize fixture"

Quick start (direct API usage)

A complete working example from scratch.

1. Install

python -m venv .venv
.venv/bin/pip install llm-mock anthropic  # the later steps use .venv and import anthropic

2. Save your API key

echo 'export ANTHROPIC_API_KEY=sk-ant-api03-...' > .env
echo '.env' >> .gitignore

3. Create a record script

Create try_record.py:

import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic()

with llm_mock(mode="record", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Response:", message.content[0].text)
    print("Fixture saved to fixtures/hello.json")

4. Run it

source .env && .venv/bin/python try_record.py

You should see the real response printed and fixtures/hello.json created.

5. Verify the fixture

llm-mock list fixtures/hello

6. Replay without an API key

Create try_replay.py:

import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

with llm_mock(mode="replay", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print("Replayed:", message.content[0].text)

.venv/bin/python try_replay.py

The exact same response is returned instantly — no network call made.

7. Write a test with the pytest decorator

# tests/test_hello.py
import anthropic
import pytest

client = anthropic.Anthropic(api_key="fake-key")

@pytest.mark.llm_replay(fixture="hello")
def test_hello():
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
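    # Replayed from tests/fixtures/hello.json (the path the decorator resolves
    # for a test in tests/); record or copy the fixture there first.
    assert message.content[0].text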

.venv/bin/pytest tests/test_hello.py -v

CLI

Inspect and manage fixture files from the terminal.

Note: activate your virtual environment first so llm-mock is on your PATH:
source .venv/bin/activate
Or run it directly with .venv/bin/llm-mock <command>.

`llm-mock list <fixture>`

Show all recorded interactions in a fixture file:

$ llm-mock list tests/fixtures/summarize

Fixture : tests/fixtures/summarize.json
Provider: anthropic
Interactions: 2

  1. a3f2c1d4e5b6…  claude-sonnet-4-6        2026-04-23T10:00:00
       "Summarize this document about climate change..."
  2. b4a3d2e5f6c7…  claude-haiku-4-5-20251001  2026-04-24T11:00:00
       "What is the capital of France?"

`llm-mock clear <fixture>`

Delete an entire fixture file:

llm-mock clear tests/fixtures/summarize

Delete a single interaction by hash:

llm-mock clear tests/fixtures/summarize --hash a3f2c1d4e5b6
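
As a rough sketch of what deleting a single interaction amounts to on the fixture data, the function below drops entries whose hash starts with a given value. It assumes that --hash accepts a prefix, which the truncated hashes in the list output suggest but the README does not state; remove_interaction is a hypothetical name, not the CLI's actual implementation.

```python
def remove_interaction(data: dict, hash_prefix: str) -> int:
    """Hypothetical helper: drop interactions whose hash starts with hash_prefix.

    Returns the number of removed entries. Prefix matching is an assumption.
    """
    before = len(data["interactions"])
    data["interactions"] = [
        item for item in data["interactions"]
        if not item["hash"].startswith(hash_prefix)
    ]
    return before - len(data["interactions"])


fixture = {
    "version": "1.0",
    "provider": "anthropic",
    "interactions": [
        {"hash": "a3f2c1d4e5b6" + "0" * 52},
        {"hash": "ffeeddccbbaa" + "0" * 52},
    ],
}
print(remove_interaction(fixture, "a3f2c1d4e5b6"))  # prints: 1
```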

How it works

Record mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → forwards to real API
    → saves response to fixture JSON
    → returns response to your code

Replay mode:
  Your code → Anthropic/OpenAI SDK → httpx
    → llm-mock intercepts → looks up fixture by SHA256(model + messages + temperature)
    → returns saved response — no network call made
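
The two flows above can be sketched without any HTTP machinery. The class below is a simplified stand-in, not llm-mock's implementation: RecordReplay, its in-memory store, and the upstream callable are all hypothetical, and the lookup key is plain canonical JSON rather than the SHA256 described in this section.

```python
import json
from typing import Callable, Optional


class RecordReplay:
    """Hypothetical stand-in for the interception layer."""

    def __init__(self, mode: str, store: Optional[dict] = None):
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.store = {} if store is None else store

    @staticmethod
    def _key(request: dict) -> str:
        # Stand-in key: canonical JSON of the whole request.
        return json.dumps(request, sort_keys=True)

    def handle(self, request: dict,
               upstream: Optional[Callable[[dict], dict]] = None) -> dict:
        key = self._key(request)
        if self.mode == "record":
            if upstream is None:
                raise ValueError("record mode needs an upstream callable")
            response = upstream(request)   # forward to the real API
            self.store[key] = response     # save for later replay
            return response
        if key not in self.store:
            raise LookupError("no recorded response for this request")
        return self.store[key]             # replay: upstream is never called


calls = []

def fake_api(request: dict) -> dict:
    """Pretend upstream API that counts how often it is hit."""
    calls.append(request)
    return {"text": "Hello!"}

request = {"model": "claude-sonnet-4-6",
           "messages": [{"role": "user", "content": "Say hello."}]}
recorder = RecordReplay("record")
recorder.handle(request, upstream=fake_api)
replayer = RecordReplay("replay", store=recorder.store)
print(replayer.handle(request), len(calls))
```

Replay never touches the upstream callable, which is why the call counter stays at one after both steps.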

Request matching uses the SHA256 of (model, messages, temperature). The same request always hits the same fixture entry; a different temperature or different message content maps to a different entry.
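
A minimal sketch of such a key, assuming the three fields are serialized as canonical JSON before hashing. The README does not specify the serialization, so hashes produced here will not necessarily match the ones llm-mock writes; request_hash is a hypothetical name.

```python
import hashlib
import json
from typing import Optional


def request_hash(model: str, messages: list,
                 temperature: Optional[float] = None) -> str:
    """Hypothetical key function: SHA256 over model, messages, temperature."""
    # sort_keys and fixed separators make the JSON text, and therefore
    # the digest, independent of dict ordering and whitespace.
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Say hello."}]
same = request_hash("claude-sonnet-4-6", messages) == request_hash("claude-sonnet-4-6", messages)
different = request_hash("claude-sonnet-4-6", messages, 0.0) != request_hash("claude-sonnet-4-6", messages, 0.7)
print(same, different)  # prints: True True
```

Under this scheme, parameters outside the three hashed fields (for example max_tokens) would not affect which fixture entry is selected.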

API reference

`llm_mock(mode, fixture, provider="all")`

Context manager that activates record or replay mode.

| Parameter | Type | Description |
|---|---|---|
| `mode` | `"record"` \| `"replay"` | Whether to hit the real API and save, or return from fixture |
| `fixture` | `str` | Path to the fixture file. `.json` extension added automatically if omitted |
| `provider` | `"anthropic"` \| `"openai"` \| `"all"` | Which provider(s) to intercept. Default: `"all"` |

from llm_mock import llm_mock

with llm_mock(mode="replay", fixture="tests/fixtures/my_test", provider="anthropic"):
    ...

Exceptions

| Exception | When raised |
|---|---|
| `FixtureNotFoundError` | Replay mode: fixture file missing, or no matching hash in file |
| `FixtureParseError` | Fixture file exists but contains invalid JSON |

import anthropic
from llm_mock import llm_mock, FixtureNotFoundError

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

try:
    with llm_mock(mode="replay", fixture="tests/fixtures/missing"):
        client.messages.create(...)
except FixtureNotFoundError as e:
    print(e)  # includes hint to run in record mode first
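
To make the split between the two exceptions concrete, here is a sketch of a loader that raises one for a missing file and the other for invalid JSON. The exception classes are local stand-ins that mirror the names in the table, not imports from llm_mock, and load_fixture is a hypothetical helper.

```python
import json
from pathlib import Path


class FixtureNotFoundError(Exception):
    """Local stand-in: fixture file is missing."""


class FixtureParseError(Exception):
    """Local stand-in: fixture file is not valid JSON."""


def load_fixture(path: str) -> dict:
    """Hypothetical loader that separates 'missing' from 'unreadable'."""
    fixture_path = Path(path)
    if not fixture_path.exists():
        raise FixtureNotFoundError(
            f"{fixture_path} not found; run in record mode first"
        )
    try:
        return json.loads(fixture_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise FixtureParseError(f"{fixture_path} is not valid JSON: {exc}") from exc


try:
    load_fixture("tests/fixtures/does_not_exist.json")
except FixtureNotFoundError as exc:
    print(exc)
```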

Fixture file format

Fixture files are plain JSON — readable, diffable, committable.

{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      },
      "response": {
        "id": "msg_01XYZ",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
        "model": "claude-sonnet-4-6",
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 10, "output_tokens": 9}
      },
      "recorded_at": "2026-04-23T10:00:00+00:00"
    }
  ]
}

Multiple interactions (from different requests) are stored in the same file. Re-recording an existing hash overwrites only that entry.
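
The overwrite-by-hash behavior can be sketched as an upsert on the interactions list. This operates on the already-parsed fixture dictionary in the format shown above; upsert_interaction is a hypothetical name and not necessarily how llm-mock does it internally.

```python
def upsert_interaction(data: dict, entry: dict) -> None:
    """Hypothetical helper: replace the entry with the same hash, else append."""
    for index, existing in enumerate(data["interactions"]):
        if existing["hash"] == entry["hash"]:
            data["interactions"][index] = entry  # re-record: overwrite in place
            return
    data["interactions"].append(entry)           # first recording: add new entry


fixture = {"version": "1.0", "provider": "anthropic", "interactions": []}
upsert_interaction(fixture, {"hash": "a3f2c1", "response": {"id": "msg_old"}})
upsert_interaction(fixture, {"hash": "b4a3d2", "response": {"id": "msg_other"}})
upsert_interaction(fixture, {"hash": "a3f2c1", "response": {"id": "msg_new"}})
print([item["response"]["id"] for item in fixture["interactions"]])
# prints: ['msg_new', 'msg_other']
```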

Supported providers

| Provider | Intercepted endpoint | Status |
|---|---|---|
| Anthropic | `api.anthropic.com/v1/messages` | Supported |
| OpenAI | `api.openai.com/v1/chat/completions` | Supported |
| Streaming (`stream=True`) | n/a | Planned for v1.1 |
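
One way to picture how the provider setting narrows interception is a host-and-path check against the endpoints in the table. This is a sketch under that assumption only; ENDPOINTS and should_intercept are hypothetical names, and the library's real matching rules may differ.

```python
from urllib.parse import urlsplit

# Endpoints taken from the table above.
ENDPOINTS = {
    "anthropic": ("api.anthropic.com", "/v1/messages"),
    "openai": ("api.openai.com", "/v1/chat/completions"),
}


def should_intercept(url: str, provider: str = "all") -> bool:
    """Hypothetical check: does this URL belong to a selected provider?"""
    if provider != "all" and provider not in ENDPOINTS:
        raise ValueError(f"unknown provider: {provider!r}")
    parts = urlsplit(url)
    names = list(ENDPOINTS) if provider == "all" else [provider]
    return any((parts.hostname, parts.path) == ENDPOINTS[name] for name in names)


print(should_intercept("https://api.anthropic.com/v1/messages"))                     # prints: True
print(should_intercept("https://api.anthropic.com/v1/messages", provider="openai"))  # prints: False
```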

Comparison

| Tool | Record mode | Native SDK support | In-process |
|---|---|---|---|
| llm-mock | yes | yes (Anthropic + OpenAI) | yes |
| llm_recorder | yes | no (LiteLLM only) | yes |
| AIMock | no | yes | no (HTTP server) |
| vcr-langchain | yes | no (LangChain only) | yes |

Development

git clone https://github.com/autopost/llm-mock.git
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Roadmap

v0.2 — auto mode, disable via env var (LLM_MOCK_DISABLED)
v1.1 — streaming support
v2 — shared fixtures for teams, semantic matching, web dashboard

Project details

Release history

0.1.1 (this version): May 21, 2026
0.1.0: May 21, 2026

Download files

Source Distribution

llm_mock-0.1.1.tar.gz (16.2 kB)

Uploaded May 21, 2026 Source

Built Distribution

llm_mock-0.1.1-py3-none-any.whl (12.1 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file llm_mock-0.1.1.tar.gz.

File metadata

Download URL: llm_mock-0.1.1.tar.gz
Upload date: May 21, 2026
Size: 16.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_mock-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`52e2d505ff0e8fc9c778ba57a7d4d149b98ac3bf638ac6db5e74f28fb17b6b1b`
MD5	`5bf6c9a0d3e8fbaadc82ae166d887ea3`
BLAKE2b-256	`4059db26f865e8713eee04c3a99099cde479987d6244469bac679e85cea169d6`

File details

Details for the file llm_mock-0.1.1-py3-none-any.whl.

File metadata

Download URL: llm_mock-0.1.1-py3-none-any.whl
Upload date: May 21, 2026
Size: 12.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_mock-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`32cd07088d0a310bec709cfca99a3557a29d366ea5f477fe3dfc676a2ed3a893`
MD5	`b946bb68d30373113f166b54d3fc4820`
BLAKE2b-256	`2f388959abfac369b7eb91e20b692318eac2983764c10f136b13e23bb10b9685`
