
A mock LLM server for testing. Drop-in replacement for OpenAI and Anthropic APIs.


fakellm

Run your LLM tests offline. Free, fast, deterministic.

A mock server that speaks the OpenAI and Anthropic APIs. Point your test code at it and your tests stop being slow, expensive, and flaky.

pip install fakellm
fakellm init
fakellm serve

Then in your tests:

import os
os.environ["OPENAI_BASE_URL"] = "http://localhost:9999/v1"
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:9999"

# your existing code runs unchanged

Why

Three bad options exist for testing LLM code today:

  1. Hit the real API — slow, expensive, flaky.
  2. Mock by hand — brittle, drifts, doesn't exercise streaming or tool-call code paths.
  3. Record-and-replay cassettes — go stale, blow up when prompts change.

fakellm is a fourth option. A local server that returns plausible responses in the right shape, controlled by a small YAML file. Same prompt → same response, every time.

A real example

Say you're building a customer support classifier. Your code calls OpenAI to categorize incoming tickets:

# app/classifier.py
from openai import OpenAI

client = OpenAI()

def classify_ticket(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Classify as: billing, technical, or other."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

Without a mock, every test run hits OpenAI: it's slow, it costs money, and it can give a different answer each time. With fakellm, your tests run against a local server that returns deterministic responses based on rules.

Configure the rules in fakellm.yaml:

version: 1

rules:
  - name: billing_keyword
    when:
      messages_contain: "refund"
    respond:
      content: "billing"

  - name: technical_keyword
    when:
      messages_contain: "error"
    respond:
      content: "technical"

  - name: simulate_rate_limit
    when:
      header.x-test-scenario: rate_limit
    respond:
      status: 429
      error: "Rate limit exceeded"

Start the server in the background:

fakellm serve &

Point the SDK at localhost in tests/conftest.py. pytest loads conftest.py before it imports your test modules, and app/classifier.py builds its OpenAI client at import time, so the environment has to be set this early:

# tests/conftest.py
import os

os.environ["OPENAI_BASE_URL"] = "http://localhost:9999/v1"
os.environ["OPENAI_API_KEY"] = "not-needed"

Then write the tests:

# tests/test_classifier.py
import pytest
from openai import RateLimitError, OpenAI
from app.classifier import classify_ticket

def test_billing_ticket():
    assert classify_ticket("I want a refund for my last order") == "billing"

def test_technical_ticket():
    assert classify_ticket("I keep getting an error when logging in") == "technical"

def test_handles_rate_limit():
    client = OpenAI(
        base_url="http://localhost:9999/v1",
        api_key="not-needed",
        default_headers={"x-test-scenario": "rate_limit"},
    )
    with pytest.raises(RateLimitError):
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "test"}],
        )

Run them:

pytest tests/

Three tests, all deterministic, all free, all in milliseconds. The first two verify your classifier returns the right category for known inputs. The third verifies your code handles rate limits gracefully — a failure mode you can't reliably reproduce against the real API.
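
If you'd rather not start the server by hand, you can extend tests/conftest.py with a session-scoped fixture that launches it for the whole run. A minimal sketch, assuming the default port 9999 is free and the fakellm CLI is on your PATH:

# tests/conftest.py (continued)
import socket
import subprocess
import time

import pytest

@pytest.fixture(scope="session", autouse=True)
def fakellm_server():
    # Launch `fakellm serve` once for the whole test session.
    proc = subprocess.Popen(["fakellm", "serve"])
    try:
        # Wait until the port accepts connections before running tests.
        for _ in range(50):
            try:
                with socket.create_connection(("localhost", 9999), timeout=0.2):
                    break
            except OSError:
                time.sleep(0.1)
        yield
    finally:
        proc.terminate()
        proc.wait()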

Configure

fakellm.yaml:

version: 1

defaults:
  fallback: deterministic_echo

rules:
  - name: greeting
    when:
      messages_contain: "hello"
    respond:
      content: "Hi there!"

  - name: weather_tool
    when:
      tools_include: get_weather
    respond:
      tool_calls:
        - name: get_weather
          arguments: { location: "San Francisco" }

  - name: only_haiku
    when:
      model_matches: "*haiku*"
    respond:
      content: "Short response."

  - name: rate_limit_test
    when:
      header.x-test-scenario: rate_limit
    respond:
      status: 429
      error: "Rate limit exceeded"

Rules are walked top-to-bottom. First match wins. If nothing matches, you get a stable fingerprint response — same input gives the same output, forever.
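
To see a rule and the fallback side by side, send one prompt that matches and one that doesn't. A quick sketch against the config above; the fallback text itself is whatever stable fingerprint fakellm generates for that input, so the only thing to assert is that it doesn't change between calls:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:9999/v1", api_key="not-needed")

def ask(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
    )
    return resp.choices[0].message.content

# Matches the `greeting` rule defined above.
assert ask("hello there") == "Hi there!"

# Matches no rule, so the deterministic fallback answers.
# Same prompt, same response, run after run.
first = ask("something no rule covers")
second = ask("something no rule covers")
assert first == second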

Use with the OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:9999/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)  # → "Hi there!"

Use with the Anthropic SDK

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:9999", api_key="not-needed")
resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=100,
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.content[0].text)

Streaming works

Both APIs stream chunks the way the real ones do. Your streaming code paths get exercised.
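
For example, the usual OpenAI streaming loop runs unchanged. A sketch, assuming the greeting rule from the config above so the streamed text is "Hi there!":

from openai import OpenAI

client = OpenAI(base_url="http://localhost:9999/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

# Chunks arrive as chat.completion.chunk events, the same shape as the real API.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")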

Simulate failures

Set a header in your test to trigger a specific scenario:

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    extra_headers={"x-test-scenario": "rate_limit"},
)
# raises a 429 just like the real API

Dashboard

Visit http://localhost:9999/_fakellm to see which rules are matching and which requests are falling through. Useful for tightening up your config.

What's in v0.1

  • OpenAI /v1/chat/completions and Anthropic /v1/messages
  • Streaming for both
  • Matchers: messages_contain, model_matches, tools_include, header.*
  • Tool call responses
  • Error/status code responses
  • Deterministic fallback
  • Live dashboard

Roadmap

  • Multi-turn response sequences for agentic tests
  • Recorded fixture mode (point at real API, capture, replay)
  • pytest plugin with inline rule definitions
  • More matchers (semantic similarity, JSON schema)

License

MIT
