
A mock LLM server for testing. Drop-in replacement for OpenAI and Anthropic APIs.


fakellm

Run your LLM tests offline. Free, fast, deterministic.

A mock server that speaks the OpenAI and Anthropic APIs. Point your test code at it and your tests stop being slow, expensive, and flaky.

pip install fakellm
fakellm init
fakellm serve

Then in your tests:

import os
os.environ["OPENAI_BASE_URL"] = "http://localhost:9999/v1"
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:9999"

# your existing code runs unchanged

Why

Three bad options exist for testing LLM code today:

  1. Hit the real API — slow, expensive, flaky.
  2. Mock by hand — brittle, drifts, doesn't exercise streaming or tool-call code paths.
  3. Record-and-replay cassettes — go stale, blow up when prompts change.

fakellm is a fourth option. A local server that returns plausible responses in the right shape, controlled by a small YAML file. Same prompt → same response, every time.

Configure

fakellm.yaml:

version: 1

defaults:
  fallback: deterministic_echo

rules:
  - name: greeting
    when:
      messages_contain: "hello"
    respond:
      content: "Hi there!"

  - name: weather_tool
    when:
      tools_include: get_weather
    respond:
      tool_calls:
        - name: get_weather
          arguments: { location: "San Francisco" }

  - name: only_haiku
    when:
      model_matches: "*haiku*"
    respond:
      content: "Short response."

  - name: rate_limit_test
    when:
      header.x-test-scenario: rate_limit
    respond:
      status: 429
      error: "Rate limit exceeded"

Rules are walked top-to-bottom. First match wins. If nothing matches, you get a stable fingerprint response — same input gives the same output, forever.
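Because order decides ties, put the most specific rules first. A sketch of a config where one request could match two rules:

```yaml
rules:
  # A request that says "hello" AND sends the get_weather tool matches
  # both rules below. The first one listed wins, so it gets "Hi there!".
  - name: greeting
    when:
      messages_contain: "hello"
    respond:
      content: "Hi there!"

  - name: weather_tool
    when:
      tools_include: get_weather
    respond:
      tool_calls:
        - name: get_weather
          arguments: { location: "San Francisco" }
```

Swap the two rules and the same request comes back as a tool call instead.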

Use with the OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:9999/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)  # → "Hi there!"

Use with the Anthropic SDK

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:9999", api_key="not-needed")
resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=100,
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.content[0].text)  # → "Hi there!" (same greeting rule as above)

Streaming works

Both APIs stream chunks the way the real ones do. Your streaming code paths get exercised.

Simulate failures

Set a header in your test to trigger a specific scenario:

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    extra_headers={"x-test-scenario": "rate_limit"},
)
# raises a 429 just like the real API

Dashboard

Visit http://localhost:9999/_fakellm to see which rules are matching and which requests are falling through. Useful for tightening up your config.

What's in v0.1

  • OpenAI /v1/chat/completions and Anthropic /v1/messages
  • Streaming for both
  • Matchers: messages_contain, model_matches, tools_include, header.*
  • Tool call responses
  • Error/status code responses
  • Deterministic fallback
  • Live dashboard

Roadmap

  • Multi-turn response sequences for agentic tests
  • Recorded fixture mode (point at real API, capture, replay)
  • pytest plugin with inline rule definitions
  • More matchers (semantic similarity, JSON schema)

License

MIT

