
Zero-cost local mock server for LLM API resilience testing

Project description

Python 3.11+ · CI · License: MIT · GitHub stars

LLMock

Local mock server for testing LLM retry, fallback, and resilience logic without spending tokens or depending on an external provider.

LLMock gives you a deterministic target for failure handling tests. Run it locally, point your SDK at it, and inject latency, provider-shaped HTTP errors, or varied mock content to validate how your application behaves under real failure modes.


Why LLMock?

Shipping an AI application means dealing with rate limits, timeouts, and upstream 5xx responses. LLMock exists so you can exercise those paths locally and reproducibly before they hit production.


Features

  • OpenAI-compatible - /v1/chat/completions, /v1/embeddings, /v1/images/generations, /v1/models
  • 10 provider schemas - OpenAI, Anthropic, Mistral, Cohere, Gemini, Groq, Together AI, Perplexity, AI21, xAI
  • Configurable chaos engineering middleware - latency plus provider-shaped 4xx and 5xx errors with per-status probabilities
  • Configurable success payloads - static, hello, echo, or varied mock content
  • Batch API simulation - async JSONL workflow for batch-style tests

Quick Start

  1. Install the package:
pipx install llmock
# fallback
pip install llmock
# or for local development
pip install -e ".[dev]"

pipx is the recommended install path for the CLI because it keeps llmock isolated while still exposing the command globally.

  2. Start the server:
llmock serve

You can bind LLMock to a different local address if needed:

llmock serve --host 0.0.0.0 --port 9001
# or with env vars
LLMOCK_HOST=0.0.0.0 LLMOCK_PORT=9001 llmock serve

You can also load startup settings from a JSON or YAML file:

llmock serve --config llmock.yaml
# or
LLMOCK_CONFIG=llmock.json llmock serve

Precedence, from highest to lowest, is:

  • CLI flags
  • environment variables
  • config file
  • built-in defaults
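
For example, if llmock.yaml sets port 9001 but you pass --port 9002 on the command line, the CLI flag wins:

llmock serve --config llmock.yaml --port 9002
# binds to port 9002, not the config file's 9001
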
  3. Verify it is alive and serving OpenAI-compatible responses:
curl http://127.0.0.1:8000/health

curl http://127.0.0.1:8000/v1/models

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

Use With The OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="mock-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
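
If chaos injection is enabled (see Chaos Engineering below), the same client can be used to exercise your error-handling code. A minimal sketch, assuming an injected 429: RateLimitError is what the OpenAI Python SDK raises for 429 responses, and max_retries=0 disables the SDK's own retries so your handling path actually runs.

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="mock-key",
    max_retries=0,  # let injected errors surface instead of being retried by the SDK
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except RateLimitError:
    # an injected 429 from LLMock - this is where your backoff/fallback logic would run
    print("mock rate limit hit")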

Important: Override The Provider URL

LLMock does not intercept or proxy requests automatically. It only answers on its own local URLs.

That means your app must explicitly replace the provider base URL with the LLMock base URL. If you keep the real provider URL, your requests will still go to the real API.

Depending on the SDK, this setting may be called base_url, baseUrl, endpoint, host, or api_base.

If you start LLMock on a custom local address, replace http://127.0.0.1:8000 below with your own base URL, for example http://192.168.1.50:9001 or http://localhost:8123.

Provider What to override in your app LLMock base URL
OpenAI client base URL http://127.0.0.1:8000/v1
Anthropic client base URL / endpoint http://127.0.0.1:8000/anthropic
Mistral client base URL / endpoint http://127.0.0.1:8000/mistral/v1
Cohere client base URL / endpoint http://127.0.0.1:8000/cohere/v2
Google Gemini API endpoint / host override http://127.0.0.1:8000/gemini/v1beta
Groq client base URL http://127.0.0.1:8000/groq/openai/v1
Together AI client base URL http://127.0.0.1:8000/together/v1
Perplexity client base URL http://127.0.0.1:8000/perplexity/v1
AI21 client base URL http://127.0.0.1:8000/ai21/v1
xAI (Grok) client base URL http://127.0.0.1:8000/xai/v1

Typical examples:

# OpenAI-compatible clients
base_url = "http://127.0.0.1:8000/v1"

# Groq with an OpenAI-compatible client
base_url = "http://127.0.0.1:8000/groq/openai/v1"

# Anthropic-style client
base_url = "http://127.0.0.1:8000/anthropic"

Chaos Engineering

Use either environment variables or CLI flags when starting the server. CLI flags override environment variables when both are provided.

LLMOCK_LATENCY_MS=200 \
LLMOCK_ERROR_RATE_400=0.05 \
LLMOCK_ERROR_RATE_401=0.05 \
LLMOCK_ERROR_RATE_404=0.05 \
LLMOCK_ERROR_RATE_429=0.25 \
LLMOCK_ERROR_RATE_500=0.1 \
LLMOCK_ERROR_RATE_503=0.1 \
llmock serve

You can also configure the same thing from the CLI. The main mechanism is the repeatable --error-rate STATUS=PROBABILITY option:

llmock serve \
  --latency-ms 200 \
  --error-rate 400=0.05 \
  --error-rate 401=0.05 \
  --error-rate 404=0.05 \
  --error-rate 429=0.25 \
  --error-rate 500=0.1 \
  --error-rate 503=0.1

Any HTTP error status from 400 to 599 can have its own probability. The only rule is that the total probability mass across all configured errors must stay <= 1.0.

The generic mechanism is the same across every configuration interface:

  • env vars: LLMOCK_ERROR_RATE_<STATUS>
  • CLI: --error-rate <STATUS>=<RATE>
  • config file: error_rates: {429: 0.25, 503: 0.1} or error_rate_429: 0.25
  • Python settings: ChaosSettings(error_rate_401=0.1, error_rate_504=0.05) or ChaosSettings(error_rates={401: 0.1, 504: 0.05})

Env var Flag Type Default Description
LLMOCK_HOST --host string 127.0.0.1 Bind address for the local server
LLMOCK_PORT --port int 8000 Bind port for the local server
LLMOCK_LATENCY_MS --latency-ms int 0 Fixed delay in milliseconds before every non-health response
LLMOCK_ERROR_RATE_<STATUS> --error-rate STATUS=RATE float (0-1) 0.0 Probability of returning that particular status, for any status from 400 to 599

Optional shortcut flags remain for convenience and backwards compatibility:

  • --error-rate-429 is equivalent to --error-rate 429=RATE
  • --error-rate-500 is equivalent to --error-rate 500=RATE
  • --error-rate-503 is equivalent to --error-rate 503=RATE

The /health endpoint is always exempt from chaos injection so monitoring stays reliable.
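
You can confirm this while chaos is enabled: /health should keep returning 200 even when other routes are failing.

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/health
# expected: 200, regardless of configured error rates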

LLMock can inject any HTTP error from 400 to 599. Common API-facing examples include:

  • client-side failures: 400, 401, 402, 403, 404, 408, 409, 413, 422, 429
  • upstream/service failures: 500, 501, 502, 503, 504, 529
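
For example, to mix overload-style 529 responses with request-timeout-style 408 responses, use the generic flag shown above:

llmock serve --error-rate 529=0.2 --error-rate 408=0.1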

Error payloads are provider-aware for both the common named statuses above and any other injected 4xx or 5xx. Anthropic-style endpoints return Anthropic-like envelopes, Gemini-style endpoints return Google-style error.status payloads, and OpenAI-compatible routes return {"error": ...} objects.

Config File Format

Both flat keys and grouped sections are supported. This JSON example and the YAML example below are equivalent:

{
  "server": {
    "host": "0.0.0.0",
    "port": 9001
  },
  "chaos": {
    "latency_ms": 200,
    "error_rates": {
      "400": 0.05,
      "401": 0.05,
      "429": 0.25,
      "500": 0.1,
      "503": 0.1
    }
  },
  "responses": {
    "style": "echo"
  }
}

The same configuration in YAML:

server:
  host: 0.0.0.0
  port: 9001

chaos:
  latency_ms: 200
  error_rates:
    400: 0.05
    401: 0.05
    429: 0.25
    500: 0.10
    503: 0.10

responses:
  style: echo

Success Payload Styles

You can also configure how successful mock responses read:

llmock serve --response-style hello

Available styles:

  • static: always returns a plain deterministic mock sentence
  • hello: always returns a friendly greeting-style reply
  • echo: echoes part of the incoming prompt
  • varied: picks a deterministic but more natural-looking variation from the request content
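
For example, with the echo style the mock reply should incorporate text from the incoming prompt, which is handy for asserting that a specific request actually reached the mock:

llmock serve --response-style echo

# in a second terminal:
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "resilience check"}]}'
# the assistant message in the response should contain text from "resilience check"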

Quick Chaos Demo

# terminal 1: start the server with chaos enabled
llmock serve --latency-ms 200 --error-rate 429=0.5

# terminal 2: fire a few requests and watch the status codes
for i in {1..6}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}'
done

Use this to validate retry logic, exponential backoff, and fallback paths before they hit a real provider.
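
A minimal Python sketch of the same idea, with the OpenAI SDK's built-in retries disabled so the custom backoff loop below is what actually gets exercised (the backoff values are arbitrary):

import time

from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="mock-key",
    max_retries=0,  # we want to test our own retry loop, not the SDK's
)

def ask_with_backoff(prompt, attempts=5, delay=0.5):
    for attempt in range(1, attempts + 1):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
        except APIStatusError as exc:
            # covers injected 4xx/5xx responses, including 429 and 503
            print(f"attempt {attempt}: HTTP {exc.status_code}, retrying in {delay}s")
            time.sleep(delay)
            delay *= 2  # exponential backoff
    raise RuntimeError("all attempts failed")

print(ask_with_backoff("ping").choices[0].message.content)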


Provider Endpoints

Provider Base path Key endpoint
OpenAI /v1 /v1/chat/completions
Anthropic /anthropic /anthropic/v1/messages
Mistral /mistral/v1 /mistral/v1/chat/completions
Cohere /cohere/v2 /cohere/v2/chat
Google Gemini /gemini/v1beta /gemini/v1beta/models/{model}:generateContent
Groq /groq/openai/v1 /groq/openai/v1/chat/completions
Together AI /together/v1 /together/v1/chat/completions
Perplexity /perplexity/v1 /perplexity/v1/chat/completions
AI21 /ai21/v1 /ai21/v1/chat/completions
xAI (Grok) /xai/v1 /xai/v1/chat/completions

All providers pass through the same chaos middleware.

These paths are the ones your client must target after you override the provider URL.


More Examples

See examples/README.md for runnable demos, including an OpenAI SDK retry loop and scripted chaos scenarios.


Releasing

LLMock ships on PyPI, with llmock serve as the primary entry point.

  • Recommended install path: pipx install llmock
  • Fallback install path: pip install llmock
  • Release trigger: Git tags like v0.1.0
  • Maintainer checklist: RELEASING.md

The release workflow builds the package, runs checks, generates GitHub release notes, and publishes to PyPI through trusted publishing.


Testing

pytest

The test suite covers OpenAI-compatible endpoints, provider variants, batch simulation, and chaos injection.


License

MIT


Download files

Download the file for your platform.

Source Distribution

llmock-0.1.1.tar.gz (53.7 kB)

Built Distribution

llmock-0.1.1-py3-none-any.whl (50.1 kB)

File details

Details for the file llmock-0.1.1.tar.gz.

File metadata

  • Download URL: llmock-0.1.1.tar.gz
  • Upload date:
  • Size: 53.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llmock-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c961352c03e0256d1c7e65009150936699427233dbdff01ee4594989b93451f0
MD5 fb83a0aa941a1157198db2f3c6218fdf
BLAKE2b-256 50ad758af80b4f0c1b08eb569634df09d007f24cf2755a49961fc0f35d14c29a


File details

Details for the file llmock-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llmock-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 50.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llmock-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2a9ad39c052a80c938f50f409ab37b16be60ff4b458b746c90cc8e2742a0c249
MD5 33467e3a6d5d411ac58fca0ddfbeab3e
BLAKE2b-256 d9e0af19f6086fec2c4f79fa4b8a3618a7594eb39988bdaa668348c1b5736542

