LLMock
Zero-cost local mock server for testing LLM retry, fallback, and resilience logic without spending tokens or depending on an external provider.
LLMock gives you a deterministic target for failure handling tests. Run it locally, point your SDK at it, and inject latency, provider-shaped HTTP errors, or varied mock content to validate how your application behaves under real failure modes.
Why LLMock?
Shipping an AI application means dealing with rate limits, timeouts, and upstream 5xx responses. LLMock exists so you can exercise those paths locally and reproducibly before they hit production.
Features
- OpenAI-compatible - `/v1/chat/completions`, `/v1/embeddings`, `/v1/images/generations`, `/v1/models`
- 10 provider schemas - OpenAI, Anthropic, Mistral, Cohere, Gemini, Groq, Together AI, Perplexity, AI21, xAI
- Configurable chaos engineering middleware - latency plus provider-shaped `4xx` and `5xx` errors with per-status probabilities
- Configurable success payloads - static, hello, echo, or varied mock content
- Batch API simulation - async JSONL workflow for batch-style tests
Quick Start
- Install the package:
pipx install llmock
# fallback
pip install llmock
# or for local development
pip install -e ".[dev]"
pipx is the recommended install path for the CLI because it keeps llmock isolated while still exposing the command globally.
- Start the server:
llmock serve
You can bind LLMock to a different local address if needed:
llmock serve --host 0.0.0.0 --port 9001
# or with env vars
LLMOCK_HOST=0.0.0.0 LLMOCK_PORT=9001 llmock serve
You can also load startup settings from a JSON or YAML file:
llmock serve --config llmock.yaml
# or
LLMOCK_CONFIG=llmock.json llmock serve
Precedence, from highest to lowest, is:
- CLI flags
- environment variables
- config file
- built-in defaults
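As a quick illustration of that ordering, the sketch below (using only the documented env var and flag) sets both; the CLI flag wins, so the server binds to port 9100:

```python
import os
import subprocess

# LLMOCK_PORT is set to 9001 via the environment, but --port 9100 is also
# passed; CLI flags take precedence, so the server listens on 9100.
# Note: this runs the server in the foreground until interrupted.
env = {**os.environ, "LLMOCK_PORT": "9001"}
subprocess.run(["llmock", "serve", "--port", "9100"], env=env)
```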
- Verify it is alive and serving OpenAI-compatible responses:
curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8000/v1/models
curl http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
Use With The OpenAI SDK
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:8000/v1",
api_key="mock-key",
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Important: Override The Provider URL
LLMock does not intercept or proxy requests automatically. It only answers on its own local URLs.
That means your app must explicitly replace the provider base URL with the LLMock base URL. If you keep the real provider URL, your requests will still go to the real API.
Depending on the SDK, this setting may be called base_url, baseUrl, endpoint, host, or api_base.
If you start LLMock on a custom local address, replace http://127.0.0.1:8000 below with your own base URL, for example http://192.168.1.50:9001 or http://localhost:8123.
| Provider | What to override in your app | LLMock base URL |
|---|---|---|
| OpenAI | client base URL | http://127.0.0.1:8000/v1 |
| Anthropic | client base URL / endpoint | http://127.0.0.1:8000/anthropic |
| Mistral | client base URL / endpoint | http://127.0.0.1:8000/mistral/v1 |
| Cohere | client base URL / endpoint | http://127.0.0.1:8000/cohere/v2 |
| Google Gemini | API endpoint / host override | http://127.0.0.1:8000/gemini/v1beta |
| Groq | client base URL | http://127.0.0.1:8000/groq/openai/v1 |
| Together AI | client base URL | http://127.0.0.1:8000/together/v1 |
| Perplexity | client base URL | http://127.0.0.1:8000/perplexity/v1 |
| AI21 | client base URL | http://127.0.0.1:8000/ai21/v1 |
| xAI (Grok) | client base URL | http://127.0.0.1:8000/xai/v1 |
Typical examples:
# OpenAI-compatible clients
base_url = "http://127.0.0.1:8000/v1"
# Groq with an OpenAI-compatible client
base_url = "http://127.0.0.1:8000/groq/openai/v1"
# Anthropic-style client
base_url = "http://127.0.0.1:8000/anthropic"
Chaos Engineering
Use either environment variables or CLI flags when starting the server. CLI flags override environment variables when both are provided.
LLMOCK_LATENCY_MS=200 \
LLMOCK_ERROR_RATE_400=0.05 \
LLMOCK_ERROR_RATE_401=0.05 \
LLMOCK_ERROR_RATE_404=0.05 \
LLMOCK_ERROR_RATE_429=0.25 \
LLMOCK_ERROR_RATE_500=0.1 \
LLMOCK_ERROR_RATE_503=0.1 \
llmock serve
You can also configure the same thing from the CLI. The main mechanism is the repeatable --error-rate STATUS=PROBABILITY option:
llmock serve \
--latency-ms 200 \
--error-rate 400=0.05 \
--error-rate 401=0.05 \
--error-rate 404=0.05 \
--error-rate 429=0.25 \
--error-rate 500=0.1 \
--error-rate 503=0.1
Any HTTP error status from 400 to 599 can have its own probability. The only rule is that the total probability mass across all configured errors must stay <= 1.0.
The underlying generic mechanism is:
- env vars: `LLMOCK_ERROR_RATE_<STATUS>`
- CLI: `--error-rate <STATUS>=<RATE>`
- config file: `error_rates: {429: 0.25, 503: 0.1}` or `error_rate_429: 0.25`
- Python settings: `ChaosSettings(error_rate_401=0.1, error_rate_504=0.05)` or `ChaosSettings(error_rates={401: 0.1, 504: 0.05})`
| Env var | Flag | Type | Default | Description |
|---|---|---|---|---|
| LLMOCK_HOST | --host | string | 127.0.0.1 | Bind address for the local server |
| LLMOCK_PORT | --port | int | 8000 | Bind port for the local server |
| LLMOCK_LATENCY_MS | --latency-ms | int | 0 | Fixed delay in milliseconds before every non-health response |
| LLMOCK_ERROR_RATE_<STATUS> | --error-rate STATUS=RATE | float 0-1 | 0.0 | Per-status probability of returning that 4xx or 5xx error (any status from 400 to 599) |
Optional shortcut flags remain for convenience and backwards compatibility:
- `--error-rate-429` is equivalent to `--error-rate 429=RATE`
- `--error-rate-500` is equivalent to `--error-rate 500=RATE`
- `--error-rate-503` is equivalent to `--error-rate 503=RATE`
The /health endpoint is always exempt from chaos injection so monitoring stays reliable.
LLMock can inject any HTTP error from 400 to 599. Common API-facing examples include:
- client-side failures: `400`, `401`, `402`, `403`, `404`, `408`, `409`, `413`, `422`, `429`
- upstream/service failures: `500`, `501`, `502`, `503`, `504`, `529`
Error payloads are provider-aware for both the common named statuses above and any other injected 4xx or 5xx. Anthropic-style endpoints return Anthropic-like envelopes, Gemini-style endpoints return Google-style error.status payloads, and OpenAI-compatible routes return {"error": ...} objects.
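To see those envelopes (and the /health exemption) for yourself, a small probe script along these lines works; it assumes `httpx` is installed and that the server was started with a guaranteed error, for example `llmock serve --error-rate 429=1.0`:

```python
import httpx

BASE = "http://127.0.0.1:8000"

# /health is exempt from chaos injection, so this stays 200 even at a 100% error rate.
print("health:", httpx.get(f"{BASE}/health").status_code)

# OpenAI-compatible route: expect an injected 429 with an {"error": ...} envelope.
r = httpx.post(
    f"{BASE}/v1/chat/completions",
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]},
)
print("openai:", r.status_code, r.json())

# Anthropic-style route: expect an Anthropic-like error envelope for the same status.
r = httpx.post(
    f"{BASE}/anthropic/v1/messages",
    json={"model": "claude-3", "max_tokens": 16,
          "messages": [{"role": "user", "content": "hi"}]},
)
print("anthropic:", r.status_code, r.json())
```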
Config File Format
Both flat keys and grouped sections are supported. This JSON example and the YAML example below are equivalent:
{
"server": {
"host": "0.0.0.0",
"port": 9001
},
"chaos": {
"latency_ms": 200,
"error_rates": {
"400": 0.05,
"401": 0.05,
"429": 0.25,
"500": 0.1,
"503": 0.1
}
},
"responses": {
"style": "echo"
}
}
server:
  host: 0.0.0.0
  port: 9001
chaos:
  latency_ms: 200
  error_rates:
    400: 0.05
    401: 0.05
    429: 0.25
    500: 0.10
    503: 0.10
responses:
  style: echo
Success Payload Styles
You can also configure how successful mock responses read:
llmock serve --response-style hello
Available styles:
- `static`: always returns a plain deterministic mock sentence
- `hello`: always returns a friendly greeting-style reply
- `echo`: echoes part of the incoming prompt
- `varied`: picks a deterministic but more natural-looking variation from the request content
Quick Chaos Demo
llmock serve --latency-ms 200 --error-rate 429=0.5
for i in {1..6}; do
curl -s -o /dev/null -w "%{http_code}\n" \
http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}'
done
Use this to validate retry logic, exponential backoff, and fallback paths before they hit a real provider.
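A concrete starting point for that kind of test is sketched below: a hand-rolled exponential backoff loop using the OpenAI Python SDK with its built-in retries disabled, so the injected 429s exercise your own logic rather than the SDK's. Treat it as illustrative rather than canonical:

```python
import time

from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="mock-key",
    max_retries=0,  # disable SDK retries so the loop below handles failures
)

def chat_with_backoff(prompt: str, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
        except APIStatusError as exc:
            retryable = exc.status_code in (429, 500, 502, 503)
            if not retryable or attempt == max_attempts - 1:
                raise
            time.sleep(0.5 * 2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...

print(chat_with_backoff("ping").choices[0].message.content)
```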
Provider Endpoints
| Provider | Base path | Key endpoint |
|---|---|---|
| OpenAI | /v1 | /v1/chat/completions |
| Anthropic | /anthropic | /anthropic/v1/messages |
| Mistral | /mistral/v1 | /mistral/v1/chat/completions |
| Cohere | /cohere/v2 | /cohere/v2/chat |
| Google Gemini | /gemini/v1beta | /gemini/v1beta/models/{model}:generateContent |
| Groq | /groq/openai/v1 | /groq/openai/v1/chat/completions |
| Together AI | /together/v1 | /together/v1/chat/completions |
| Perplexity | /perplexity/v1 | /perplexity/v1/chat/completions |
| AI21 | /ai21/v1 | /ai21/v1/chat/completions |
| xAI (Grok) | /xai/v1 | /xai/v1/chat/completions |
All providers pass through the same chaos middleware.
These paths are the ones your client must target after you override the provider URL.
More Examples
See examples/README.md for runnable demos, including an OpenAI SDK retry loop and scripted chaos scenarios.
Releasing
LLMock is intended to ship on PyPI, with llmock serve as the primary entry point.
- Recommended install path: `pipx install llmock`
- Fallback install path: `pip install llmock`
- Release trigger: Git tags like `v0.1.0`
- Maintainer checklist: RELEASING.md
The release workflow builds the package, runs checks, generates GitHub release notes, and publishes to PyPI through trusted publishing.
Community
- Contribution guide: CONTRIBUTING.md
- Code of conduct: CODE_OF_CONDUCT.md
- Security policy: SECURITY.md
- Release process: RELEASING.md
Testing
pytest
The test suite covers OpenAI-compatible endpoints, provider variants, batch simulation, and chaos injection.
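If you want a feel for what a suite-level resilience check might look like in your own project, here is an illustrative test (not part of LLMock's own suite) that assumes a LLMock instance is already running on the default port with a moderate 429 rate, and relies on the OpenAI SDK's built-in retries to absorb it:

```python
from openai import OpenAI


def test_chat_survives_injected_429s():
    # Assumes: `llmock serve --error-rate 429=0.25` is running locally.
    client = OpenAI(
        base_url="http://127.0.0.1:8000/v1",
        api_key="mock-key",
        max_retries=5,  # let the SDK retry through occasional injected 429s
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
    )
    assert response.choices[0].message.content
```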
License
MIT