A mock server that mimics OpenAI and Anthropic API formats for testing

Project description

Mock LLM Server


A FastAPI-based mock LLM server that mimics OpenAI and Anthropic API formats. Instead of calling actual language models, it uses predefined responses from a YAML configuration file.

It is designed for situations where you need deterministic responses for testing or development.

Check out CodeGate when you're done here!

Project Structure

mockllm/
├── src/
│   └── mockllm/
│       ├── __init__.py
│       ├── config.py      # Response configuration handling
│       ├── models.py      # Pydantic models for API
│       └── server.py      # FastAPI server implementation
├── tests/
│   └── test_server.py     # Test suite
├── example.responses.yml   # Example response configuration
├── LICENSE                # MIT License
├── MANIFEST.in           # Package manifest
├── README.md             # This file
├── pyproject.toml        # Project configuration
└── requirements.txt      # Dependencies

Features

  • OpenAI and Anthropic compatible API endpoints
  • Streaming support (character-by-character response streaming)
  • Configurable responses via YAML file
  • Hot-reloading of response configurations
  • JSON logging
  • Error handling
  • Mock token counting

Installation

From PyPI

pip install mockllm

From Source

  1. Clone the repository:
git clone https://github.com/lukehinds/mockllm.git
cd mockllm
  2. Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  3. Install dependencies:
pip install -e ".[dev]"  # Install with development dependencies
# or
pip install -e .         # Install without development dependencies

Usage

  1. Set up the responses.yml file:
cp example.responses.yml responses.yml
  2. Start the server:
python -m mockllm

Or using uvicorn directly:

uvicorn mockllm.server:app --reload

The server will start on http://localhost:8000.

  3. Send requests to the API endpoints:

OpenAI Format

Regular request:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'

Streaming request:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
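Because the endpoint mirrors the OpenAI wire format, any HTTP client can drive it. Here is a minimal Python sketch using only the standard library; the helper names (`chat_completion_payload`, `ask`) are illustrative, not part of mockllm:

```python
import json
import urllib.request

def chat_completion_payload(content: str, model: str = "mock-llm",
                            stream: bool = False) -> bytes:
    """Build the JSON body for POST /v1/chat/completions."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": stream,
    }).encode()

def ask(content: str, base_url: str = "http://localhost:8000") -> str:
    """Send a non-streaming request and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=chat_completion_payload(content),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# With the server running: ask("what colour is the sky?")
```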

Anthropic Format

Regular request:

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'

Streaming request:

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'

Configuration

Response Configuration

Responses are configured in responses.yml. The file has two main sections:

  1. responses: Maps input prompts to predefined responses
  2. defaults: Contains default configurations like the unknown response message

Example responses.yml:

responses:
  "what colour is the sky?": "The sky is blue during a clear day due to a phenomenon called Rayleigh scattering."
  "what is 2+2?": "2+2 equals 4."

defaults:
  unknown_response: "I don't know the answer to that. This is a mock response."
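Once the YAML is parsed (e.g. with PyYAML), the lookup this implies reduces to an exact-match dictionary lookup with a fallback. A sketch (whether the real server also normalises prompts, e.g. case or whitespace, is not specified here):

```python
# Parsed form of a responses.yml like the example above.
config = {
    "responses": {
        "what colour is the sky?": "The sky is blue during a clear day.",
    },
    "defaults": {
        "unknown_response": "I don't know the answer to that. This is a mock response.",
    },
}

def lookup_response(cfg: dict, prompt: str) -> str:
    """Exact-match lookup, falling back to the configured default."""
    return cfg["responses"].get(prompt, cfg["defaults"]["unknown_response"])
```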

Hot Reloading

The server automatically detects changes to responses.yml and reloads the configuration without requiring a restart.
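One simple way to picture hot reloading is polling the file's modification time and re-parsing on change. This is an illustrative sketch, not necessarily how mockllm implements it:

```python
import os

class ReloadingConfig:
    """Re-parse a config file whenever its modification time changes."""

    def __init__(self, path: str, parse=lambda text: text):
        self.path = path
        self.parse = parse  # e.g. yaml.safe_load in a real setup
        self._mtime = None
        self.data = None

    def get(self):
        """Return the parsed config, re-reading the file if it changed."""
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path) as f:
                self.data = self.parse(f.read())
            self._mtime = mtime
        return self.data
```

A production server would more likely use filesystem events (e.g. the watchdog library) than polling, but the reload-on-change behaviour is the same.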

API Format

OpenAI Format

Request Format

{
  "model": "mock-llm",
  "messages": [
    {"role": "user", "content": "what colour is the sky?"}
  ],
  "temperature": 0.7,
  "max_tokens": 150,
  "stream": false
}

Response Format

Regular response:

{
  "id": "mock-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "mock-llm",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The sky is blue during a clear day due to a phenomenon called Rayleigh scattering."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
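A client reads the reply text from choices[0].message.content, for example:

```python
# Reading the reply out of a parsed chat completion response.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "The sky is blue."},
         "finish_reason": "stop"},
    ],
}
reply = response["choices"][0]["message"]["content"]
```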

Streaming response (Server-Sent Events format):

data: {"id":"mock-123","object":"chat.completion.chunk","created":1700000000,"model":"mock-llm","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"mock-124","object":"chat.completion.chunk","created":1700000000,"model":"mock-llm","choices":[{"delta":{"content":"T"},"index":0}]}

data: {"id":"mock-125","object":"chat.completion.chunk","created":1700000000,"model":"mock-llm","choices":[{"delta":{"content":"h"},"index":0}]}

... (character by character)

data: {"id":"mock-999","object":"chat.completion.chunk","created":1700000000,"model":"mock-llm","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
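A client reassembles the streamed text by concatenating the content deltas until it sees [DONE]. A minimal parser sketch for this SSE shape (the function name is illustrative):

```python
import json

def collect_stream(lines):
    """Join the `delta.content` pieces from OpenAI-style chunk lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        for choice in json.loads(payload)["choices"]:
            parts.append(choice["delta"].get("content", ""))
    return "".join(parts)
```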

Anthropic Format

Request Format

{
  "model": "claude-3-sonnet-20240229",
  "messages": [
    {"role": "user", "content": "what colour is the sky?"}
  ],
  "max_tokens": 1024,
  "stream": false
}

Response Format

Regular response:

{
  "id": "mock-123",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-sonnet-20240229",
  "content": [
    {
      "type": "text",
      "text": "The sky is blue during a clear day due to a phenomenon called Rayleigh scattering."
    }
  ],
  "usage": {
    "input_tokens": 10,
    "output_tokens": 5,
    "total_tokens": 15
  }
}
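In this format the reply text lives under content[0].text rather than choices; joining all text blocks handles multi-block content too:

```python
# Reading the reply out of a parsed Anthropic-style response.
response = {
    "content": [{"type": "text", "text": "The sky is blue."}],
}
reply = "".join(block["text"] for block in response["content"]
                if block["type"] == "text")
```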

Streaming response (Server-Sent Events format):

data: {"type":"message_delta","id":"mock-123","delta":{"type":"content_block_delta","index":0,"delta":{"text":"T"}}}

data: {"type":"message_delta","id":"mock-123","delta":{"type":"content_block_delta","index":0,"delta":{"text":"h"}}}

... (character by character)

data: [DONE]

Development

Running Tests

pip install -e ".[dev]"  # Install development dependencies
pytest tests/

Code Quality

# Format code
black .
isort .

# Type checking
mypy src/

# Linting
ruff check .

Error Handling

The server includes comprehensive error handling:

  • Invalid requests return 400 status codes with descriptive messages
  • Server errors return 500 status codes with error details
  • All errors are logged using JSON format

Logging

The server uses JSON-formatted logging for:

  • Incoming request details
  • Response configuration loading
  • Error messages and stack traces
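JSON log lines can be produced with a small stdlib formatter. The class and field names below are an illustrative sketch, not mockllm's actual logging setup:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry)

# Wire the formatter to a handler so every line is machine-parseable.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("mock-demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("request received")
```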

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform.

Source Distribution

mockllm-0.0.2.tar.gz (15.5 kB)


Built Distribution


mockllm-0.0.2-py3-none-any.whl (12.3 kB)


File details

Details for the file mockllm-0.0.2.tar.gz.

File metadata

  • Download URL: mockllm-0.0.2.tar.gz
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for mockllm-0.0.2.tar.gz:

  • SHA256: 3a3d9447234bfad58adc052a2aef71cc966eed339b973756d8cc4a8d7ee4799c
  • MD5: 1e62d4134308ce5223af083c2ff41ff9
  • BLAKE2b-256: 17fcd6ff1406ebc6b93c98f4d7265fab036afb7e810a646d4eef4bda4fe70b5b


Provenance

The following attestation bundles were made for mockllm-0.0.2.tar.gz:

Publisher: publish.yml on stacklok/mockllm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mockllm-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: mockllm-0.0.2-py3-none-any.whl
  • Size: 12.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for mockllm-0.0.2-py3-none-any.whl:

  • SHA256: cb4a7c5f6f0dc39040cbcd4cbfcf9a336eea4805eb9106c7f11bcafa47db5974
  • MD5: b68c3f8422b7ffebd47cd26fb7aacb63
  • BLAKE2b-256: 98c42fb3b4b393c479154c19ee72e04b1b4ec8e1a8504c4adc6ad2545119cb70


Provenance

The following attestation bundles were made for mockllm-0.0.2-py3-none-any.whl:

Publisher: publish.yml on stacklok/mockllm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
