
🚀 Zero-config pytest plugin for mocking LLM APIs - OpenAI, Anthropic, Gemini, LangChain & more


🧪 pytest-mockllm

🚀 Zero-config LLM mocking for pytest: Test AI apps without the AI bills




Why pytest-mockllm?

Testing LLM applications is painful:

  • 💸 Expensive: Every test run burns API credits
  • 🐢 Slow: API calls add seconds to your test suite
  • 🎲 Non-deterministic: Same input, different output = flaky tests
  • 🔒 Requires API keys: CI needs secrets, local dev needs setup

pytest-mockllm fixes all of this with zero configuration:

# Just use the fixture; it works immediately!
def test_my_chatbot(mock_openai):
    mock_openai.add_response("Hello! I'm here to help.")

    # my_chatbot stands for your own application code that calls OpenAI
    response = my_chatbot.chat("Hi there!")

    assert "help" in response.lower()
    assert mock_openai.call_count == 1

No setup. No API keys. No costs. Just fast, reliable tests.


🚀 Quick Start

Installation

pip install pytest-mockllm

That's it! The plugin is auto-discovered by pytest.

Your First Test

def test_customer_support_bot(mock_openai):
    # Configure the mock response
    mock_openai.add_response("I can help you with your order. What's your order number?")
    
    # Your actual code that uses OpenAI
    from openai import OpenAI
    client = OpenAI(api_key="fake-key")  # Key doesn't matter!
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "I need help with my order"}]
    )
    
    # Assert on the response
    assert "order number" in response.choices[0].message.content.lower()
    
    # Assert on what was called
    assert mock_openai.call_count == 1
    assert mock_openai.last_call["model"] == "gpt-4o"

✨ Features

🎯 Zero Configuration

Fixtures are auto-discovered. Just use them.

🤖 Multi-Provider Support

OpenAI, Anthropic, Google Gemini: one consistent API.

🌊 Streaming Support

Full support for streaming responses, just like the real APIs.

๐Ÿ”ง LangChain & LlamaIndex

Native integration with popular LLM frameworks.

๐Ÿ“ผ Response Recording

VCR-style recording for golden tests.

⚡ Chaos Engineering

Simulate rate limits, timeouts, and random latency jitter to test your app's resilience.

💰 Cost & Token Tracking

Professional-grade token counting with tiktoken and a built-in cost dashboard.

📼 Secure Recording

VCR-style recording with automatic PII redaction (API keys, Bearer tokens).
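Redaction of this kind usually comes down to pattern substitution before a cassette is written to disk. The plugin's actual patterns are not documented here. The sketch below assumes two illustrative patterns, an `sk-` style API key and an HTTP Bearer token, and `redact` is a hypothetical helper:

```python
import re

# Hypothetical patterns, applied in order. The Bearer rule runs first so that a
# Bearer token which is itself an "sk-" key is replaced in one piece.
_SECRET_PATTERNS = [
    (re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "sk-[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace anything that looks like a secret before it is saved."""
    for pattern, replacement in _SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("Authorization: Bearer abc.DEF-123"))  # Authorization: Bearer [REDACTED]
```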

🔒 Type Safe

Full type hints, plus response objects that mirror the SDK structures.
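SDK-shaped objects matter because application code reads the mock exactly as it reads a real response, through `response.choices[0].message.content`. The sketch below shows that idea with plain dataclasses. The class names are illustrative; they are not the plugin's types or the OpenAI SDK's own classes:

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


@dataclass
class Choice:
    index: int
    message: Message
    finish_reason: str = "stop"


@dataclass
class ChatCompletion:
    model: str
    choices: list[Choice] = field(default_factory=list)


def fake_completion(model: str, text: str) -> ChatCompletion:
    """Build an object with the same attribute path as an OpenAI chat completion."""
    return ChatCompletion(
        model=model,
        choices=[Choice(index=0, message=Message(role="assistant", content=text))],
    )


response = fake_completion("gpt-4o", "The answer is 42")
print(response.choices[0].message.content)  # The answer is 42
```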


🆕 What's New in v0.2.1

This release transforms pytest-mockllm into a professional-grade tool:

  • 🚀 True Async Support: Real coroutines and async iterators for all providers.
  • 🎯 Accurate Tokenizers: High-fidelity token counting with tiktoken.
  • 📊 Cost Saved Dashboard: Live ROI tracking in your terminal.
  • ⚡ Chaos Engineering: Proactive resilience testing with jitter and error simulation.
  • 🐍 Next-Gen Support: Verified compatibility with Python 3.14.
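"Real coroutines and async iterators" means the mock must be awaited, and its streams iterated with `async for`, just like the async SDK clients. The sketch below shows that contract with a hypothetical `FakeAsyncChat` class. It is not the plugin's implementation:

```python
import asyncio
from collections.abc import AsyncIterator


class FakeAsyncChat:
    """Hypothetical async fake: awaitable calls and async-iterable streams."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    async def create(self) -> str:
        # A real coroutine: callers must `await` it.
        await asyncio.sleep(0)
        return self.reply

    async def stream(self) -> AsyncIterator[str]:
        # An async generator, consumed with `async for`.
        for word in self.reply.split(" "):
            await asyncio.sleep(0)
            yield word + " "


async def main() -> tuple[str, str]:
    chat = FakeAsyncChat("hello from an async mock")
    full = await chat.create()
    streamed = "".join([piece async for piece in chat.stream()]).rstrip()
    return full, streamed


print(asyncio.run(main()))
```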

🤖 Providers

OpenAI

def test_openai(mock_openai):
    mock_openai.add_response("The answer is 42")
    
    # Works with the official OpenAI SDK
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the meaning of life?"}]
    )
    
    assert response.choices[0].message.content == "The answer is 42"

Anthropic

def test_anthropic(mock_anthropic):
    mock_anthropic.add_response("I'd be happy to help!")
    
    from anthropic import Anthropic
    client = Anthropic(api_key="fake")
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello Claude!"}]
    )
    
    assert "happy" in response.content[0].text

Google Gemini

def test_gemini(mock_gemini):
    mock_gemini.add_response("Here's what I found...")
    
    import google.generativeai as genai
    model = genai.GenerativeModel("gemini-1.5-pro")
    
    response = model.generate_content("Tell me about AI")
    
    assert "found" in response.text

LangChain

def test_langchain(mock_langchain):
    mock_langchain.add_response("Paris is the capital of France.")
    
    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    
    llm = ChatOpenAI(model="gpt-4o", api_key="fake")
    prompt = ChatPromptTemplate.from_template("What is the capital of {country}?")
    chain = prompt | llm
    
    result = chain.invoke({"country": "France"})
    
    assert "Paris" in result.content

📚 Examples

Multiple Responses (Conversation)

def test_conversation(mock_openai):
    mock_openai.add_responses(
        "Hi! How can I help you today?",
        "I can definitely help with that order.",
        "Your order has been updated. Anything else?",
    )

    # chatbot stands for your own application code that calls OpenAI;
    # each call consumes the next configured response, in order.

    # First call
    response1 = chatbot.send("Hello")
    assert "help" in response1

    # Second call
    response2 = chatbot.send("I need to change my order")
    assert "order" in response2

    # Third call
    response3 = chatbot.send("Change quantity to 5")
    assert "updated" in response3
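Multiple responses behave like a first-in, first-out queue. Each call consumes the next reply, and every call is recorded so you can assert on it afterwards, as `call_count` and `last_call` do in the first test. The sketch below is stdlib-only, and `FakeLLM` is a hypothetical class, not the plugin's implementation:

```python
from __future__ import annotations

from collections import deque
from typing import Any


class FakeLLM:
    """Hypothetical fake: serves queued replies in order and records every call."""

    def __init__(self) -> None:
        self._replies: deque[str] = deque()
        self.calls: list[dict[str, Any]] = []

    def add_responses(self, *replies: str) -> None:
        # Queue replies; they are consumed in the order given.
        self._replies.extend(replies)

    def create(self, **kwargs: Any) -> str:
        self.calls.append(kwargs)       # record exactly what the app sent
        return self._replies.popleft()  # first in, first out

    @property
    def call_count(self) -> int:
        return len(self.calls)

    @property
    def last_call(self) -> dict[str, Any] | None:
        return self.calls[-1] if self.calls else None


llm = FakeLLM()
llm.add_responses("Hi! How can I help you today?", "I can definitely help with that order.")
print(llm.create(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}]))
print(llm.call_count, llm.last_call["model"])  # 1 gpt-4o
```

With an empty queue, `popleft()` raises `IndexError`. A real mock would instead return a default reply or, in strict mode, raise a clearer error.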

Streaming Responses

def test_streaming(mock_openai):
    mock_openai.add_response("This is a streaming response that comes in chunks")
    
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )
    
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content
    
    assert "streaming" in full_response
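The streaming loop works because each chunk exposes `choices[0].delta.content`. The sketch below shows how a mock could produce such chunks from one canned string. It uses `SimpleNamespace` as a stand-in for the SDK's chunk types; the helper name `fake_stream` and the default chunk size are assumptions:

```python
from collections.abc import Iterator
from types import SimpleNamespace


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[SimpleNamespace]:
    """Yield chunks shaped like OpenAI stream events: chunk.choices[0].delta.content."""
    for start in range(0, len(text), chunk_size):
        delta = SimpleNamespace(content=text[start:start + chunk_size])
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
    # A final chunk with no content, which is why consumers check before appending.
    yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])


full_response = ""
for chunk in fake_stream("This is a streaming response that comes in chunks"):
    if chunk.choices[0].delta.content:
        full_response += chunk.choices[0].delta.content

print(full_response)
```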

Function/Tool Calling

def test_function_calling(mock_openai):
    from pytest_mockllm.core import MockResponse
    
    mock_openai._responses.append(MockResponse(
        content="",
        tool_calls=[{
            "id": "call_123",
            "function": {
                "name": "get_weather",
                "arguments": {"location": "San Francisco", "unit": "celsius"},
            },
        }],
    ))
    
    # Your function-calling logic here
    # ...
    
    assert mock_openai.last_call is not None
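In the real OpenAI API, `tool_call.function.arguments` arrives as a JSON-encoded string, while the mock in the test passes a dict. The sketch below is a hypothetical dispatcher that accepts either form and routes the call to a local function. `get_weather`, `TOOLS` and `run_tool_call` are illustrative names:

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> str:
    # Hypothetical local tool; a real app would call a weather service here.
    return f"22 degrees {unit} in {location}"


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_call(tool_call: dict[str, Any]) -> str:
    """Dispatch one tool call to the matching local Python function."""
    fn = tool_call["function"]
    args = fn["arguments"]
    # The real API sends a JSON string; a hand-built mock may pass a dict.
    if isinstance(args, str):
        args = json.loads(args)
    return TOOLS[fn["name"]](**args)


call = {
    "id": "call_123",
    "function": {
        "name": "get_weather",
        "arguments": '{"location": "San Francisco", "unit": "celsius"}',
    },
}
print(run_tool_call(call))  # 22 degrees celsius in San Francisco
```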

Token Usage & Cost Assertions

def test_stays_within_budget(mock_openai):
    from pytest_mockllm.core import TokenUsage, estimate_cost
    
    mock_openai.add_response(
        "A detailed response...",
        token_usage=TokenUsage(prompt_tokens=500, completion_tokens=1000),
    )
    
    # Your LLM call here
    result = my_function()
    
    # Assert token usage
    assert mock_openai.total_tokens < 2000
    assert mock_openai.total_completion_tokens < 1500
    
    # Assert cost (for gpt-4o)
    cost = estimate_cost(
        "gpt-4o",
        mock_openai.total_prompt_tokens,
        mock_openai.total_completion_tokens,
    )
    assert cost < 0.05  # Less than 5 cents
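The cost check is plain arithmetic: divide the token count by one million, multiply by the price per million tokens, and sum the prompt and completion parts. The sketch below passes prices in explicitly. The 2.50 and 10.00 USD per million figures are assumed placeholders, not values taken from `estimate_cost`:

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost = tokens / 1e6 * price per million, summed over input and output."""
    return (
        prompt_tokens / 1_000_000 * input_price_per_million
        + completion_tokens / 1_000_000 * output_price_per_million
    )


# Assumed placeholder prices (USD per 1M tokens); check your provider's current pricing.
cost = estimate_cost_usd(500, 1000, input_price_per_million=2.50, output_price_per_million=10.00)
print(f"${cost:.5f}")  # $0.01125
```

At those rates, 500 prompt tokens cost $0.00125 and 1,000 completion tokens cost $0.01. That is about 1.1 cents in total, comfortably under the 5-cent budget in the test.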

Error Simulation (Chaos Testing)

def test_handles_rate_limit(mock_openai):
    mock_openai.simulate_error("rate_limit", after_calls=2)
    mock_openai.add_responses("OK", "OK")
    
    # First two calls succeed, third fails
    # ...

def test_handles_jitter(mock_openai):
    # Add up to 500ms random latency to every call
    mock_openai.simulate_jitter(max_ms=500)
    # ...

def test_random_failures(mock_openai):
    # 10% chance of random "server" or "rate_limit" error
    mock_openai.simulate_random_errors(probability=0.1)
    # ...
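All three chaos modes boil down to a call counter, a random delay and a weighted coin flip. The sketch below uses a hypothetical fake, not the plugin's code. `RateLimitError` stands in for whatever exception your provider SDK raises:

```python
from __future__ import annotations

import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider SDK's rate-limit exception."""


class ChaosFake:
    """Hypothetical fake with three chaos knobs: fail after N calls, jitter, random errors."""

    def __init__(
        self,
        fail_after: int | None = None,
        max_jitter_ms: float = 0,
        error_probability: float = 0.0,
        seed: int = 0,
    ) -> None:
        self.fail_after = fail_after
        self.max_jitter_ms = max_jitter_ms
        self.error_probability = error_probability
        self.rng = random.Random(seed)  # seeded, so "random" chaos is reproducible
        self.calls = 0

    def create(self) -> str:
        self.calls += 1
        # 1. Deterministic failure once the call budget is used up.
        if self.fail_after is not None and self.calls > self.fail_after:
            raise RateLimitError(f"rate limit hit on call {self.calls}")
        # 2. Random latency between 0 and max_jitter_ms milliseconds.
        if self.max_jitter_ms:
            time.sleep(self.rng.uniform(0, self.max_jitter_ms) / 1000)
        # 3. Random failure with the given probability.
        if self.rng.random() < self.error_probability:
            raise RateLimitError("random failure")
        return "OK"


fake = ChaosFake(fail_after=2)
print(fake.create(), fake.create())  # OK OK
try:
    fake.create()
except RateLimitError as exc:
    print("third call failed:", exc)  # third call failed: rate limit hit on call 3
```

Seeding the random generator is a deliberate choice. A chaos test that fails differently on every run would reintroduce the flakiness the plugin is meant to remove.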

Strict Mode

import pytest

def test_catches_unconfigured_calls(mock_openai):
    mock_openai.set_strict_mode(True)

    # This raises an error because no response is configured
    with pytest.raises(RuntimeError, match="No mock response configured"):
        my_function_that_calls_llm()  # your own code that calls the LLM
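Strict mode changes what happens when the response queue runs dry. The sketch below assumes that, outside strict mode, the mock falls back to a default reply. `StrictFake` and its default text are hypothetical:

```python
from __future__ import annotations

from collections import deque


class StrictFake:
    """Hypothetical fake illustrating strict vs. lenient handling of unconfigured calls."""

    DEFAULT_REPLY = "This is a mock response."  # hypothetical fallback text

    def __init__(self, strict: bool = False) -> None:
        self.strict = strict
        self._replies: deque[str] = deque()

    def add_response(self, text: str) -> None:
        self._replies.append(text)

    def create(self) -> str:
        if self._replies:
            return self._replies.popleft()
        if self.strict:
            # Fail loudly so a forgotten add_response() shows up as a test failure.
            raise RuntimeError("No mock response configured")
        return self.DEFAULT_REPLY


print(StrictFake().create())  # This is a mock response.
```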

๐Ÿ“ผ Recording & Replay

Record real API responses once and replay them forever, like VCR for LLMs.

Record Mode

# First run: hits the real API and saves the response
import pytest

@pytest.mark.llm_record
def test_with_recording(llm_recorder):
    from openai import OpenAI
    client = OpenAI()  # Uses real API key from environment
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Use a cheap model for recording
        messages=[{"role": "user", "content": "Say hello!"}]
    )
    
    assert response.choices[0].message.content

Run with recording:

pytest tests/test_example.py --llm-record

Replay Mode

# Subsequent runs: uses the saved response (no API key needed!)
import pytest

@pytest.mark.llm_replay
def test_with_recording(llm_recorder):
    # Same test code as in record mode, but the response now comes from the cassette
    ...

Cassette Storage

Responses are saved in tests/llm_cassettes/ as YAML:

name: test_with_recording
version: "1.0"
created: 1703270400
interactions:
  - request:
      model: gpt-4o-mini
      messages:
        - role: user
          content: Say hello!
    response:
      content: "Hello! How can I assist you today?"
      model: gpt-4o-mini
    provider: openai
    latency_ms: 523
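The README does not specify how an incoming request is matched against a cassette. One plausible approach, sketched here as an assumption, is exact matching on model and messages. The dict mirrors the YAML cassette shown in this section; parsing the YAML itself would need a library such as PyYAML, which is left out to keep the sketch stdlib-only. `replay` is a hypothetical helper:

```python
from typing import Any

# The example cassette, as the dict a YAML loader would produce.
CASSETTE: dict[str, Any] = {
    "name": "test_with_recording",
    "interactions": [
        {
            "request": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "Say hello!"}],
            },
            "response": {"content": "Hello! How can I assist you today?", "model": "gpt-4o-mini"},
            "provider": "openai",
        }
    ],
}


def replay(cassette: dict[str, Any], model: str, messages: list[dict[str, str]]) -> str:
    """Return the recorded reply whose request matches exactly; fail loudly otherwise."""
    for interaction in cassette["interactions"]:
        request = interaction["request"]
        if request["model"] == model and request["messages"] == messages:
            return interaction["response"]["content"]
    raise LookupError(f"no recorded interaction for model={model!r}")


print(replay(CASSETTE, "gpt-4o-mini", [{"role": "user", "content": "Say hello!"}]))
```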

๐Ÿ”ง Configuration

CLI Options

# Enable recording mode
pytest --llm-record

# Custom cassette directory
pytest --llm-cassette-dir=my_cassettes

# Strict mode (fail if any LLM call is unconfigured)
pytest --llm-strict

Markers

import pytest

@pytest.mark.llm_mock(provider="anthropic")
def test_with_anthropic(mock_llm):
    # mock_llm is now an AnthropicMock
    pass

@pytest.mark.llm_record
def test_records_responses(llm_recorder):
    pass

@pytest.mark.llm_replay
def test_replays_responses(llm_recorder):
    pass
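Custom markers are normally registered in a `pytest_configure` hook. Unregistered markers trigger `PytestUnknownMarkWarning`, or an error under `--strict-markers`. A fixture can then read `provider=` from a test's marker through `get_closest_marker`. The sketch below shows both, assuming the plugin works roughly like this; the `provider_for` helper and its `"openai"` default are hypothetical:

```python
def pytest_configure(config):
    # Registered markers appear in `pytest --markers` and pass --strict-markers.
    config.addinivalue_line("markers", "llm_mock(provider): choose which provider mock_llm mocks")
    config.addinivalue_line("markers", "llm_record: record real LLM responses to a cassette")
    config.addinivalue_line("markers", "llm_replay: replay LLM responses from a cassette")


def provider_for(node, default="openai"):
    """Read provider=... from the closest @pytest.mark.llm_mock marker, if any."""
    marker = node.get_closest_marker("llm_mock")
    if marker is None:
        return default
    return marker.kwargs.get("provider", default)
```

Inside a fixture, `provider_for(request.node)` would pick the mock class to hand back as `mock_llm`.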

pyproject.toml (in pytest.ini, put the same keys under a [pytest] section)

[tool.pytest.ini_options]
# Default cassette directory
llm_cassette_dir = "tests/fixtures/llm"

# Always run in strict mode
llm_strict = true
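The CLI flags and ini keys in this section map onto pytest's `pytest_addoption` hook. The sketch below shows how a plugin could declare them, plus a hypothetical `cassette_dir` helper that lets the command-line flag override the ini setting. That precedence is an assumption, as is reusing `tests/llm_cassettes` (from the cassette section) as the default:

```python
def pytest_addoption(parser):
    group = parser.getgroup("mockllm", "LLM mocking")
    group.addoption("--llm-record", action="store_true", default=False,
                    help="Record real LLM responses to cassettes.")
    group.addoption("--llm-cassette-dir", default=None,
                    help="Directory for cassette files.")
    group.addoption("--llm-strict", action="store_true", default=False,
                    help="Fail on LLM calls with no configured response.")
    # ini keys, readable from pytest.ini or [tool.pytest.ini_options]
    parser.addini("llm_cassette_dir", "Default cassette directory.",
                  default="tests/llm_cassettes")
    parser.addini("llm_strict", "Always run in strict mode.", type="bool", default=False)


def cassette_dir(config):
    # Command-line flag wins; otherwise fall back to the ini/pyproject setting.
    return config.getoption("--llm-cassette-dir") or config.getini("llm_cassette_dir")
```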

🆚 Comparison

| Feature | pytest-mockllm | unittest.mock | responses | vcrpy |
| --- | --- | --- | --- | --- |
| Zero config | ✅ | ❌ | ❌ | ❌ |
| pytest fixtures | ✅ | ❌ | ✅ | ✅ |
| Async support | ✅ True Async | 🟡 Complex | ❌ | 🟡 HTTP |
| OpenAI/Anthropic | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| Gemini support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| Token counting | ✅ tiktoken | ❌ | ❌ | ❌ |
| Cost Dashboard | ✅ | ❌ | ❌ | ❌ |
| Recording/Replay | ✅ Redacted | ❌ | ❌ | ✅ |
| Chaos Engineering | ✅ Jitter/Error | 🟡 Manual | 🟡 HTTP | ❌ |

🛣️ Roadmap

Done:

  • True Async/await support
  • Professional Tokenizers (tiktoken)
  • Terminal Cost Dashboard
  • Chaos Engineering (Jitter)
  • Secure Recording (PII Redaction)

Planned:

  • More providers (Cohere, Mistral)
  • pytest-xdist compatibility
  • Integration with LangSmith

🤝 Contributing

We'd love your help! See CONTRIBUTING.md for guidelines.

# Clone the repo and enter it
git clone https://github.com/godhiraj-code/pytest-mockllm.git
cd pytest-mockllm

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
mypy src/

📜 License

MIT License. See LICENSE for details.


Stop paying for tests. Start shipping faster.

⭐ Star us on GitHub • Built by Dhiraj Das


Download files

Source Distribution

pytest_mockllm-0.2.1.tar.gz (30.8 kB)

Built Distribution

pytest_mockllm-0.2.1-py3-none-any.whl (32.0 kB)

File details

Details for the file pytest_mockllm-0.2.1.tar.gz.

File metadata

  • Download URL: pytest_mockllm-0.2.1.tar.gz
  • Size: 30.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for pytest_mockllm-0.2.1.tar.gz
Algorithm Hash digest
SHA256 0f9bad56b24d2767aa2a3667ae4a19a865baf2b8d05dcaee88f47fb3ab420c9f
MD5 70a89733cda9f62e29f18f2997103c47
BLAKE2b-256 a8b8f8a422f74cf84f15b0ecbfe9d479d8ae4f4928a94641214206ea13b02e29

File details

Details for the file pytest_mockllm-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: pytest_mockllm-0.2.1-py3-none-any.whl
  • Size: 32.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for pytest_mockllm-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c7509c827c932069feebbaa0c43ec31bef7312cd2b9a7a003c879aa4cde2ad7c
MD5 c48171f980403db8b3350a49fdcfba66
BLAKE2b-256 387fa6fe0651ad1618352e31f5c2e3e17945dae94df9c01ee83d7548c056e8f6
