
pytest-llm: A pytest plugin for testing LLM outputs with success rate thresholds.

pytest-llm

Fast AI reliability test suite

A pytest plugin that provides a custom marker for testing LLM (Large Language Model) outputs with configurable success rate thresholds.

Usage

import pytest

@pytest.mark.llm("How many R's are in the word 'Strawberry'?", 0.9)
def test_counting(prompt, llm):
    result = llm(prompt).lower()
    assert ("3" in result) or ("three" in result)
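To make the idea of a success rate threshold concrete, here is a minimal sketch of the arithmetic involved. It is not the plugin's implementation: `scripted_llm`, `success_rate`, and the run count of 10 are hypothetical names and values chosen for illustration, and the "LLM" is a scripted stand-in that answers correctly 9 times out of 10.

```python
from itertools import cycle
from typing import Callable, Sequence


def scripted_llm(answers: Sequence[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: replays the given answers in order."""
    replies = cycle(answers)

    def complete(prompt: str) -> str:
        return next(replies)

    return complete


def success_rate(check: Callable[[], bool], runs: int) -> float:
    """Run `check` `runs` times and return the fraction of runs that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check())
    return passed / runs


# Nine correct answers followed by one wrong answer.
llm = scripted_llm(["three"] * 9 + ["two"])


def counting_check() -> bool:
    # Same assertion logic as the test above, returned as a bool.
    result = llm("How many R's are in the word 'Strawberry'?").lower()
    return ("3" in result) or ("three" in result)


rate = success_rate(counting_check, runs=10)
# The test counts as passing when the observed rate reaches the threshold.
passed = rate >= 0.9
```

A single wrong answer does not fail the test here; only a pass rate below the threshold would.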

Setup

python3 -m pip install -e pytest-llm

# conftest.py
import os
import typing

import httpx
import ollama
import pytest


def github_models_complete(prompt, model=None, system=None) -> str:
    """Return a completion from GitHub Models, or from a local Ollama model if no token is set."""
    # Prepend the system message only when one was given.
    messages = [{"role": "system", "content": system}] if system else []
    messages += [{"role": "user", "content": prompt}]
    if token := os.getenv("GITHUB_TOKEN"):
        response = httpx.post(
            "https://models.github.ai/inference/chat/completions",
            headers={
                "Content-Type": "application/json",
                "Accept": "application/vnd.github+json",
                "Authorization": f"Bearer {token}",
                "X-GitHub-Api-Version": "2022-11-28",
            },
            json={
                "model": model or "openai/gpt-5-nano",
                "messages": messages,
            },
            # httpx defaults to a 5-second timeout, which is often too short for LLM calls.
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
    # No token: fall back to a local Ollama model. This path ignores `model` and `system`.
    return ollama.generate(model="llama3.2", prompt=prompt).response


@pytest.fixture
def llm() -> typing.Callable[[str], str]:
    # This fixture isn't needed but might be convenient
    return github_models_complete


def pytest_llm_complete(config):
    # This pytest hook enables the plugin to generate random prompts
    return github_models_complete
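
The fixture and the hook above both return a callable that takes a prompt (plus optional `model` and `system`) and returns a string. For running offline, a stub with the same signature could be swapped in. The sketch below is an assumption for illustration only: `make_fake_complete` and its canned answers are hypothetical and not part of pytest-llm.

```python
import typing


def make_fake_complete(
    canned: typing.Dict[str, str], default: str = ""
) -> typing.Callable[..., str]:
    """Build a hypothetical offline stand-in for github_models_complete."""

    def fake_complete(
        prompt: str,
        model: typing.Optional[str] = None,
        system: typing.Optional[str] = None,
    ) -> str:
        # Return the first canned answer whose key occurs in the prompt
        # (case-insensitive); `model` and `system` are accepted but unused.
        lowered = prompt.lower()
        for needle, answer in canned.items():
            if needle.lower() in lowered:
                return answer
        return default

    return fake_complete


fake_complete = make_fake_complete({"strawberry": "There are three R's."})
```

A stub like this always gives the same answer, so it exercises the test wiring rather than the model's reliability.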

Running Tests

pytest -m llm
# or, to run only the non-LLM tests:
pytest -m "not llm"

Download files

Source Distribution

pytest_llm-0.1.0.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

pytest_llm-0.1.0-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file pytest_llm-0.1.0.tar.gz.

File metadata

  • Download URL: pytest_llm-0.1.0.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pytest_llm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 25a8afd9047c0076763dfbeb82d29e9219b1f34109d3aef12a80c9a0859839b7
MD5 d5352b2969f6c1a5d7ba0bf377f0d88f
BLAKE2b-256 53fa1eb091225592e20145911005b68d5e129efaec1d643be179238f51fdd390

Provenance

The following attestation bundles were made for pytest_llm-0.1.0.tar.gz:

Publisher: release.yml on codingjoe/pytest-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pytest_llm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pytest_llm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pytest_llm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3abf50c4bb0c7e2fbbb6a886ad0955f06004a8b845b4300dc8d8ab64ece04857
MD5 2f2d4435a7ffbe4c071ed9134bda6ad6
BLAKE2b-256 3f2ae2573f7e81ec8d79480265db7c9008e583117049d28cd05bf7f88ea27c89

Provenance

The following attestation bundles were made for pytest_llm-0.1.0-py3-none-any.whl:

Publisher: release.yml on codingjoe/pytest-llm

