
genassert

pytest-native semantic testing for LLM applications.
No servers. No SaaS. No config. Works with OpenAI, Anthropic, LiteLLM, and any LLM client.

PyPI version · Python 3.9+ · License: MIT · pytest


Why genassert?

Traditional assert response == expected breaks the moment your LLM changes a word.
genassert gives you semantic assertions — tests that check meaning, not strings.

| Problem | Traditional testing | genassert |
|---|---|---|
| LLM changes wording | Test breaks | Test passes (same meaning) |
| Response drifts over time | No detection | Baseline regression alert |
| Wrong tone shipped | No check | assert_tone(response, "professional") |
| Hallucination in response | No check | assert_no_hallucination(response, facts) |
| Response too long | Manual count | assert_token_budget(response, 200) |
| Schema mismatch | Try/except JSON | assert_schema(response, MyPydanticModel) |
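The difference can be sketched with a toy score: plain word overlap stands in here for genassert's embedding similarity (the real library compares meaning via embeddings, not word sets):

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets -- a crude
    stand-in for embedding cosine similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

expected = "the meeting is scheduled for monday at noon"
response = "the meeting is scheduled for monday at 12pm"

# Exact-match testing breaks on a one-word change...
assert response != expected

# ...while a meaning-based check still passes.
assert word_overlap(response, expected) > 0.7
```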

Install

# Minimal install (uses hash-based fallback embedder)
pip install genassert

# Recommended: local embeddings — no API cost, runs in CI for free
pip install "genassert[local]"

# OpenAI embeddings backend
pip install "genassert[openai]"

# Everything
pip install "genassert[all]"

Quick Start

# test_my_llm.py
import pytest
from genassert import (
    assert_intent,
    assert_tone,
    assert_no_hallucination,
    assert_token_budget,
    assert_schema,
)

@pytest.mark.llm
def test_summarizer():
    response = my_summarize_function("Long article about climate change...")

    # Check the response is actually a summary
    assert_intent(response, "a concise summary of the article")

    # Check it's neutral — no opinion
    assert_tone(response, "neutral")

    # Check it doesn't hallucinate key facts
    assert_no_hallucination(response, known_facts=[
        "The article is about climate change",
        "CO2 levels are rising",
    ])

    # Check it's not too long
    assert_token_budget(response, max_tokens=250)

Run it:

pytest test_my_llm.py -v

That's it. No config files. No API keys needed (with [local] install).


All Assertions

assert_intent(response, expected_intent, threshold=0.72)

Checks that the response semantically addresses the expected intent.

assert_intent(response, "a polite refusal to the user's request")
assert_intent(response, "Python code that reads a CSV file", threshold=0.80)
assert_intent(response, "step-by-step instructions for setting up Docker")

assert_tone(response, expected_tone, threshold=0.65)

Checks the tone/style of the response.

Built-in tones: professional, casual, friendly, formal, neutral, empathetic, assertive, humorous, concise

assert_tone(response, "professional")
assert_tone(response, "friendly and concise")       # custom description
assert_tone(response, "formal but empathetic")      # combine tones

assert_no_hallucination(response, known_facts)

Checks that the response does NOT contradict known facts.

assert_no_hallucination(response, known_facts=[
    "The product costs $49 per month",
    "The free trial lasts 14 days",
    "Python was created by Guido van Rossum",
])

assert_token_budget(response, max_tokens, tokenizer="approx")

Checks the response doesn't exceed a token budget.

assert_token_budget(response, max_tokens=200)                       # fast approx
assert_token_budget(response, max_tokens=200, tokenizer="tiktoken") # exact (pip install tiktoken)
assert_token_budget(response, max_tokens=800, tokenizer="chars")    # character-based
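For intuition on the approx tokenizer: its exact internals aren't shown here, but a widely used rule of thumb is about four characters per token of English prose. A hypothetical approximation:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per English token.
    # Illustrative only -- not genassert's actual implementation.
    return max(1, len(text) // 4)

# A 1000-character response is roughly 250 tokens, so under this
# heuristic max_tokens=200 would fail and max_tokens=300 would pass.
assert approx_tokens("x" * 1000) == 250
```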

assert_schema(response, schema)

Checks that the response (JSON string) matches a Pydantic model or JSON schema.

from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    body: str
    word_count: int

result = assert_schema(response, Summary)
print(result.title)   # result is a validated Summary instance
# Or a raw JSON Schema dict
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
}
assert_schema(response, schema)
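Conceptually, validating against a raw JSON Schema dict amounts to parsing the string and checking types and required keys. A minimal stdlib-only sketch of that idea (genassert itself handles full JSON Schema):

```python
import json

def check_minimal_schema(response: str, schema: dict) -> dict:
    """Parse a JSON string and check top-level required keys and
    primitive types -- a tiny subset of real JSON Schema validation."""
    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "object": dict, "array": list}
    data = json.loads(response)
    for key in schema.get("required", []):
        assert key in data, f"missing required key: {key}"
    for key, spec in schema.get("properties", {}).items():
        if key in data:
            assert isinstance(data[key], type_map[spec["type"]]), key
    return data

schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
}
result = check_minimal_schema('{"title": "Q3 report"}', schema)
assert result["title"] == "Q3 report"
```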

assert_similar_to(response, reference, threshold=0.80)

Checks that the response is semantically close to a reference string. Useful for golden-baseline regression.

score = assert_similar_to(response, golden_response, threshold=0.85)
print(f"Similarity: {score:.3f}")

Golden Baseline Regression Testing

Record a known-good response once, then detect regression on every CI run.

from genassert import record_baseline, compare_baseline

# Step 1: record (run once, commit the .genassert_baselines/ directory)
record_baseline("summarizer_v1", response)

# Step 2: compare on every subsequent run
def test_summarizer_no_regression():
    response = my_summarize("article...")
    compare_baseline("summarizer_v1", response, threshold=0.85)

Or use the llm_record pytest fixture, which ties into the --record-baselines flag:

def test_summarizer_baseline(llm_record):
    response = my_summarize("article...")
    if llm_record:
        record_baseline("summarizer", response, overwrite=True)
    else:
        compare_baseline("summarizer", response)

# First run — record
pytest --record-baselines

# Every subsequent run — compare
pytest
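genassert's on-disk format isn't documented above, but the workflow can be pictured as: record writes the known-good response to a file under the baseline directory, and compare scores the new response against that file. A simplified sketch, using difflib as a stand-in for embedding similarity and a temporary directory in place of the real .genassert_baselines/:

```python
import difflib
import tempfile
from pathlib import Path

BASELINE_DIR = Path(tempfile.mkdtemp())  # the real library uses .genassert_baselines/

def toy_record_baseline(name: str, response: str) -> None:
    (BASELINE_DIR / f"{name}.txt").write_text(response)

def toy_compare_baseline(name: str, response: str, threshold: float = 0.85) -> float:
    baseline = (BASELINE_DIR / f"{name}.txt").read_text()
    # difflib's ratio stands in for embedding cosine similarity
    score = difflib.SequenceMatcher(None, baseline, response).ratio()
    assert score >= threshold, f"regression: similarity {score:.3f} < {threshold}"
    return score

toy_record_baseline("summarizer_v1", "The article covers rising CO2 levels.")
score = toy_compare_baseline("summarizer_v1", "The article covers rising CO2 levels.")
assert score == 1.0
```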

Local Judge (Zero API Cost)

Use LocalJudge for complex, nuanced evaluations that go beyond embedding similarity:

from genassert import LocalJudge

judge = LocalJudge()   # uses a tiny local model (auto-downloaded)

result = judge.evaluate(
    response="Paris is the capital of France.",
    criterion="The response correctly answers a geography question.",
)

assert result.passed
print(f"Score: {result.score:.2f}")
print(f"Reasoning: {result.reasoning}")

Install the local judge backend:

pip install "genassert[judge]"   # installs transformers + torch

pytest CLI Options

# Skip all LLM tests (useful in fast unit-test runs)
pytest --skip-llm

# Override similarity threshold globally
pytest --llm-threshold=0.75

# Record golden baselines
pytest --record-baselines

Configuration

All settings via environment variables — no config files needed:

| Variable | Default | Description |
|---|---|---|
| genassert_EMBED_BACKEND | auto | local, openai, or fallback |
| genassert_EMBED_MODEL | all-MiniLM-L6-v2 | Embedding model name |
| genassert_JUDGE_MODEL | Qwen/Qwen2.5-0.5B-Instruct | Local judge model |
| genassert_BASELINE_DIR | .genassert_baselines | Baseline storage directory |
| OPENAI_API_KEY | (none) | Required for the openai backend |

Embedding Backends

| Backend | Speed | Cost | Accuracy | Install |
|---|---|---|---|---|
| local (sentence-transformers) | Fast | Free | High | pip install "genassert[local]" |
| openai | Moderate | ~$0.0001/test | Very high | pip install "genassert[openai]" |
| fallback (hash-based) | Instant | Free | Smoke test only | Built-in |
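To give a rough idea of what "hash-based" means: each word can be hashed into a bucket of a fixed-size vector, and two texts compared by cosine similarity of those vectors. A toy version (not genassert's actual code):

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 64) -> list:
    """Bag of words hashed into a fixed-size vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Identical wording scores ~1.0, but paraphrases that share no words
# score 0 -- which is why this backend is labeled "smoke test only".
assert cosine(hash_embed("hello world"), hash_embed("hello world")) > 0.99
```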

Set backend:

export genassert_EMBED_BACKEND=local    # recommended for CI
export genassert_EMBED_BACKEND=openai   # highest accuracy
export genassert_EMBED_BACKEND=fallback # no deps, structural tests only

Framework Compatibility

genassert is framework-agnostic. Use it with any LLM client:

# OpenAI
import openai
client = openai.OpenAI()
response = client.chat.completions.create(...).choices[0].message.content

# Anthropic
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(...).content[0].text

# LiteLLM
import litellm
response = litellm.completion(...).choices[0].message.content

# LangChain
from langchain_openai import ChatOpenAI
response = ChatOpenAI().invoke("...").content

# Any string output — genassert only needs the final response string
assert_intent(response, "your expected intent here")

Real-World Example: Testing a RAG Chatbot

import pytest
from genassert import assert_intent, assert_no_hallucination, assert_token_budget, assert_schema

PRODUCT_FACTS = [
    "The product is called DataFlow Pro",
    "The price is $99 per month",
    "There is a 30-day free trial",
    "It supports Python, JavaScript, and Go",
]

@pytest.mark.llm
class TestChatbot:
    def test_pricing_question(self, chatbot):
        response = chatbot.ask("How much does it cost?")
        assert_intent(response, "information about pricing or cost")
        assert_no_hallucination(response, PRODUCT_FACTS)
        assert_token_budget(response, max_tokens=150)

    def test_technical_question(self, chatbot):
        response = chatbot.ask("What languages are supported?")
        assert_intent(response, "list of supported programming languages")
        assert_no_hallucination(response, PRODUCT_FACTS)

    def test_structured_response(self, chatbot):
        from pydantic import BaseModel
        class PricingInfo(BaseModel):
            price: str
            trial_days: int

        response = chatbot.ask_structured("Return pricing as JSON")
        assert_schema(response, PricingInfo)

CI Integration

# .github/workflows/llm-tests.yml
name: LLM Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install genassert
        run: pip install "genassert[local]" pytest

      - name: Run LLM tests
        run: pytest tests/ -m llm -v
        env:
          genassert_EMBED_BACKEND: local   # free, no API key needed

License

MIT © genassert contributors


genassert is the missing pytest plugin for the LLM era.
Stop shipping broken AI features. Start testing them.

