genassert
pytest-native semantic testing for LLM applications.
No servers. No SaaS. No config. Works with OpenAI, Anthropic, LiteLLM, and any LLM client.
Why genassert?
Traditional assert response == expected breaks the moment your LLM changes a word.
genassert gives you semantic assertions — tests that check meaning, not strings.
| Problem | Traditional testing | genassert |
|---|---|---|
| LLM changes wording | Test breaks | Test passes (same meaning) |
| Response drifts over time | No detection | Baseline regression alert |
| Wrong tone shipped | No check | assert_tone(response, "professional") |
| Hallucination in response | No check | assert_no_hallucination(response, facts) |
| Response too long | Manual count | assert_token_budget(response, 200) |
| Schema mismatch | Try/except JSON | assert_schema(response, MyPydanticModel) |
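Concretely, an exact-match assertion fails on every rewording, while a semantic assertion keeps passing as long as the meaning holds. A minimal sketch (the hard-coded response stands in for your own LLM call):

from genassert import assert_intent

def test_refund_policy_answer():
    # Stand-in for your own LLM call; imagine the model rephrased the wording
    response = "You can send items back any time in the first 30 days for a full refund."

    # Brittle: an exact match breaks on every rewording
    # assert response == "Items can be returned within 30 days."

    # Semantic: passes as long as the meaning is preserved
    assert_intent(response, "an explanation that items can be returned within 30 days")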
Install
# Minimal install (uses hash-based fallback embedder)
pip install genassert
# Recommended: local embeddings — no API cost, runs in CI for free
pip install "genassert[local]"
# OpenAI embeddings backend
pip install "genassert[openai]"
# Everything
pip install "genassert[all]"
Quick Start
# test_my_llm.py
import pytest
from genassert import (
    assert_intent,
    assert_tone,
    assert_no_hallucination,
    assert_token_budget,
    assert_schema,
)

@pytest.mark.llm
def test_summarizer():
    response = my_summarize_function("Long article about climate change...")

    # Check the response is actually a summary
    assert_intent(response, "a concise summary of the article")

    # Check it's neutral — no opinion
    assert_tone(response, "neutral")

    # Check it doesn't hallucinate key facts
    assert_no_hallucination(response, known_facts=[
        "The article is about climate change",
        "CO2 levels are rising",
    ])

    # Check it's not too long
    assert_token_budget(response, max_tokens=250)
Run it:
pytest test_my_llm.py -v
That's it. No config files. No API keys needed (with [local] install).
All Assertions
assert_intent(response, expected_intent, threshold=0.72)
Checks that the response semantically addresses the expected intent.
assert_intent(response, "a polite refusal to the user's request")
assert_intent(response, "Python code that reads a CSV file", threshold=0.80)
assert_intent(response, "step-by-step instructions for setting up Docker")
assert_tone(response, expected_tone, threshold=0.65)
Checks the tone/style of the response.
Built-in tones: professional, casual, friendly, formal, neutral, empathetic, assertive, humorous, concise
assert_tone(response, "professional")
assert_tone(response, "friendly and concise") # custom description
assert_tone(response, "formal but empathetic") # combine tones
assert_no_hallucination(response, known_facts)
Checks that the response does NOT contradict known facts.
assert_no_hallucination(response, known_facts=[
    "The product costs $49 per month",
    "The free trial lasts 14 days",
    "Python was created by Guido van Rossum",
])
assert_token_budget(response, max_tokens, tokenizer="approx")
Checks the response doesn't exceed a token budget.
assert_token_budget(response, max_tokens=200) # fast approx
assert_token_budget(response, max_tokens=200, tokenizer="tiktoken") # exact (pip install tiktoken)
assert_token_budget(response, max_tokens=800, tokenizer="chars") # character-based
assert_schema(response, schema)
Checks that the response (JSON string) matches a Pydantic model or JSON schema.
from pydantic import BaseModel
class Summary(BaseModel):
    title: str
    body: str
    word_count: int

result = assert_schema(response, Summary)
print(result.title)  # validated Pydantic instance

# Or a raw JSON Schema dict
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
}
assert_schema(response, schema)
assert_similar_to(response, reference, threshold=0.80)
Checks that the response is semantically close to a reference string. Useful for golden-baseline regression.
score = assert_similar_to(response, golden_response, threshold=0.85)
print(f"Similarity: {score:.3f}")
Golden Baseline Regression Testing
Record a known-good response once, then detect regression on every CI run.
from genassert import record_baseline, compare_baseline
# Step 1: record (run once, commit the .genassert_baselines/ directory)
record_baseline("summarizer_v1", response)
# Step 2: compare on every subsequent run
def test_summarizer_no_regression():
    response = my_summarize("article...")
    compare_baseline("summarizer_v1", response, threshold=0.85)
Or use the llm_record pytest fixture, which integrates with the --record-baselines flag:
def test_summarizer_baseline(llm_record):
    response = my_summarize("article...")
    if llm_record:
        record_baseline("summarizer", response, overwrite=True)
    else:
        compare_baseline("summarizer", response)
# First run — record
pytest --record-baselines
# Every subsequent run — compare
pytest
Local Judge (Zero API Cost)
Use LocalJudge for complex, nuanced evaluations that go beyond embedding similarity:
from genassert import LocalJudge
judge = LocalJudge() # uses a tiny local model (auto-downloaded)
result = judge.evaluate(
response="Paris is the capital of France.",
criterion="The response correctly answers a geography question.",
)
assert result.passed
print(f"Score: {result.score:.2f}")
print(f"Reasoning: {result.reasoning}")
Install the local judge backend:
pip install "genassert[judge]" # installs transformers + torch
pytest CLI Options
# Skip all LLM tests (useful in fast unit-test runs)
pytest --skip-llm
# Override similarity threshold globally
pytest --llm-threshold=0.75
# Record golden baselines
pytest --record-baselines
Configuration
All settings via environment variables — no config files needed:
| Variable | Default | Description |
|---|---|---|
| genassert_EMBED_BACKEND | auto | local, openai, or fallback |
| genassert_EMBED_MODEL | all-MiniLM-L6-v2 | Embedding model name |
| genassert_JUDGE_MODEL | Qwen/Qwen2.5-0.5B-Instruct | Local judge model |
| genassert_BASELINE_DIR | .genassert_baselines | Baseline storage directory |
| OPENAI_API_KEY | — | Required for openai backend |
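For example, one CI invocation can pin everything on the command line using only the variables above (the baseline path is illustrative):

genassert_EMBED_BACKEND=local genassert_BASELINE_DIR=tests/baselines pytest -m llm -v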
Embedding Backends
| Backend | Speed | Cost | Accuracy | Install |
|---|---|---|---|---|
| local (sentence-transformers) | Fast | Free | High | pip install "genassert[local]" |
| openai | Moderate | ~$0.0001/test | Very high | pip install "genassert[openai]" |
| fallback (hash-based) | Instant | Free | Smoke test only | Built-in |
Set backend:
export genassert_EMBED_BACKEND=local # recommended for CI
export genassert_EMBED_BACKEND=openai # highest accuracy
export genassert_EMBED_BACKEND=fallback # no deps, structural tests only
Framework Compatibility
genassert is framework-agnostic. Use it with any LLM client:
# OpenAI
import openai
client = openai.OpenAI()
response = client.chat.completions.create(...).choices[0].message.content
# Anthropic
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(...).content[0].text
# LiteLLM
import litellm
response = litellm.completion(...).choices[0].message.content
# LangChain
from langchain_openai import ChatOpenAI
response = ChatOpenAI().invoke("...").content
# Any string output — genassert only needs the final response string
assert_intent(response, "your expected intent here")
Real-World Example: Testing a RAG Chatbot
import pytest
from genassert import assert_intent, assert_no_hallucination, assert_token_budget, assert_schema
PRODUCT_FACTS = [
    "The product is called DataFlow Pro",
    "The price is $99 per month",
    "There is a 30-day free trial",
    "It supports Python, JavaScript, and Go",
]

@pytest.mark.llm
class TestChatbot:
    def test_pricing_question(self, chatbot):
        response = chatbot.ask("How much does it cost?")
        assert_intent(response, "information about pricing or cost")
        assert_no_hallucination(response, PRODUCT_FACTS)
        assert_token_budget(response, max_tokens=150)

    def test_technical_question(self, chatbot):
        response = chatbot.ask("What languages are supported?")
        assert_intent(response, "list of supported programming languages")
        assert_no_hallucination(response, PRODUCT_FACTS)

    def test_structured_response(self, chatbot):
        from pydantic import BaseModel

        class PricingInfo(BaseModel):
            price: str
            trial_days: int

        response = chatbot.ask_structured("Return pricing as JSON")
        assert_schema(response, PricingInfo)
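The chatbot fixture used above is not part of genassert; it is whatever wraps your own application. A minimal sketch of what it might look like in conftest.py, assuming an OpenAI-backed helper (model name and prompt wording are illustrative):

# conftest.py
import openai
import pytest

class Chatbot:
    def __init__(self):
        self.client = openai.OpenAI()

    def ask(self, question: str) -> str:
        completion = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        )
        return completion.choices[0].message.content

    def ask_structured(self, question: str) -> str:
        # Ask the model to answer as JSON; assert_schema validates the result
        return self.ask(question + " Respond with JSON only.")

@pytest.fixture
def chatbot():
    return Chatbot()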
CI Integration
# .github/workflows/llm-tests.yml
name: LLM Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install genassert
        run: pip install "genassert[local]" pytest
      - name: Run LLM tests
        run: pytest tests/ -m llm -v
        env:
          genassert_EMBED_BACKEND: local  # free, no API key needed
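With the local backend, sentence-transformers downloads the embedding model on the first run. If that download slows your CI, you can add a cache step before the install (a hedged sketch using actions/cache; the path assumes the default Hugging Face cache location used by recent sentence-transformers versions):

      - uses: actions/cache@v4
        with:
          path: ~/.cache/huggingface
          key: hf-models-${{ runner.os }}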
License
MIT © genassert contributors
Related Projects
- pytest — the test framework genassert is built on
- sentence-transformers — local embedding models
- Pydantic — data validation
- LiteLLM — unified LLM client
genassert is the missing pytest plugin for the LLM era.
Stop shipping broken AI features. Start testing them.