genassert
pytest-native semantic testing for LLM applications.
No servers. No SaaS. No config. Works with OpenAI, Anthropic, LiteLLM, and any LLM client.
Why genassert?
Traditional assert response == expected breaks the moment your LLM changes a word.
genassert gives you semantic assertions — tests that check meaning, not strings.
| Problem | Traditional testing | genassert |
|---|---|---|
| LLM changes wording | Test breaks | Test passes (same meaning) |
| Response drifts over time | No detection | Baseline regression alert |
| Wrong tone shipped | No check | assert_tone(response, "professional") |
| Hallucination in response | No check | assert_no_hallucination(response, facts) |
| Response too long | Manual count | assert_token_budget(response, 200) |
| Schema mismatch | Try/except JSON | assert_schema(response, MyPydanticModel) |
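Concretely, an exact-match assertion fails on every rewording, while a semantic assertion keeps passing as long as the meaning holds. A minimal sketch (the hard-coded response stands in for your own LLM call):

from genassert import assert_intent

def test_refund_policy_answer():
    # Stand-in for your own LLM call; imagine the model rephrased the wording
    response = "You can send items back any time in the first 30 days for a full refund."

    # Brittle: an exact match breaks on every rewording
    # assert response == "Items can be returned within 30 days."

    # Semantic: passes as long as the meaning is preserved
    assert_intent(response, "an explanation that items can be returned within 30 days")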
Install
# Minimal install (uses hash-based fallback embedder)
pip install genassert
# Recommended: local embeddings — no API cost, runs in CI for free
pip install "genassert[local]"
# OpenAI embeddings backend
pip install "genassert[openai]"
# Everything
pip install "genassert[all]"
Quick Start
# test_my_llm.py
import pytest
from genassert import (
    assert_intent,
    assert_tone,
    assert_no_hallucination,
    assert_token_budget,
    assert_schema,
)

@pytest.mark.llm
def test_summarizer():
    response = my_summarize_function("Long article about climate change...")

    # Check the response is actually a summary
    assert_intent(response, "a concise summary of the article")

    # Check it's neutral — no opinion
    assert_tone(response, "neutral")

    # Check it doesn't hallucinate key facts
    assert_no_hallucination(response, known_facts=[
        "The article is about climate change",
        "CO2 levels are rising",
    ])

    # Check it's not too long
    assert_token_budget(response, max_tokens=250)
Run it:
pytest test_my_llm.py -v
That's it. No config files. No API keys needed (with [local] install).
All Assertions
assert_intent(response, expected_intent, threshold=0.72)
Checks that the response semantically addresses the expected intent.
assert_intent(response, "a polite refusal to the user's request")
assert_intent(response, "Python code that reads a CSV file", threshold=0.80)
assert_intent(response, "step-by-step instructions for setting up Docker")
assert_tone(response, expected_tone, threshold=0.65)
Checks the tone/style of the response.
Built-in tones: professional, casual, friendly, formal, neutral, empathetic, assertive, humorous, concise
assert_tone(response, "professional")
assert_tone(response, "friendly and concise") # custom description
assert_tone(response, "formal but empathetic") # combine tones
assert_no_hallucination(response, known_facts)
Checks that the response does NOT contradict known facts.
assert_no_hallucination(response, known_facts=[
    "The product costs $49 per month",
    "The free trial lasts 14 days",
    "Python was created by Guido van Rossum",
])
assert_token_budget(response, max_tokens, tokenizer="approx")
Checks the response doesn't exceed a token budget.
assert_token_budget(response, max_tokens=200) # fast approx
assert_token_budget(response, max_tokens=200, tokenizer="tiktoken") # exact (pip install tiktoken)
assert_token_budget(response, max_tokens=800, tokenizer="chars") # character-based
assert_schema(response, schema)
Checks that the response (JSON string) matches a Pydantic model or JSON schema.
from pydantic import BaseModel
class Summary(BaseModel):
    title: str
    body: str
    word_count: int

result = assert_schema(response, Summary)
print(result.title)  # validated Pydantic instance

# Or a raw JSON Schema dict
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
}
assert_schema(response, schema)
assert_similar_to(response, reference, threshold=0.80)
Checks that the response is semantically close to a reference string. Useful for golden-baseline regression.
score = assert_similar_to(response, golden_response, threshold=0.85)
print(f"Similarity: {score:.3f}")
Golden Baseline Regression Testing
Record a known-good response once, then detect regression on every CI run.
from genassert import record_baseline, compare_baseline
# Step 1: record (run once, commit the .genassert_baselines/ directory)
record_baseline("summarizer_v1", response)
# Step 2: compare on every subsequent run
def test_summarizer_no_regression():
    response = my_summarize("article...")
    compare_baseline("summarizer_v1", response, threshold=0.85)
Or use the llm_record pytest fixture, which integrates with the --record-baselines flag:
def test_summarizer_baseline(llm_record):
    response = my_summarize("article...")
    if llm_record:
        record_baseline("summarizer", response, overwrite=True)
    else:
        compare_baseline("summarizer", response)
# First run — record
pytest --record-baselines
# Every subsequent run — compare
pytest
Local Judge (Zero API Cost)
Use LocalJudge for complex, nuanced evaluations that go beyond embedding similarity:
from genassert import LocalJudge
judge = LocalJudge() # uses a tiny local model (auto-downloaded)
result = judge.evaluate(
response="Paris is the capital of France.",
criterion="The response correctly answers a geography question.",
)
assert result.passed
print(f"Score: {result.score:.2f}")
print(f"Reasoning: {result.reasoning}")
Install the local judge backend:
pip install "genassert[judge]" # installs transformers + torch
pytest CLI Options
# Skip all LLM tests (useful in fast unit-test runs)
pytest --skip-llm
# Override similarity threshold globally
pytest --llm-threshold=0.75
# Record golden baselines
pytest --record-baselines
Configuration
All settings via environment variables — no config files needed:
| Variable | Default | Description |
|---|---|---|
| genassert_EMBED_BACKEND | auto | local, openai, or fallback |
| genassert_EMBED_MODEL | all-MiniLM-L6-v2 | Embedding model name |
| genassert_JUDGE_MODEL | Qwen/Qwen2.5-0.5B-Instruct | Local judge model |
| genassert_BASELINE_DIR | .genassert_baselines | Baseline storage directory |
| OPENAI_API_KEY | — | Required for openai backend |
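For example, one CI invocation can pin everything on the command line using only the variables above (the baseline path is illustrative):

genassert_EMBED_BACKEND=local genassert_BASELINE_DIR=tests/baselines pytest -m llm -v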
Embedding Backends
| Backend | Speed | Cost | Accuracy | Install |
|---|---|---|---|---|
| local (sentence-transformers) | Fast | Free | High | pip install "genassert[local]" |
| openai | Moderate | ~$0.0001/test | Very high | pip install "genassert[openai]" |
| fallback (hash-based) | Instant | Free | Smoke test only | Built-in |
Set backend:
export genassert_EMBED_BACKEND=local # recommended for CI
export genassert_EMBED_BACKEND=openai # highest accuracy
export genassert_EMBED_BACKEND=fallback # no deps, structural tests only
Framework Compatibility
genassert is framework-agnostic. Use it with any LLM client:
# OpenAI
import openai
client = openai.OpenAI()
response = client.chat.completions.create(...).choices[0].message.content
# Anthropic
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(...).content[0].text
# LiteLLM
import litellm
response = litellm.completion(...).choices[0].message.content
# LangChain
from langchain_openai import ChatOpenAI
response = ChatOpenAI().invoke("...").content
# Any string output — genassert only needs the final response string
assert_intent(response, "your expected intent here")
Real-World Example: Testing a RAG Chatbot
import pytest
from genassert import assert_intent, assert_no_hallucination, assert_token_budget, assert_schema
PRODUCT_FACTS = [
    "The product is called DataFlow Pro",
    "The price is $99 per month",
    "There is a 30-day free trial",
    "It supports Python, JavaScript, and Go",
]

@pytest.mark.llm
class TestChatbot:
    def test_pricing_question(self, chatbot):
        response = chatbot.ask("How much does it cost?")
        assert_intent(response, "information about pricing or cost")
        assert_no_hallucination(response, PRODUCT_FACTS)
        assert_token_budget(response, max_tokens=150)

    def test_technical_question(self, chatbot):
        response = chatbot.ask("What languages are supported?")
        assert_intent(response, "list of supported programming languages")
        assert_no_hallucination(response, PRODUCT_FACTS)

    def test_structured_response(self, chatbot):
        from pydantic import BaseModel

        class PricingInfo(BaseModel):
            price: str
            trial_days: int

        response = chatbot.ask_structured("Return pricing as JSON")
        assert_schema(response, PricingInfo)
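The chatbot fixture used above is not part of genassert; it is whatever wraps your own application. A minimal sketch of what it might look like in conftest.py, assuming an OpenAI-backed helper (model name and prompt wording are illustrative):

# conftest.py
import openai
import pytest

class Chatbot:
    def __init__(self):
        self.client = openai.OpenAI()

    def ask(self, question: str) -> str:
        completion = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        )
        return completion.choices[0].message.content

    def ask_structured(self, question: str) -> str:
        # Ask the model to answer as JSON; assert_schema validates the result
        return self.ask(question + " Respond with JSON only.")

@pytest.fixture
def chatbot():
    return Chatbot()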
CI Integration
# .github/workflows/llm-tests.yml
name: LLM Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install genassert
        run: pip install "genassert[local]" pytest
      - name: Run LLM tests
        run: pytest tests/ -m llm -v
        env:
          genassert_EMBED_BACKEND: local  # free, no API key needed
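With the local backend, sentence-transformers downloads the embedding model on the first run. If that download slows your CI, you can add a cache step before the install (a hedged sketch using actions/cache; the path assumes the default Hugging Face cache location used by recent sentence-transformers versions):

      - uses: actions/cache@v4
        with:
          path: ~/.cache/huggingface
          key: hf-models-${{ runner.os }}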
License
MIT © genassert contributors
Related Projects
- pytest — the test framework genassert is built on
- sentence-transformers — local embedding models
- Pydantic — data validation
- LiteLLM — unified LLM client
genassert is the missing pytest plugin for the LLM era.
Stop shipping broken AI features. Start testing them.