
pytest-semantic

A pytest plugin for testing LLM outputs using semantic similarity matching instead of exact string comparison.

Installation

```bash
pip install pytest-semantic
```

Or with uv:

```bash
uv pip install pytest-semantic
```

Quick Start

```python
def test_llm_greeting(semantic_matcher):
    """Test that LLM generates appropriate greetings."""
    llm_response = my_llm("Say hello")  # my_llm: placeholder for your own model call

    matcher = semantic_matcher(
        valid=["Hello!", "Hi there!", "Greetings!"],
    )

    assert llm_response == matcher
```

Features

Semantic Matching: Compare text responses based on meaning, not exact strings
Flexible Configuration: Configure thresholds globally or per-test
Custom Embeddings: Use your own embedding models or functions
Offline-First: Runs locally with a sentence-transformers model, which is downloaded on first use and cached locally
Clear Error Messages: Detailed failure messages with similarity scores
Easy Integration: Simple pytest fixture-based API

Usage

Basic Usage

Use the semantic_matcher fixture to create matchers with valid examples:

```python
def test_llm_response(semantic_matcher):
    response = generate_llm_response("What is the capital of France?")

    matcher = semantic_matcher(
        valid=["Paris", "The capital is Paris", "Paris is the capital"],
    )

    assert response == matcher
```

With Invalid Examples

Provide invalid examples that a correct response should not resemble; the response must then be clearly closer to the valid examples than to these:

```python
def test_sentiment_classification(semantic_matcher):
    result = classify_sentiment("I love this product!")

    matcher = semantic_matcher(
        valid=["positive", "good", "happy"],
        invalid=["negative", "bad", "sad", "neutral"],
    )

    assert result == matcher
```

If you don't provide invalid examples, random word combinations are automatically generated as a baseline.

Custom Thresholds

Adjust matching sensitivity per test:

```python
def test_with_custom_threshold(semantic_matcher):
    matcher = semantic_matcher(
        valid=["Python programming"],
        threshold=0.2,        # Minimum gap between valid and invalid similarity
        min_similarity=0.6,   # Minimum absolute similarity to valid examples
    )

    assert "Python coding" == matcher
```

Reusable Matchers

Create reusable matchers with fixtures:

```python
import pytest

@pytest.fixture
def greeting_matcher(semantic_matcher):
    return semantic_matcher(
        valid=["Hello!", "Hi there!", "Hey!"],
    )

@pytest.fixture
def farewell_matcher(semantic_matcher):
    return semantic_matcher(
        valid=["Goodbye!", "See you!", "Bye!"],
    )

def test_conversation(greeting_matcher, farewell_matcher):
    # llm: placeholder for your own model client
    assert llm.greet() == greeting_matcher
    assert llm.say_goodbye() == farewell_matcher
```

Custom Embedding Functions

Pass your own embedding function, any callable that maps a string to a list of floats, as custom_embed_fn:

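```python
from typing import List

import openai

def test_with_custom_embeddings(semantic_matcher):
    def my_embed_function(text: str) -> List[float]:
        # Example: OpenAI embeddings API (requires OPENAI_API_KEY)
        response = openai.embeddings.create(input=text, model="text-embedding-3-small")
        return response.data[0].embedding  # the vector, not the response object

    matcher = semantic_matcher(
        valid=["Hello"],
        custom_embed_fn=my_embed_function,
    )

    assert "Hi" == matcher
```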
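For tests that must run without network access or API keys, any deterministic function with the (str) -> List[float] shape can serve as custom_embed_fn. The sketch below is a hashed bag-of-words embedding: it captures only word overlap, not meaning, so it is a stand-in for wiring and smoke tests rather than a replacement for a real embedding model. `hashed_bow_embedding` is a hypothetical helper, not part of pytest-semantic.

```python
import hashlib
import math
import re
from typing import List


def hashed_bow_embedding(text: str, dim: int = 256) -> List[float]:
    """Map text to an L2-normalized bag-of-words vector of length `dim`."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        # blake2b gives a stable hash; Python's built-in hash() is salted per process
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        vec[int.from_bytes(digest, "little") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Empty or punctuation-only text yields the zero vector
    return [v / norm for v in vec] if norm else vec
```

Pass it as custom_embed_fn=hashed_bow_embedding in the semantic_matcher call. Because identical inputs always produce identical vectors, results are reproducible across machines.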

Configuration

Configure default values in pytest.ini:

```ini
[pytest]
semantic_threshold = 0.15
semantic_min_similarity = 0.5
semantic_model = all-MiniLM-L6-v2
```

Or in pyproject.toml:

```toml
[tool.pytest.ini_options]
semantic_threshold = 0.15
semantic_min_similarity = 0.5
semantic_model = "all-MiniLM-L6-v2"
```

Configuration Options

semantic_threshold (default: 0.15): Minimum required gap between the response's similarity to valid examples and its similarity to invalid examples
semantic_min_similarity (default: 0.5): Minimum absolute similarity to valid examples (0-1 range)
semantic_model (default: "all-MiniLM-L6-v2"): Sentence-transformers model name
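Per-test arguments to semantic_matcher override these configured defaults. The sketch below shows one way that precedence could be resolved, assuming `None` means "use the configured value", as the `=None` defaults in the semantic_matcher signature suggest. `SemanticSettings` and `effective_settings` are hypothetical names, and the 0-1 range check mirrors the ranges stated in the API Reference rather than confirmed plugin behavior.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SemanticSettings:
    # Documented defaults from the Configuration Options list
    threshold: float = 0.15
    min_similarity: float = 0.5
    model: str = "all-MiniLM-L6-v2"


def effective_settings(
    defaults: SemanticSettings,
    threshold: Optional[float] = None,
    min_similarity: Optional[float] = None,
    model_name: Optional[str] = None,
) -> SemanticSettings:
    """Per-test values win; None falls back to the configured default."""
    resolved = replace(
        defaults,
        threshold=defaults.threshold if threshold is None else threshold,
        min_similarity=(defaults.min_similarity
                        if min_similarity is None else min_similarity),
        model=defaults.model if model_name is None else model_name,
    )
    # Reject values outside the documented 0-1 range
    for name in ("threshold", "min_similarity"):
        value = getattr(resolved, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in the 0-1 range, got {value}")
    return resolved
```

Testing `is None` rather than truthiness keeps 0.0 usable as a deliberate override instead of silently reverting to the default.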

How It Works

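Embeddings: The response and every example are converted to vector embeddings, using sentence-transformers by default or custom_embed_fn when one is supplied
Similarity Calculation: Cosine similarity is computed between the response and each valid and invalid example
Dual Criteria: A response matches only if both of these hold:
- Its similarity to the valid examples is at least min_similarity
- Its similarity to the valid examples exceeds its similarity to the invalid examples by at least threshold

The first criterion rejects responses unrelated to every valid example; the second rejects responses that sit about as close to the invalid examples as to the valid ones. Together they reduce false positives while still requiring a meaningful match.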
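The README does not specify how per-example cosine scores are combined into a single "similarity to valid examples" value. The sketch below assumes a simple mean and uses plain Python lists so it runs without sentence-transformers; `cosine`, `mean_similarity`, `meets_dual_criteria`, and `matches` are hypothetical helpers, not the plugin's API.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def mean_similarity(response: Sequence[float],
                    examples: Sequence[Sequence[float]]) -> float:
    """Assumed aggregation: average cosine similarity over all examples."""
    return sum(cosine(response, e) for e in examples) / len(examples)


def meets_dual_criteria(valid_sim: float, invalid_sim: float,
                        threshold: float = 0.15,
                        min_similarity: float = 0.5) -> bool:
    """Both must hold: an absolute floor and a valid-vs-invalid gap."""
    return valid_sim >= min_similarity and (valid_sim - invalid_sim) >= threshold


def matches(response, valid, invalid, threshold=0.15, min_similarity=0.5) -> bool:
    """Score an embedded response against embedded valid/invalid examples."""
    return meets_dual_criteria(
        mean_similarity(response, valid),
        mean_similarity(response, invalid),
        threshold,
        min_similarity,
    )
```

With the scores from the sample failure in the Error Messages section, meets_dual_criteria(0.342, 0.156) returns False: the 0.186 gap clears the 0.15 threshold, but 0.342 is below the 0.5 minimum.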

Error Messages

When a test fails, you get detailed information:

```text
AssertionError: Semantic similarity check failed
  Response: "Bonjour"
  Similarity to valid examples: 0.342
  Similarity to invalid examples: 0.156
  Difference: 0.186 (threshold: 0.150)
  Failure reason: Response similarity (0.342) is below minimum threshold (0.500)
  Closest valid example: 'Hello!' (similarity: 0.389)
  Valid examples:
    - 'Hello!'
    - 'Hi there!'
    - 'Greetings!'
```

API Reference

`semantic_matcher(valid, invalid=None, threshold=None, min_similarity=None, model_name=None, custom_embed_fn=None)`

Creates a semantic matcher for comparing text responses.

Parameters:

valid (List[str]): Example responses that should match (required)
invalid (List[str], optional): Example responses that should not match; random word combinations are generated if omitted
threshold (float, optional): Overrides semantic_threshold for this matcher (0-1 range)
min_similarity (float, optional): Overrides semantic_min_similarity for this matcher (0-1 range)
model_name (str, optional): Overrides semantic_model for this matcher
custom_embed_fn (Callable, optional): Custom embedding function with signature (str) -> List[float]

Returns: A SemanticMatcher instance that can be compared with a response string using the == operator

`SemanticMatcher.check(response: str) -> bool`

Explicitly checks whether a response matches. Returns True on success and raises SemanticAssertionError on failure.
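The == form relies on Python's reflected comparison: for `response == matcher`, str.__eq__ returns NotImplemented when the other operand is not a string, so Python then calls the matcher's __eq__. The sketch below illustrates that mechanism with a toy class whose matching rule is an arbitrary predicate. `ToyMatcher` and its `matches` method are hypothetical, and `SemanticAssertionError` here is a local stand-in rather than an import from the plugin.

```python
from typing import Callable


class SemanticAssertionError(AssertionError):
    """Local stand-in for the plugin's error type (hypothetical definition)."""


class ToyMatcher:
    """Hypothetical matcher showing how == and check() can share one rule."""

    def __init__(self, is_match: Callable[[str], bool]):
        self._is_match = is_match

    def matches(self, response: str) -> bool:
        return self._is_match(response)

    def check(self, response: str) -> bool:
        # Explicit form: raise with a message instead of returning False
        if not self.matches(response):
            raise SemanticAssertionError(f"No semantic match for {response!r}")
        return True

    def __eq__(self, other: object) -> bool:
        # Reached for both `matcher == text` and, via reflection, `text == matcher`
        if not isinstance(other, str):
            return NotImplemented
        return self.matches(other)
```

In this sketch, `matcher == response` and `response == matcher` reach the same method, so operand order in the assertion does not matter.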

Examples

Testing LLM Text Generation

```python
def test_story_generation(semantic_matcher):
    """Test that LLM generates creative stories."""
    story = llm.generate_story(prompt="A robot learning to paint")

    matcher = semantic_matcher(
        valid=[
            "A robot discovers art and creativity",
            "An AI learns to express itself through painting",
            "A mechanical being explores artistic expression",
        ],
        threshold=0.1,  # Allow more variation for creative content
    )

    assert story == matcher
```

Testing Classification

```python
def test_intent_classification(semantic_matcher):
    """Test intent classification accuracy."""
    intent = classify_intent("I want to cancel my subscription")

    matcher = semantic_matcher(
        valid=["cancel", "cancellation", "unsubscribe"],
        invalid=["help", "question", "purchase", "upgrade"],
    )

    assert intent == matcher
```

Testing Summarization

```python
def test_summarization(semantic_matcher):
    """Test that summaries capture key points."""
    long_text = "..."  # Long article
    summary = llm.summarize(long_text)

    matcher = semantic_matcher(
        valid=[
            "Article discusses climate change impacts",
            "The text is about environmental challenges",
        ],
        min_similarity=0.4,  # Lower minimum similarity: summaries vary more in wording
    )

    assert summary == matcher
```

Development

Setup

```bash
git clone https://github.com/tombedor/pytest-semantic.git
cd pytest-semantic
uv sync
```

Running Tests

```bash
uv run pytest tests/
```

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

MIT License - see LICENSE file for details.

Credits

Built with:

sentence-transformers for embeddings
pytest testing framework

Project details

Version: 0.1.0, released Nov 11, 2025

Source distribution: pytest_semantic-0.1.0.tar.gz (12.6 kB)
Built distribution: pytest_semantic-0.1.0-py3-none-any.whl (12.8 kB, Python 3)

Both files were uploaded via twine/6.2.0 CPython/3.11.8, without Trusted Publishing.

File hashes

pytest_semantic-0.1.0.tar.gz
SHA256: `a69084e2168363fd4c297655f25f31a06e3abb027ba43db8ba64765536e06f61`
MD5: `65b45db626bb7ce9d26a1abfa72f172d`
BLAKE2b-256: `8504ce430bc0ce8aedb0f581540b6a61da7d62f6a7202674ee608d0c2f7db185`

pytest_semantic-0.1.0-py3-none-any.whl
SHA256: `d16758f39a99d4c99b44ee669d71096e824211fbc167e7913e1f7fa3461b74b2`
MD5: `c1ff3773d63f8a6c0af958366700b38d`
BLAKE2b-256: `ba964d74a5df5cfe7a83fb3409025f5cc63fd21ab7fd8ad57d6fc7b89ee59926`
