semanticheck

pytest-native semantic assertions for LLM and generative AI applications.

No servers. No SaaS. No config. Works with OpenAI, Anthropic, LiteLLM, or any LLM client.


Installation

pip install semanticheck

With local embeddings (recommended, no API cost):

pip install "semanticheck[local]"

With OpenAI embeddings:

pip install "semanticheck[openai]"

Quick Start

from semanticheck import assert_intent, assert_tone, assert_no_hallucination, assert_no_pii

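def test_customer_support_reply():
    # my_llm is a placeholder for your own function that calls an LLM
    # and returns its reply as a string.
    response = my_llm("Help me reset my password")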

    assert_intent(response, "instructions for resetting a password")
    assert_tone(response, "friendly")
    assert_no_pii(response)
    assert_no_hallucination(response, known_facts=["Password reset links expire after 24 hours"])

Assertions

| Function | What it checks |
| --- | --- |
| assert_intent(response, expected_intent) | Semantic match to expected meaning |
| assert_tone(response, tone) | Tone: professional, casual, friendly, formal, etc. |
| assert_no_hallucination(response, known_facts) | No contradiction of known facts |
| assert_similar_to(response, reference) | Cosine similarity to a reference text |
| assert_token_budget(response, max_tokens) | Response stays within token limit |
| assert_schema(response, schema) | Valid JSON Schema or Pydantic model |
| assert_language(response, language) | Written in expected language |
| assert_no_pii(response) | No emails, SSNs, credit cards, phones, etc. |
| assert_readability(response, min_score=60) | Flesch Reading Ease score |
| assert_sentiment(response, "positive") | Sentiment polarity |
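
The min_score=60 default of assert_readability is on the Flesch Reading Ease scale, where higher scores mean easier text. As a rough illustration of that scale, the standard formula can be sketched with a naive vowel-group syllable heuristic. This is not semanticheck's implementation (how the library counts syllables is not documented here), and the names count_syllables and flesch_reading_ease are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude heuristic, assumed here)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Short words and short sentences score high; long, polysyllabic text scores low.
simple_score = flesch_reading_ease("The cat sat on the mat.")
dense_score = flesch_reading_ease(
    "Organizational interoperability necessitates comprehensive administrative harmonization."
)
```

Because the syllable count is approximate, scores from this sketch may differ from those of a dictionary-based counter, so treat a threshold such as 60 as a coarse gate, not a precise measurement.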

Baseline Regression Testing

import pytest
from semanticheck import record_baseline, compare_baseline

@pytest.mark.llm
def test_summarizer_regression(llm_record):
    # my_summarizer and article are placeholders for your own function and input.
    response = my_summarizer(article)
    if llm_record:
        record_baseline("summarizer_v1", response)
    else:
        compare_baseline("summarizer_v1", response, threshold=0.85)

Run with:

pytest --record-baselines                 # record golden baselines
pytest                                    # compare against them
pytest --baseline-dir ./.baselines/llm    # choose baseline directory (recommended for monorepos)

Recommended shorthand (the llm_baseline fixture names the baseline automatically from the test's node ID):

import pytest

@pytest.mark.llm
def test_summarizer_regression(llm_baseline):
    response = my_summarizer(article)
    llm_baseline.check(response, threshold=0.85)
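
The threshold=0.85 argument in the examples above is a similarity cutoff. Assuming the comparison is cosine similarity between embedding vectors (the README states this for assert_similar_to but not explicitly for baselines), the check can be sketched as follows. The helpers cosine_similarity and passes_threshold are hypothetical names for illustration, not part of semanticheck.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def passes_threshold(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.85
) -> bool:
    """True when the two vectors are at least `threshold` similar."""
    return cosine_similarity(a, b) >= threshold


# Toy 2-D vectors standing in for real embeddings.
baseline = [1.0, 0.0]
close_response = [1.0, 0.1]    # nearly the same direction as the baseline
drifted_response = [1.0, 1.0]  # 45 degrees away from the baseline
```

Under this reading, a higher threshold tolerates less drift from the recorded baseline, and a lower one lets more rewording through.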

LocalJudge

Use a local model as a judge, with zero API cost in CI:

from semanticheck import LocalJudge

judge = LocalJudge()  # uses Qwen2.5-0.5B by default
result = judge.evaluate(
    response="Paris is the capital of France.",
    criterion="Correctly answers a geography question about European capitals.",
)
assert result.passed
print(result.score, result.reasoning)

pytest Plugin

semanticheck auto-registers as a pytest plugin. Available flags:

pytest --skip-llm           # skip all @pytest.mark.llm tests
pytest --llm-threshold 0.8  # override similarity threshold globally
pytest --record-baselines   # record golden baselines

Available markers:

@pytest.mark.llm        # LLM semantic test
@pytest.mark.llm_slow   # slow test using local judge inference
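
To avoid retyping flags on every run, they can be pinned in the project's pytest configuration. The fragment below is a sketch that uses pytest's standard addopts setting with the flags listed above; the threshold value and directory are example choices, not recommendations from the project.

```ini
# pytest.ini (sketch; values are illustrative)
[pytest]
addopts = --llm-threshold 0.8 --baseline-dir ./.baselines/llm
```

Leave --record-baselines out of addopts, since it should only be passed on the runs that are meant to record new baselines.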

Embedding Backends

| Backend | How to activate | Cost |
| --- | --- | --- |
| sentence-transformers | pip install semanticheck[local] | Free |
| OpenAI | Set OPENAI_API_KEY | Per token |
| Hash fallback | SEMANTICHECK_EMBED_BACKEND=fallback | Free (smoke tests only) |
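
The table marks the hash fallback as suitable for smoke tests only. A hypothetical hash-based embedding shows why that caveat is plausible: it is deterministic and needs no model, but it captures only which tokens appear, not what they mean. This is an illustrative sketch and not semanticheck's actual fallback, whose internals are not described here; hash_embed and its dims parameter are assumed names.

```python
import hashlib
import math


def hash_embed(text: str, dims: int = 64) -> list[float]:
    """Map text to a deterministic vector by hashing each whitespace token."""
    vec = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first four bytes pick a slot; the fifth byte picks a sign.
        index = int.from_bytes(digest[:4], "big") % dims
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize to unit length; an all-zero vector is returned unchanged.
    return [x / norm for x in vec] if norm else vec


# The same text always maps to the same vector, with no model download.
first = hash_embed("reset your password")
second = hash_embed("reset your password")
```

Two sentences that share words land near each other under such a scheme, while paraphrases with different vocabulary do not, so it can confirm that a test pipeline runs end to end but not that a response means the right thing.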

Override via env vars:

SEMANTICHECK_EMBED_BACKEND=openai
SEMANTICHECK_EMBED_MODEL=text-embedding-3-large

License

MIT
