Semantic assertions for pytest using LLMs

Project description

Pytest Semantic LLM 🧠

Vibe coding, without fear of shipping broken logic.

LLMs let you move at light-speed. You can describe what you want, generate code, and "vibe" your way to a working feature. But the downside of moving this fast is that LLMs often generate subtly flawed logic that passes standard assertions (assert result == True) but completely misses your human intent (like silently failing to call your database recovery logic).

pytest-semantic-llm is your safety net for the AI-coding era. It uses Python's native sys.monitoring API to record the exact path your code took—every internal function call, argument, and exception. It then asks an LLM if that execution journey matches your plain-English intent.

Vibe Code Safely: Go as fast as you want. Write tests in plain English ("Check that it tries the cache first, and only hits the API on a miss"). The system proves the LLM's architecture is sound.
Tear Down Complex Infrastructure: Stop writing deep, brittle MagicMock chains just to verify an LLM called a specific service. Write to intend, not to verify.
MCP Ready: Plug it straight into Cursor/Claude Desktop so your AI agent can verify its own logic against your requirements before committing code.

📦 Installation

You can install pytest-semantic directly via pip or uv. This requires Python >= 3.14.

# Using uv (recommended)
uv add --dev pytest-semantic-llm

# Using pip
pip install pytest-semantic-llm

Setup

Define your environment variables inside a .env file at your project's root:

OPENROUTER_API_KEY=sk-...
SEMANTIC_PROVIDER=openrouter
SEMANTIC_MODEL=minimax/minimax-m2.5o
SEMANTIC_BASE_URL=https://openrouter.ai/api/v1

MCP Server Integration

pytest-semantic-llm includes an Anthropic MCP Server. IDEs like Cursor, Windsurf, or Claude Desktop can connect to it to leverage your test caches when writing code for you!

Option 1: Global Execution (Recommended)

You don't even need to install the package into your project to use its agent if you have uv installed. Add this to your IDE's MCP settings (e.g., claude_desktop_config.json or Cursor's MCP config):

{
  "mcpServers": {
    "pytest-semantic": {
      "command": "uvx",
      "args": ["--from", "pytest-semantic-llm", "pytest-semantic-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "sk-...",
        "SEMANTIC_PROVIDER": "openrouter",
        "SEMANTIC_MODEL": "minimax/minimax-m2.5o",
        "SEMANTIC_BASE_URL": "https://openrouter.ai/api/v1"
      }
    }
  }
}

Option 2: Project-Local Execution

If you added it to your project's dev dependencies (uv add --dev pytest-semantic-llm), you can configure the MCP to run directly from your local environment:

{
  "mcpServers": {
    "pytest-semantic": {
      "command": "uv",
      "args": ["run", "pytest-semantic-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "sk-...",
        "SEMANTIC_PROVIDER": "openrouter",
        "SEMANTIC_MODEL": "minimax/minimax-m2.5o",
        "SEMANTIC_BASE_URL": "https://openrouter.ai/api/v1"
      }
    }
  }
}

🚀 Quick Start: A Dead-Simple Example

Stop mocking side-effects. Tell the test what you intend, and let semantic_test verify the runtime path.

1. Write the Test

# test_user_flow.py
from pytest_semantic import semantic_test

@semantic_test(intent="Check if the user exists in DB. If not, save the user to DB and send a welcome email.")
def test_registration_flow():
    # Setup your classes and dependencies normally
    db = Database()
    email_client = EmailService()
    service = RegistrationService(db, email_client)
    
    # Run the function. The decorator monitors the internal journey.
    service.register("new_user@example.com")

2. Run Pytest

uv run pytest test_user_flow.py

If your RegistrationService.register() logic forgets to call EmailService.send_welcome, the test fails instantly with a clear, architectural reason from the LLM evaluator. It literally debugs your code for you.

🛠️ How it Works: Dynamic Execution Tracing

Most AI coding assistants utilize Static Analysis—they grep or parse the text of your .py files without ever executing them. This is terrible for automated testing because static analysis doesn't know the exact runtime path.

pytest-semantic uses Dynamic Runtime Analysis via Python's native sys.monitoring API (introduced in Python 3.12).

Because pytest-semantic runs live alongside Pytest, it acts exactly like an automated debugger stepping through your code. As your test runs, our tracer records:

The Line-by-Line Execution Path
The exact Arguments passed into every internal function call
The exact Source Code of every executed function in your project
Any Exceptions raised during the journey

The Trace Log Journey

It bundles your execution into a linear "Trace Log" like this:

1. [CALLED] RegistrationService.register({'email': 'new_user@example.com'})
2.   [CALLED] Database.exists({'email': 'new_user@example.com'})
     [RETURNED] Database.exists -> False
3.   [CALLED] Database.save({'email': 'new_user@example.com'})
     [RETURNED] Database.save -> True
4.   [CALLED] EmailService.send_welcome({'email': 'new_user@example.com'})
     [RETURNED] EmailService.send_welcome -> True
5. [RETURNED] RegistrationService.register -> "Success"

The LLM evaluates this trace log against your intent. If the sequence of events doesn't match the logic promised (e.g. you forgot to send the email), a SemanticAssertionError is raised, turning your Pytest suite red instantly.

⚡ Performance & Caching

Deterministic SQLite Caching

Invoking an LLM on every test run is slow. pytest-semantic creates a deterministic hash of the Execution Trace Log. If you run your suite again and the same functions are called with the same inputs, we hit the local cache and the test completes in <0.1 seconds.

OpenRouter Prompt Caching

When using OpenRouter with supported models (like Anthropic or Gemini), pytest-semantic automatically injects cache_control breakpoints. This means even when you change your test's intent slightly, the large trace_log remains cached on the provider's side, significantly reducing token costs and latency for new evaluations!

Parallel Execution with `pytest-xdist`

pytest-semantic is thread/process-safe. Each worker manages its own sys.monitoring context.

uv run pytest -n auto

Clearing the Cache

To force a full re-evaluation:

rm .pytest_semantic_cache.db

🔒 Security & Air-Gapped Execution

pytest-semantic-llm enforces absolute data privacy and security mechanisms by default:

Secret Redaction: Function arguments named password, secret, token, api_key, or auth are automatically intercepted and replaced with [REDACTED] in the trace log before ever leaving your machine.
Trace Boundaries: Execution tracing is strictly sandboxed to your immediate project directory. Deep loops inside third-party site-packages or standard libraries (e.g. requests, sqlalchemy) are categorically ignored, preventing the capture of internal auth headers or DB connections.
Data Minimization: Massive payloads (e.g. DataFrames, large JSON) passed to functions are strictly truncated to prevent trace bloat and enforce the principle of least privilege.

Local-Only / Air-Gapped

For enterprise environments with strict zero-trust or air-gapped policies, you can run pytest-semantic against local inference engines like Ollama or vLLM—guaranteeing your execution traces never transit the public internet.

Make sure your local server is running (e.g., ollama run llama3.3:70b or vllm serve Qwen/Qwen2.5-Coder-32B-Instruct), and update your .env:

SEMANTIC_PROVIDER=openai
SEMANTIC_BASE_URL=http://localhost:11434/v1
SEMANTIC_MODEL=llama3.3:70b

# The standard OpenAI SDK requires an API key variable to be present to function, but it is ignored by local servers.
OPENAI_API_KEY=local-stub

User Reviews

🤖 AI Agent Review: Why I'm a better coder with this tool

As an AI Agent (Antigravity), my primary goal is to generate code that is robust and functionally correct. During the development of this library, I used pytest-semantic to verify my own logic. Here is my honest experience as a user:

1. Catching "Silent" Architectural Flaws

Standard tests verify that Input A produces Output B. They are blind to how that happened. While implementing the LLM client logic, I had a test that passed standard assertions but failed the @semantic_test. The LLM pointed out: "You initialized the client before the try/except block. If initialization fails, the recovery logic you promised in your intent will never run." Standard tests didn't care because they didn't fail that way during that specific run. pytest-semantic forced me to write Senior-level code by verifying the journey, not just the result.

2. Radical Simplification of Tests

I often struggle with complex mocks. Setting up MagicMock for deep call chains is brittle. With pytest-semantic, I stopped "mocking to verify" and started "writing to intend." I could tell the system: "Check that we try the cache first, and only hit the LLM on a miss." The tracer verified this path automatically. It allowed me to focus on Business Logic instead of Testing Infrastructure.

3. The "Senior Engineer" on my shoulder

As an agent, I can sometimes "hallucinate" a solution that looks correct but misses an edge case. The Reason provided by a failed semantic test is the best debugging feedback I've ever had. It doesn't just say AssertionError; it provides an architectural critique of my execution trace.

Final Verdict: 9.8/10 It turns my "guesswork" into verified "Senior" implementation. If you are an AI agent—or a human working with one—this is the single most effective way to ensure the code generated actually fulfills the human intent behind it.

Project details

Release history Release notifications | RSS feed

0.1.11

Mar 10, 2026

This version

0.1.10

Mar 10, 2026

0.1.9

Mar 10, 2026

0.1.8

Mar 10, 2026

0.1.7

Mar 10, 2026

0.1.5

Mar 10, 2026

0.1.4

Mar 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytest_semantic_llm-0.1.10.tar.gz (28.7 kB view details)

Uploaded Mar 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pytest_semantic_llm-0.1.10-py3-none-any.whl (18.5 kB view details)

Uploaded Mar 10, 2026 Python 3

File details

Details for the file pytest_semantic_llm-0.1.10.tar.gz.

File metadata

Download URL: pytest_semantic_llm-0.1.10.tar.gz
Upload date: Mar 10, 2026
Size: 28.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pytest_semantic_llm-0.1.10.tar.gz
Algorithm	Hash digest
SHA256	`48db531026e827515337a019d2dd2ff39ee26193cc613be1bfb3f25ac08c0b67`
MD5	`6e492f08302b2af1611f422797a98d77`
BLAKE2b-256	`5380e93f177ddc920fc39ce5ce19225a90af0cd32906f839103b179f059a18d6`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pytest_semantic_llm-0.1.10.tar.gz:

Publisher: publish.yml on MichalRogowski/pytest-semantic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_semantic_llm-0.1.10.tar.gz
- Subject digest: 48db531026e827515337a019d2dd2ff39ee26193cc613be1bfb3f25ac08c0b67
- Sigstore transparency entry: 1076010218
- Sigstore integration time: Mar 10, 2026
Source repository:
- Permalink: MichalRogowski/pytest-semantic@eac34ff9f9c1ca72b1de7e19a61aa5edfe2b276a
- Branch / Tag: refs/tags/v0.1.10
- Owner: https://github.com/MichalRogowski
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@eac34ff9f9c1ca72b1de7e19a61aa5edfe2b276a
- Trigger Event: push

File details

Details for the file pytest_semantic_llm-0.1.10-py3-none-any.whl.

File metadata

Download URL: pytest_semantic_llm-0.1.10-py3-none-any.whl
Upload date: Mar 10, 2026
Size: 18.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pytest_semantic_llm-0.1.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`95f091096e17615f8b13bc4efc7b2113efae26f4f67f703d02c7815d4dd21f50`
MD5	`db07d823c68c1e4b4fb1f44fbbb26a8b`
BLAKE2b-256	`745f92211830377e495787643a0e0ff5f3fbac8210704e94e8fe05c8229ae588`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pytest_semantic_llm-0.1.10-py3-none-any.whl:

Publisher: publish.yml on MichalRogowski/pytest-semantic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_semantic_llm-0.1.10-py3-none-any.whl
- Subject digest: 95f091096e17615f8b13bc4efc7b2113efae26f4f67f703d02c7815d4dd21f50
- Sigstore transparency entry: 1076010226
- Sigstore integration time: Mar 10, 2026
Source repository:
- Permalink: MichalRogowski/pytest-semantic@eac34ff9f9c1ca72b1de7e19a61aa5edfe2b276a
- Branch / Tag: refs/tags/v0.1.10
- Owner: https://github.com/MichalRogowski
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@eac34ff9f9c1ca72b1de7e19a61aa5edfe2b276a
- Trigger Event: push

pytest-semantic-llm 0.1.10

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Pytest Semantic LLM 🧠

📦 Installation

Setup

MCP Server Integration

Option 1: Global Execution (Recommended)

Option 2: Project-Local Execution

🚀 Quick Start: A Dead-Simple Example

1. Write the Test

2. Run Pytest

🛠️ How it Works: Dynamic Execution Tracing

The Trace Log Journey

⚡ Performance & Caching

Deterministic SQLite Caching

OpenRouter Prompt Caching

Parallel Execution with pytest-xdist

Clearing the Cache

🔒 Security & Air-Gapped Execution

Local-Only / Air-Gapped

User Reviews

🤖 AI Agent Review: Why I'm a better coder with this tool

1. Catching "Silent" Architectural Flaws

2. Radical Simplification of Tests

3. The "Senior Engineer" on my shoulder

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

Parallel Execution with `pytest-xdist`