
Semantic specification drift detector for LLM outputs — catches semantic violations Pydantic cannot see.


spec-drift

Your LLM outputs pass Pydantic. That's not enough.

spec-drift detects when your LLM outputs drift semantically from their declared specification, even when structural validation passes. It continuously monitors semantic compliance in production, generates drift reports, and provides a CI gate for semantic regression testing.


The Problem

Every team shipping LLM features uses Pydantic or JSON Schema to validate output structure. These tools are excellent — and completely blind to semantic drift.

Consider what happens in production LLM systems over time:

Silent model updates: Your LLM provider updates the underlying model without announcement. The API contract (field names, types, schema) doesn't change, but the distribution of values inside those fields shifts. Your "sentiment" field starts returning "ambivalent" where it previously returned "neutral." Your "risk_level" classification shifts its decision boundary. Pydantic sees nothing.

Prompt erosion: A prompt is modified through six iterations of "just a small tweak." Each tweak passes regression tests individually. But cumulatively, the semantic profile of outputs drifts. The "reasoning" field that used to average 120 words now averages 30. Your validation still passes.

Input distribution shift: A new user cohort or marketing campaign brings different input patterns. The same model, same prompt, same schema — but outputs drift from the spec because they were calibrated for a different input distribution.

spec-drift catches all of these. Pydantic catches none of them.


The Biblical Foundation

"Moses then said to Aaron, 'This is what the Lord spoke of when he said: Among those who approach me I will be proved holy.'" — Leviticus 10:3

Nadab and Abihu offered "unauthorized fire" — structurally correct (fire), instrumentally correct (censers), personally authorized (priests). But the semantic specification was violated. Every structural check passed. The semantic compliance check failed.

spec-drift applies this principle to LLM outputs: structural validation is necessary but not sufficient. Semantic specification must be declared and continuously monitored.

BibleWorld build — PAT-037, Pivot_Score 8.63


Installation

pip install spec-drift

Requirements: Python 3.10+, Pydantic v2


Quick Start

1. Declare a semantic spec

from pydantic import BaseModel
from spec_drift import spec, SemanticConstraint

@spec(
    category=SemanticConstraint.from_authorized_values(
        ["positive", "negative", "neutral"],
        tolerance=0.02,       # max 2% outputs outside authorized set
        alert_threshold=0.10  # alert if >10% observations violate
    ),
    reasoning=SemanticConstraint.from_length_bounds(
        min_words=30,
        max_words=300,
        alert_threshold=0.15
    ),
    score=SemanticConstraint.from_distribution(
        mean=6.5,
        std=2.0,
        drift_threshold=1.0,  # alert if mean shifts >1 sigma
        alert_threshold=0.20
    )
)
class SentimentAnalysis(BaseModel):
    category: str
    reasoning: str
    score: float

2. Wrap your LLM function

from spec_drift import DriftMonitor
import anthropic

client = anthropic.Anthropic()

monitor = DriftMonitor(
    spec=SentimentAnalysis,
    db_path="./spec_drift.db",
    model_version="claude-3-5-haiku-20241022",
)

@monitor.watch
def analyze_sentiment(text: str) -> SentimentAnalysis:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": (
                "Analyze the sentiment of the following text and reply with "
                f"JSON containing category, reasoning, and score.\n\n{text}"
            ),
        }]
    )
    return SentimentAnalysis.model_validate_json(response.content[0].text)

3. Check drift

# Terminal drift report (last 7 days)
spec-drift check --spec my_module.SentimentAnalysis --since 7d

# CI gate: fail build if >20% semantic violations
spec-drift ci \
  --spec my_module.SentimentAnalysis \
  --test-batch data/ci_batch.jsonl \
  --threshold 0.20 \
  --exit-code

API Reference

@spec(**constraints)

Attaches semantic constraints to a Pydantic model class.

@spec(field_name=SemanticConstraint.from_authorized_values([...]))
class MyModel(BaseModel):
    field_name: str

SemanticConstraint

SemanticConstraint.from_authorized_values(authorized, tolerance, alert_threshold)

Field values must be drawn from the authorized list (within tolerance).

  • authorized: list of permitted values
  • tolerance: float, maximum fraction of outputs allowed outside the authorized set before the constraint counts a violation
  • alert_threshold: float, fraction of rolling observations that must violate before an alert fires

SemanticConstraint.from_length_bounds(min_words, max_words, alert_threshold)

String field word count must be within [min_words, max_words].

SemanticConstraint.from_distribution(mean, std, drift_threshold, alert_threshold)

Numeric field should follow a distribution near (mean, std). Alerts if observed mean shifts by more than drift_threshold standard deviations.

SemanticConstraint.from_pattern(regex, min_match_rate, alert_threshold)

String field should match the regex pattern at min_match_rate frequency.
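
The four constructors compose on a single model. A minimal sketch (the TicketTriage model, its fields, and the numeric thresholds are illustrative, not part of the library):

from pydantic import BaseModel
from spec_drift import spec, SemanticConstraint

@spec(
    priority=SemanticConstraint.from_authorized_values(
        ["low", "medium", "high"],       # authorized set
        tolerance=0.01,
        alert_threshold=0.10,
    ),
    summary=SemanticConstraint.from_length_bounds(
        min_words=10, max_words=80, alert_threshold=0.15
    ),
    confidence=SemanticConstraint.from_distribution(
        mean=0.7, std=0.15, drift_threshold=1.0, alert_threshold=0.20
    ),
    ticket_id=SemanticConstraint.from_pattern(
        r"^TCK-\d{6}$",                  # expected ID format
        min_match_rate=0.99,
        alert_threshold=0.05,
    ),
)
class TicketTriage(BaseModel):
    priority: str
    summary: str
    confidence: float
    ticket_id: str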

DriftMonitor(spec, db_path, model_version, prompt_hash, alert_callback)

Runtime monitor for semantic specification compliance.
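
A sketch of constructing a monitor with the optional prompt_hash argument (the PROMPT_TEMPLATE name is illustrative; SentimentAnalysis is the model from Quick Start). Hashing the prompt template lets later reports attribute drift to prompt edits versus silent model updates; for alert_callback, see the routing sketch under Severity Levels below.

import hashlib
from spec_drift import DriftMonitor

PROMPT_TEMPLATE = "Analyze the sentiment of the following text: {text}"

monitor = DriftMonitor(
    spec=SentimentAnalysis,
    db_path="./spec_drift.db",
    model_version="claude-3-5-haiku-20241022",
    # Tie observations to the current prompt revision
    prompt_hash=hashlib.sha256(PROMPT_TEMPLATE.encode()).hexdigest(),
)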

.watch (decorator)

Wraps an LLM function to automatically observe its return value.

.observe(output) -> output

Manually observe a Pydantic model instance. Returns the output unchanged.
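
When wrapping the call site with .watch is impractical (for example, outputs arriving from a batch pipeline), .observe records each validated instance directly. A minimal sketch; the outputs.jsonl path is illustrative:

from pathlib import Path

for line in Path("outputs.jsonl").read_text().splitlines():
    result = SentimentAnalysis.model_validate_json(line)
    monitor.observe(result)  # records the observation and returns it unchanged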

.drift_report(since_hours) -> dict

Generate a semantic drift report for the last N hours.

{
    "spec": "SentimentAnalysis",
    "period_hours": 168.0,
    "observations": 4523,
    "violation_rate": 0.0312,
    "severity": "low",
    "field_violation_rates": {
        "category": 0.0089,
        "reasoning": 0.0221,
        "score": 0.0002
    }
}
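
The same report is available programmatically, which is useful for custom dashboards or scheduled checks. A sketch, assuming the severity strings match the lowercase values in the sample report above:

report = monitor.drift_report(since_hours=168)  # last 7 days, same window as --since 7d
if report["severity"] in ("high", "critical"):
    rates = report["field_violation_rates"]
    worst = max(rates, key=rates.get)
    print(f"Semantic drift concentrated in '{worst}': {rates[worst]:.1%} violation rate")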

run_ci_gate(monitor, test_outputs, threshold) -> (passed, report)

Run a CI gate on a batch of test outputs. Returns (passed, report).
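
A sketch of wiring the gate into a pytest suite. data/ci_batch.jsonl is the batch from Quick Start; the assumption here is that test_outputs is a list of already-validated model instances:

from spec_drift import run_ci_gate

def test_semantic_spec_gate():
    with open("data/ci_batch.jsonl") as f:
        test_outputs = [SentimentAnalysis.model_validate_json(line) for line in f]
    passed, report = run_ci_gate(monitor, test_outputs, threshold=0.20)
    assert passed, f"Semantic spec gate failed: {report}"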


CLI Reference

# Initialize spec-drift in a project
spec-drift init

# Calibrate a baseline from golden data
spec-drift calibrate \
  --spec my_module.SentimentAnalysis \
  --input-file data/golden_set.jsonl \
  --output baseline.db

# Drift report (table format, last 7 days)
spec-drift check \
  --spec my_module.SentimentAnalysis \
  --since 7d \
  --format table

# CI gate
spec-drift ci \
  --spec my_module.SentimentAnalysis \
  --test-batch data/ci_batch.jsonl \
  --threshold 0.20 \
  --exit-code

# Compare two model versions
spec-drift compare \
  --spec my_module.SentimentAnalysis \
  --baseline-a baseline_gpt4o.db \
  --baseline-b baseline_claude_haiku.db

# HTML report
spec-drift report \
  --spec my_module.SentimentAnalysis \
  --since 30d \
  --output report.html

GitHub Action

# .github/workflows/llm-spec-check.yml
name: LLM Semantic Spec Check

on: [push, pull_request]

jobs:
  spec-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: spec-drift/action@v1
        with:
          spec: my_module.SentimentAnalysis
          test-batch: data/ci_batch.jsonl
          threshold: '0.20'

Severity Levels

Violation Rate | Severity | Recommended Action
---------------|----------|-----------------------------------
0%             | NONE     | No action needed
< 5%           | LOW      | Monitor, no immediate action
5-15%          | MEDIUM   | Investigate, create issue
15-30%         | HIGH     | Rollback or prompt fix recommended
> 30%          | CRITICAL | Immediate rollback
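
An alert_callback can route these severities to actions automatically. The payload shape passed to the callback is an assumption here (a dict carrying a "severity" key, lowercase as in the sample drift report); a minimal sketch using standard logging:

import logging
from spec_drift import DriftMonitor

log = logging.getLogger("spec_drift.alerts")

def route_alert(alert: dict) -> None:
    severity = alert.get("severity", "none")  # assumed key; adapt to the real payload
    if severity == "critical":
        log.critical("Immediate rollback recommended: %s", alert)
    elif severity == "high":
        log.error("Rollback or prompt fix recommended: %s", alert)
    elif severity == "medium":
        log.warning("Investigate and open an issue: %s", alert)
    else:
        log.info("Within tolerance, monitoring only: %s", alert)

monitor = DriftMonitor(
    spec=SentimentAnalysis,
    db_path="./spec_drift.db",
    model_version="claude-3-5-haiku-20241022",
    alert_callback=route_alert,
)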

Storage

spec-drift uses SQLite by default — zero infrastructure required.

# Local development (default)
monitor = DriftMonitor(spec=MyModel, db_path="./spec_drift.db")

# PostgreSQL for production (coming in v0.2)
monitor = DriftMonitor(
    spec=MyModel,
    db_url="postgresql://user:pass@host/db"
)

# In-memory for testing
monitor = DriftMonitor(spec=MyModel, db_path=":memory:")

Roadmap

v0.1 (this release)

  • Core @spec decorator + SemanticConstraint DSL
  • DriftMonitor with .watch and .observe
  • SQLite observation store
  • run_ci_gate function
  • CLI: check, ci, compare

v0.2

  • PostgreSQL support
  • Multi-field correlation monitoring
  • Automatic model version detection (via LLM API response headers)
  • Slack/PagerDuty alert integrations
  • HTML drift reports

v0.3

  • LLM-judge semantic constraint evaluation (for complex, prose-level constraints)
  • Baseline versioning with SemVer
  • Team dashboard (hosted cloud option)
  • Prometheus/Grafana metrics export

Comparison

Tool       | Structural validation  | Semantic spec monitoring | Production continuous | CI gate | Open source
-----------|------------------------|--------------------------|-----------------------|---------|------------
Pydantic   | YES                    | NO                       | NO                    | NO      | YES
DeepEval   | NO (batch eval)        | YES (point-in-time)      | NO                    | YES     | YES
Evidently  | NO (statistical drift) | NO                       | YES                   | NO      | YES
Langfuse   | NO                     | NO                       | YES (observability)   | NO      | YES
spec-drift | YES (via Pydantic)     | YES                      | YES                   | YES     | YES

Contributing

spec-drift is MIT licensed. Contributions welcome.

git clone https://github.com/bibleworld/spec-drift
cd spec-drift
pip install -e ".[dev]"
pytest tests/

License

MIT — free to use, modify, and distribute.


Built with BibleWorld — Pattern: Leviticus 10:1-3 (The Authorized Fire) "Among those who approach me I will be proved holy." — Leviticus 10:3
