Contract testing framework for LLM prompts - PCSL (Prompt Contract Specification Language)

Prompt Contracts

Test your LLM prompts like code.

Prompt-Contracts is a specification and toolkit that brings contract testing to LLM prompt interactions. When models drift due to provider updates, parameter changes, or switches to local models, integrations can silently break. This framework enables structural, semantic, and behavioral validation of LLM responses.

Overview

Prompt-Contracts implements the Prompt Contract Specification Language (PCSL), a formal specification for defining, validating, and enforcing LLM prompt behavior. Similar to how OpenAPI defines REST API contracts or JSON Schema defines data contracts, PCSL defines:

  • What a prompt expects as input
  • How the LLM should respond (structure, semantics, performance)
  • Where these expectations should hold (which models, providers, parameters)

Common Problems Solved

  • JSON Breakage: Responses become invalid or wrapped in markdown code fences
  • Missing Fields: Required fields disappear from structured outputs
  • Enum Drift: Values drift from expected enums ("urgent" instead of "high")
  • Performance Regression: Latency and token budgets exceed acceptable limits
  • Model Switching: Behavior changes when switching between providers or model versions

Key Features

PCSL v0.1 Implementation

Specification & Validation

  • Formal PCSL specification with JSON Schema validation
  • Three artefact types: Prompt Definition (PD), Expectation Suite (ES), Evaluation Profile (EP)
  • Progressive conformance levels (L1-L3)

Execution Modes

  • observe: Validation-only mode with no modifications
  • assist: Prompt augmentation with auto-generated constraints
  • enforce: Schema-guided JSON generation (OpenAI structured outputs)
  • auto: Adaptive mode that picks the strictest mode the adapter supports (enforce → assist → observe)

Auto-Repair & Retries

  • Bounded retry mechanism with configurable limits
  • Automatic output normalization (strip markdown fences, lowercase fields)
  • Detailed repair tracking and status reporting

Schema-Guided Enforcement

  • Automatic JSON Schema derivation from expectation suites
  • OpenAI structured output integration via response_format
  • Capability negotiation for provider-specific features

Full IO Transparency

  • Complete artifact saving with --save-io flag
  • Per-fixture storage of inputs, outputs, and metadata
  • Cryptographic prompt hashing for reproducibility
  • Timestamped execution traces

Multi-Provider Support

  • OpenAI adapter with schema enforcement capabilities
  • Ollama adapter for local model execution
  • Extensible adapter architecture

Comprehensive Reporting

  • CLI reporter with rich formatting
  • JSON reporter for machine-readable output
  • JUnit XML for CI/CD integration

Quick Start

Prerequisites

  • Python 3.10 or higher
  • Ollama (for local models) or OpenAI API key

Installation

# Install dependencies
pip install -r requirements.txt

# Install package
pip install -e .

Set Up Ollama (Optional)

# Install Ollama (via Homebrew)
brew install ollama

# Start server
ollama serve

# Pull model
ollama pull mistral

Run Example Contract

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --report cli

Expected Output:

TARGET ollama:mistral
  mode: assist

Fixture: pwd_reset (latency: 2314ms, status: REPAIRED, retries: 0)
  Repairs applied: lowercased $.priority
  PASS | pc.check.json_valid
         Response is valid JSON
  PASS | pc.check.json_required
         All required fields present: ['category', 'priority', 'reason']
  PASS | pc.check.enum
         Value 'high' is in allowed values ['low', 'medium', 'high']
  ...

Summary: 11/11 checks passed (1 PASS, 1 REPAIRED) — status: YELLOW

Core Concepts

Artefact Types

Prompt Definition (PD)

Describes the canonical prompt and I/O expectations.

{
  "pcsl": "0.1.0",
  "id": "support.ticket.classify.v1",
  "io": {
    "channel": "text",
    "expects": "structured/json"
  },
  "prompt": "You are a support classifier. Reply ONLY with strict JSON."
}

Expectation Suite (ES)

Declares validation checks as properties that must hold for every execution.

{
  "pcsl": "0.1.0",
  "checks": [
    { "type": "pc.check.json_valid" },
    { 
      "type": "pc.check.json_required", 
      "fields": ["category", "priority", "reason"] 
    },
    { 
      "type": "pc.check.enum", 
      "field": "$.priority", 
      "allowed": ["low", "medium", "high"] 
    },
    { "type": "pc.check.regex_absent", "pattern": "```" },
    { "type": "pc.check.token_budget", "max_out": 200 },
    { "type": "pc.check.latency_budget", "p95_ms": 5000 }
  ]
}

Evaluation Profile (EP)

Defines execution context: models, test fixtures, and tolerance thresholds.

{
  "pcsl": "0.1.0",
  "targets": [
    {
      "type": "ollama",
      "model": "mistral",
      "params": { "temperature": 0 }
    }
  ],
  "fixtures": [
    { "id": "pwd_reset", "input": "User: My password doesn't work." },
    { "id": "billing", "input": "User: I was double charged." }
  ],
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority"],
      "strip_markdown_fences": true
    }
  },
  "tolerances": {
    "pc.check.json_valid": { "max_fail_rate": 0.0 },
    "pc.check.enum": { "max_fail_rate": 0.01 }
  }
}
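The tolerances block reads naturally as a per-check ceiling on the failure rate across fixtures. A minimal sketch of that comparison, assuming each check yields one boolean per fixture run; the function name `within_tolerance` is hypothetical and not part of the package:

```python
def within_tolerance(results: list[bool], max_fail_rate: float) -> bool:
    """Return True if the share of failed results does not exceed max_fail_rate."""
    if not results:
        return True  # assumption: no executions means nothing violated the tolerance
    fail_rate = results.count(False) / len(results)
    return fail_rate <= max_fail_rate


# One enum failure in 100 fixture runs sits exactly at the 0.01 ceiling.
enum_results = [True] * 99 + [False]
print(within_tolerance(enum_results, max_fail_rate=0.01))

# A single json_valid failure breaks a 0.0 tolerance.
print(within_tolerance([True, False], max_fail_rate=0.0))
```

With a tolerance of 0.0 the check behaves as a hard requirement; any non-zero value turns it into a statistical one.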

Execution Modes

observe (Validation Only)

  • No modifications to prompts or outputs
  • Pure validation against expectation suite
  • Status: PASS or FAIL only

assist (Prompt Augmentation)

  • Automatically augments prompts with constraint blocks
  • Example: enum check generates "priority MUST be one of: low, medium, high"
  • Supports bounded retries with auto-repair
  • Status: PASS, REPAIRED, or FAIL
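To make the augmentation step concrete, here is a rough sketch of how a constraint block could be derived from an expectation suite. The function `build_constraint_block` and the exact wording of each constraint (other than the enum line quoted above) are assumptions for illustration, not the package's actual output:

```python
def build_constraint_block(checks: list[dict]) -> str:
    """Turn expectation-suite checks into plain-language prompt constraints."""
    lines = []
    for check in checks:
        kind = check.get("type")
        if kind == "pc.check.json_valid":
            lines.append("Respond with valid JSON only.")
        elif kind == "pc.check.json_required":
            lines.append("Required fields: " + ", ".join(check["fields"]))
        elif kind == "pc.check.enum":
            # "$.priority" -> "priority" (this sketch handles root-level paths only)
            name = check["field"].removeprefix("$.")
            lines.append(f"{name} MUST be one of: " + ", ".join(check["allowed"]))
    return "\n".join(lines)


checks = [
    {"type": "pc.check.json_valid"},
    {"type": "pc.check.json_required", "fields": ["category", "priority", "reason"]},
    {"type": "pc.check.enum", "field": "$.priority", "allowed": ["low", "medium", "high"]},
]
prompt = "You are a support classifier. Reply ONLY with strict JSON."
print(prompt + "\n\n" + build_constraint_block(checks))
```

Checks that cannot be expressed as an instruction to the model (latency budgets, for example) are simply skipped in this sketch.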

enforce (Schema-Guided JSON)

  • Uses adapter capabilities for schema-guided generation
  • Derives JSON Schema from expectation suite
  • OpenAI: Uses response_format with structured outputs
  • Falls back to assist if adapter doesn't support enforcement
  • Status: PASS, REPAIRED, FAIL, or NONENFORCEABLE
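Schema derivation can be pictured as folding the structural checks into a single JSON Schema object. The sketch below is an assumption about how that might work, using a hypothetical `derive_schema` helper; it only covers root-level fields, and a provider's strict structured-output mode may require additional keywords that are not added here:

```python
import json


def derive_schema(checks: list[dict]) -> dict:
    """Build a JSON Schema for the response object from json_required and enum checks."""
    schema = {"type": "object", "properties": {}, "required": []}
    for check in checks:
        if check["type"] == "pc.check.json_required":
            for name in check["fields"]:
                schema["properties"].setdefault(name, {})
                if name not in schema["required"]:
                    schema["required"].append(name)
        elif check["type"] == "pc.check.enum":
            name = check["field"].removeprefix("$.")  # root-level paths only
            schema["properties"].setdefault(name, {})["enum"] = list(check["allowed"])
    return schema


checks = [
    {"type": "pc.check.json_valid"},
    {"type": "pc.check.json_required", "fields": ["category", "priority", "reason"]},
    {"type": "pc.check.enum", "field": "$.priority", "allowed": ["low", "medium", "high"]},
]
print(json.dumps(derive_schema(checks), indent=2))
```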

auto (Adaptive)

  • Selects the strictest mode that the adapter's capabilities support
  • Fallback chain: enforce → assist → observe
  • Default mode for maximum compatibility
  • Maximizes enforcement while maintaining broad support
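The fallback logic can be sketched as a small function over the adapter's capability flags. The `Capability` dataclass below is a local stand-in that mirrors the flags listed under Adapters, and `resolve_mode` is a hypothetical name. The source does not say when the chain drops all the way to observe, so this sketch stops at assist:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Stand-in for the adapter capability flags (not the package's own class)."""
    schema_guided_json: bool = False
    tool_calling: bool = False
    function_call_json: bool = False


def resolve_mode(requested: str, caps: Capability) -> str:
    """Pick the effective mode for a requested mode and a set of capabilities."""
    if requested in ("auto", "enforce") and not caps.schema_guided_json:
        return "assist"  # enforcement unavailable: step down the chain
    if requested == "auto":
        return "enforce"
    return requested


openai_like = Capability(schema_guided_json=True, tool_calling=True)
ollama_like = Capability()
print(resolve_mode("auto", openai_like))
print(resolve_mode("auto", ollama_like))
```

The requested and resolved modes correspond to the `mode` and `effective_mode` fields recorded in run.json.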

Status Codes

Per-Fixture Status

  • PASS: Validation succeeded on first attempt
  • REPAIRED: Validation succeeded only after auto-repair was applied
  • FAIL: Validation failed after exhausting all retries
  • NONENFORCEABLE: Enforcement requested but adapter lacks capability

Per-Target Status

  • GREEN: All fixtures passed without repairs
  • YELLOW: Some fixtures repaired or marked nonenforceable
  • RED: One or more fixtures failed validation
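The roll-up from per-fixture to per-target status follows directly from the definitions above. A minimal sketch, with `target_status` as a hypothetical helper name:

```python
def target_status(fixture_statuses: list[str]) -> str:
    """Aggregate per-fixture statuses into a single per-target color."""
    if "FAIL" in fixture_statuses:
        return "RED"
    if any(s in ("REPAIRED", "NONENFORCEABLE") for s in fixture_statuses):
        return "YELLOW"
    return "GREEN"


print(target_status(["PASS", "REPAIRED"]))
print(target_status(["PASS", "PASS"]))
print(target_status(["REPAIRED", "FAIL"]))
```

A failure always wins over a repair, so a target with one repaired and one failed fixture is reported as RED.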

Installation

From Source

git clone https://github.com/promptcontracts/prompt-contracts.git
cd prompt-contracts
pip install -r requirements.txt
pip install -e .

Verify Installation

prompt-contracts --help

Usage

CLI Commands

Validate Artefacts

Validate artefacts against PCSL schemas:

prompt-contracts validate pd examples/support_ticket/pd.json
prompt-contracts validate es examples/support_ticket/es.json
prompt-contracts validate ep examples/support_ticket/ep.json

Run Contract

Execute a complete contract with validation:

prompt-contracts run \
  --pd <path-to-pd> \
  --es <path-to-es> \
  --ep <path-to-ep> \
  [--report cli|json|junit] \
  [--out <output-path>] \
  [--save-io <artifacts-directory>]

Arguments:

  • --pd: Path to Prompt Definition (required)
  • --es: Path to Expectation Suite (required)
  • --ep: Path to Evaluation Profile (required)
  • --report: Report format - cli (default), json, or junit
  • --out: Output path for report file (optional)
  • --save-io: Directory to save execution artifacts (optional)

Execution Configuration

Configure execution behavior in the Evaluation Profile:

{
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority", "$.status"],
      "strip_markdown_fences": true
    }
  }
}

Configuration Options:

  • mode: Execution mode (auto, enforce, assist, observe)
  • max_retries: Maximum retry attempts on validation failure (default: 1)
  • auto_repair.lowercase_fields: JSONPath fields to lowercase
  • auto_repair.strip_markdown_fences: Remove code fence markers (default: true)
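The two auto_repair options amount to a normalization pass over the raw model output. The following is a sketch of that pass under stated assumptions: `normalize` is a hypothetical function, only root-level JSONPaths such as `$.priority` are handled, and the returned details merely imitate the `repaired_details` shape shown in run.json:

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the example never contains a literal fence
FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def normalize(raw: str, lowercase_fields: list[str], strip_fences: bool = True):
    """Return (normalized_text, repair_details) for one raw model response."""
    details = {"stripped_fences": False, "lowercased_fields": []}
    text = raw
    if strip_fences:
        match = FENCE_RE.match(text)
        if match:
            text = match.group(1)
            details["stripped_fences"] = True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return text, details  # leave unparseable output for the checks to reject
    if isinstance(data, dict):
        for path in lowercase_fields:
            key = path.removeprefix("$.")  # sketch handles root-level paths only
            value = data.get(key)
            if isinstance(value, str) and value != value.lower():
                data[key] = value.lower()
                details["lowercased_fields"].append(path)
    return json.dumps(data), details


raw = FENCE + 'json\n{"priority": "High", "status": "open"}\n' + FENCE
text, details = normalize(raw, ["$.priority", "$.status"])
print(text)
print(details)
```

Only fields whose value actually changed are recorded, so a fixture whose output needed no repair can still be reported as PASS.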

Artifact Saving

Enable comprehensive artifact saving with --save-io:

prompt-contracts run \
  --pd pd.json --es es.json --ep ep.json \
  --save-io artifacts/

Directory Structure:

artifacts/
  <target-id>/
    <fixture-id>/
      input_final.txt      # Final prompt with augmentations
      output_raw.txt       # Raw model response
      output_norm.txt      # Normalized output after auto-repair
      run.json             # Complete execution metadata

run.json Contents:

{
  "pcsl": "0.1.0",
  "target": "ollama:mistral",
  "params": { "temperature": 0 },
  "execution": {
    "mode": "assist",
    "effective_mode": "assist",
    "max_retries": 1
  },
  "latency_ms": 2314,
  "retries_used": 0,
  "status": "REPAIRED",
  "repaired_details": {
    "stripped_fences": true,
    "lowercased_fields": ["$.priority"]
  },
  "checks": [...],
  "prompt_hash": "a1b2c3...",
  "timestamp": "2025-10-07T12:34:56Z"
}
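The `prompt_hash` field lets two runs be compared for prompt identity without diffing the text. The source does not name the hash algorithm, so the sketch below assumes SHA-256 over the UTF-8 bytes of the final prompt; `prompt_hash` as a function name is likewise an assumption:

```python
import hashlib


def prompt_hash(final_prompt: str) -> str:
    """Hex digest of the exact prompt text sent to the model."""
    return hashlib.sha256(final_prompt.encode("utf-8")).hexdigest()


base = "You are a support classifier. Reply ONLY with strict JSON."
augmented = base + "\npriority MUST be one of: low, medium, high"

# Identical text always hashes identically; any augmentation changes the digest.
print(prompt_hash(base) == prompt_hash(base))
print(prompt_hash(base) == prompt_hash(augmented))
```

Hashing the final prompt (after augmentation) rather than the Prompt Definition text means a change in generated constraints also shows up as a different hash.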

PCSL Specification

Conformance Levels

PCSL defines progressive conformance levels:

L1 - Structural Conformance

  • JSON validity validation
  • Required field presence checking
  • Token budget enforcement
  • Basic structural guarantees

L2 - Semantic Conformance

Includes L1 plus:

  • Enum value validation with JSONPath
  • Regex pattern assertions (presence/absence)
  • Advanced field-level checks
  • Semantic property validation

L3 - Differential Conformance

Includes L2 plus:

  • Multi-target execution and comparison
  • Pass-rate validation across models
  • Latency budget enforcement (p95)
  • Tolerance-based acceptance criteria

L4 - Security Conformance (Planned)

Includes L3 plus:

  • Jailbreak escape-rate metrics
  • PII leakage detection
  • Adversarial robustness testing
  • Security property validation

Built-in Checks

pc.check.json_valid

Validates response is parseable JSON.

Parameters: None

{ "type": "pc.check.json_valid" }

pc.check.json_required

Validates presence of required fields at root level.

Parameters:

  • fields (array): Required field names

{
  "type": "pc.check.json_required", 
  "fields": ["category", "priority", "reason"] 
}
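As an illustration of what this check does, here is a small standalone version. The function `check_json_required` is hypothetical; the success message is modeled on the example CLI output, while the failure messages are invented for the sketch:

```python
import json


def check_json_required(response_text: str, fields: list[str]) -> tuple[bool, str]:
    """Verify that the response is a JSON object containing every required field."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as exc:
        return False, f"Response is not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "Top-level JSON value is not an object"
    missing = [name for name in fields if name not in data]
    if missing:
        return False, f"Missing required fields: {missing}"
    return True, f"All required fields present: {fields}"


required = ["category", "priority", "reason"]
print(check_json_required('{"category": "account", "priority": "high"}', required))
print(check_json_required(
    '{"category": "account", "priority": "high", "reason": "login"}', required
))
```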

pc.check.enum

Validates field value against allowed enumeration.

Parameters:

  • field (string): JSONPath to field
  • allowed (array): Allowed values
  • case_insensitive (boolean, optional): Case-insensitive comparison

{
  "type": "pc.check.enum", 
  "field": "$.priority", 
  "allowed": ["low", "medium", "high"],
  "case_insensitive": false
}
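The effect of `case_insensitive` is easiest to see side by side. The sketch below is a simplified stand-in, assuming a hypothetical `check_enum` function that resolves only root-level paths instead of using a full JSONPath library:

```python
def check_enum(data: dict, field: str, allowed: list[str],
               case_insensitive: bool = False) -> tuple[bool, str]:
    """Check that the value at a root-level path is one of the allowed values."""
    key = field.removeprefix("$.")  # "$.priority" -> "priority"
    if key not in data:
        return False, f"Field {field} not found"
    value = data[key]
    if case_insensitive and isinstance(value, str):
        ok = value.lower() in [str(a).lower() for a in allowed]
    else:
        ok = value in allowed
    verdict = "is in" if ok else "is not in"
    return ok, f"Value {value!r} {verdict} allowed values {allowed}"


allowed = ["low", "medium", "high"]
print(check_enum({"priority": "High"}, "$.priority", allowed))
print(check_enum({"priority": "High"}, "$.priority", allowed, case_insensitive=True))
print(check_enum({"priority": "urgent"}, "$.priority", allowed))
```

With `case_insensitive` left at false, a capitalized value fails the check unless the `lowercase_fields` auto-repair has normalized it first.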

pc.check.regex_absent

Validates regex pattern is NOT present in response.

Parameters:

  • pattern (string): Regex pattern

{ "type": "pc.check.regex_absent", "pattern": "```" }

pc.check.token_budget

Validates response length stays within token budget.

Parameters:

  • max_out (integer): Maximum output tokens

{ "type": "pc.check.token_budget", "max_out": 200 }

Note: Current implementation approximates tokens by word count.
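A word-count approximation can be sketched in a couple of lines. The function `check_token_budget` is a hypothetical name, and splitting on whitespace is an assumption about what "word count" means here:

```python
def check_token_budget(response_text: str, max_out: int) -> tuple[bool, int]:
    """Approximate the token count by whitespace-separated words and compare to max_out."""
    approx_tokens = len(response_text.split())
    return approx_tokens <= max_out, approx_tokens


response = '{"category": "account", "priority": "high", "reason": "password reset"}'
print(check_token_budget(response, max_out=200))
print(check_token_budget(response, max_out=5))
```

For compact JSON, a whitespace split typically undercounts what a model tokenizer would report, so budgets are best set with some margin.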

pc.check.latency_budget

Validates p95 latency across all fixtures.

Parameters:

  • p95_ms (integer): p95 latency threshold in milliseconds

{ "type": "pc.check.latency_budget", "p95_ms": 5000 }
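Unlike the other checks, this one is evaluated once over the latencies of all fixtures. The sketch below uses the nearest-rank definition of a percentile with only the standard library; the package lists numpy as a dependency and may interpolate differently, so results near the threshold could differ. Both function names are hypothetical:

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def check_latency_budget(latencies_ms: list[float], p95_ms: float) -> bool:
    """Pass when the p95 latency across fixtures stays within the budget."""
    return p95(latencies_ms) <= p95_ms


latencies = [2314, 1980, 2450, 2100]
print(p95(latencies))
print(check_latency_budget(latencies, p95_ms=5000))
```

With only a handful of fixtures, the nearest-rank p95 is simply the slowest observation, so a single slow response can fail the budget.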

Adapters

OpenAI Adapter

Uses OpenAI SDK with full schema enforcement support.

Capabilities:

  • schema_guided_json: True
  • tool_calling: True
  • function_call_json: False

Features:

  • Structured output via response_format with JSON Schema
  • Enables enforce mode for guaranteed structure
  • Parameter support: temperature, max_tokens

Configuration:

{
  "type": "openai",
  "model": "gpt-4o-mini",
  "params": {
    "temperature": 0,
    "max_tokens": 500
  }
}

Ollama Adapter

Supports local model execution via Ollama API.

Capabilities:

  • schema_guided_json: False
  • tool_calling: False
  • function_call_json: False

Features:

  • Local model execution
  • HTTP API integration
  • Falls back to assist mode in auto/enforce
  • Parameter support: temperature

Configuration:

{
  "type": "ollama",
  "model": "mistral",
  "params": {
    "temperature": 0
  }
}

Custom Adapters

Implement custom adapters by subclassing AbstractAdapter:

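import time

from promptcontracts.core.adapters import AbstractAdapter, Capability


class CustomAdapter(AbstractAdapter):
    def capabilities(self) -> Capability:
        # Advertise which features this adapter supports
        return Capability(
            schema_guided_json=True,
            tool_calling=False,
            function_call_json=False,
        )

    def generate(self, prompt: str, schema=None):
        # Replace the placeholder below with a call to your provider
        start = time.perf_counter()
        response_text = "{}"  # placeholder response
        latency_ms = int((time.perf_counter() - start) * 1000)
        return response_text, latency_ms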

Reporters

CLI Reporter

Rich-formatted terminal output with color coding and hierarchical structure.

Usage:

prompt-contracts run --report cli [--out output.txt]

Features:

  • Color-coded status indicators
  • Hierarchical fixture/check display
  • Repair detail tracking
  • Artifact path display
  • Summary statistics

JSON Reporter

Machine-readable JSON output for programmatic consumption.

Usage:

prompt-contracts run --report json [--out results.json]

Features:

  • Complete result serialization
  • Artifact path inclusion
  • Metadata enrichment
  • Timestamping

JUnit Reporter

JUnit XML format for CI/CD integration.

Usage:

prompt-contracts run --report junit [--out junit.xml]

Features:

  • Standard JUnit XML format
  • Test case per check
  • Failure detail capture
  • CI/CD pipeline integration
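The "test case per check" mapping can be illustrated with the standard library's XML module. This is a sketch only: `to_junit` and the shape of the `results` records are assumptions, and the attributes shown are the common subset of JUnit XML rather than the reporter's exact output:

```python
import xml.etree.ElementTree as ET


def to_junit(target: str, results: list[dict]) -> str:
    """Render one <testsuite> per target with one <testcase> per fixture/check pair."""
    failures = sum(1 for r in results if not r["passed"])
    suite = ET.Element(
        "testsuite", name=target, tests=str(len(results)), failures=str(failures)
    )
    for r in results:
        case = ET.SubElement(suite, "testcase", classname=r["fixture"], name=r["check"])
        if not r["passed"]:
            failure = ET.SubElement(case, "failure", message=r["message"])
            failure.text = r["message"]
    return ET.tostring(suite, encoding="unicode")


results = [
    {"fixture": "pwd_reset", "check": "pc.check.json_valid", "passed": True,
     "message": "Response is valid JSON"},
    {"fixture": "billing", "check": "pc.check.enum", "passed": False,
     "message": "Value 'urgent' is not in allowed values ['low', 'medium', 'high']"},
]
print(to_junit("ollama:mistral", results))
```

Most CI systems that ingest JUnit XML will then list each failed check as a separate failing test under its fixture.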

Architecture

Project Structure

src/promptcontracts/
  cli.py                    # CLI entry points
  core/
    loader.py               # Artefact loading and schema validation
    validator.py            # Check registry and execution
    runner.py               # Contract orchestration
    checks/                 # Built-in check implementations
      json_valid.py
      json_required.py
      enum_value.py
      regex_absent.py
      token_budget.py
      latency_budget.py
    adapters/               # LLM provider adapters
      base.py
      openai_adapter.py
      ollama_adapter.py
    reporters/              # Output formatters
      cli_reporter.py
      json_reporter.py
      junit_reporter.py
  spec/                     # PCSL specification
    pcsl-v0.1.md
    schema/
      pcsl-pd.schema.json
      pcsl-es.schema.json
      pcsl-ep.schema.json
examples/                   # Example contracts
tests/                      # Test suite

Dependencies

Core:

  • pyyaml: YAML parsing
  • jsonschema: Schema validation
  • jsonpath-ng: JSONPath evaluation
  • httpx: HTTP client for Ollama
  • numpy: Statistical calculations

Provider SDKs:

  • openai: OpenAI API integration

CLI:

  • rich: Terminal formatting

Testing

Run Test Suite

# All tests
pytest tests/ -v

# Specific test module
pytest tests/test_enforcement.py -v

# With coverage
pytest tests/ --cov=promptcontracts --cov-report=html

Test Categories

  • Loader Tests: Schema validation, file parsing
  • Check Tests: Individual check logic
  • Enforcement Tests: Normalization, schema derivation, retries
  • Integration Tests: End-to-end contract execution

Current Coverage

  • 17 tests passing
  • Core functionality: 100%
  • Enforcement features: 100%
  • Edge cases: Ongoing

Roadmap

Completed (v0.1)

  • PCSL specification v0.1 with JSON Schemas
  • Execution modes (observe, assist, enforce, auto)
  • Auto-repair and bounded retries
  • Schema-guided JSON (OpenAI structured outputs)
  • Artifact saving with full IO transparency
  • OpenAI and Ollama adapters
  • CLI, JSON, and JUnit reporters
  • Conformance levels L1-L3 (scaffold)

Planned (v0.2)

  • L3 Differential runner enhancements
    • Statistical significance testing
    • Drift detection algorithms
    • A/B testing support
  • HTML reporter with visualization
    • Trend charts
    • Diff views
    • Interactive filtering
  • Additional check types
    • JSON Schema field validation
    • Numeric range checks
    • Cross-field dependencies
    • String length validation

Planned (v0.3)

  • L4 Security conformance
    • Jailbreak escape-rate metrics
    • PII leakage detection
    • Prompt injection testing
    • Adversarial robustness
  • Additional adapters
    • Anthropic Claude
    • Google Gemini
    • Azure OpenAI
    • Hugging Face
  • Observability integration
    • OpenTelemetry export
    • Prometheus metrics
    • Grafana dashboards

Planned (Future)

  • Multi-modal support (images, audio)
  • GitHub Action and GitLab CI templates
  • VS Code extension
  • Pre-commit hooks
  • Fine-tuning contract integration
  • Production monitoring integration

Contributing

Spec Governance

The PCSL specification lives under src/promptcontracts/spec/. Changes to the specification follow an RFC process:

  1. Open a GitHub Issue describing the proposed change
  2. Label as spec-rfc
  3. Community discussion and feedback
  4. Approval by maintainers
  5. Implementation and documentation

Development Setup

# Clone repository
git clone https://github.com/promptcontracts/prompt-contracts.git
cd prompt-contracts

# Install development dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest tests/ -v

Contribution Guidelines

  • Follow existing code style and patterns
  • Add tests for new features
  • Update documentation
  • Ensure all tests pass
  • Write clear commit messages

Versioning

PCSL and prompt-contracts follow Semantic Versioning:

  • Patch (0.1.x): Bug fixes, clarifications
  • Minor (0.x.0): New features, backward-compatible additions
  • Major (x.0.0): Breaking changes to artefact structure or behavior

License

Code: MIT License
Documentation: CC-BY 4.0

See LICENSE file for details.


Support

  • Documentation: See QUICKSTART.md for getting started guide
  • Specification: Read src/promptcontracts/spec/pcsl-v0.1.md for detailed spec
  • Issues: Report bugs and request features via GitHub Issues
  • Discussions: Join community discussions on GitHub Discussions

Citation

If you use Prompt-Contracts in your research or production systems, please cite:

@software{promptcontracts2025,
  title = {Prompt-Contracts: Contract Testing for LLM Prompts},
  author = {Prompt-Contracts Contributors},
  year = {2025},
  url = {https://github.com/promptcontracts/prompt-contracts},
  version = {0.1.0}
}