Contract testing framework for LLM prompts - PCSL (Prompt Contract Specification Language)

These details have not been verified by PyPI

Project links

Project description

Prompt Contracts

Test your LLM prompts like code.

Prompt-Contracts is a specification and toolkit that brings contract testing to LLM prompt interactions. When models drift due to provider updates, parameter changes, or switches to local models, integrations can silently break. This framework enables structural, semantic, and behavioral validation of LLM responses.

Overview
Key Features
Quick Start
Core Concepts
Installation
Usage
PCSL Specification
- Conformance Levels
- Built-in Checks
Adapters
Reporters
Architecture
Testing
Roadmap
Contributing
License

Overview

Prompt-Contracts implements the Prompt Contract Specification Language (PCSL), a formal specification for defining, validating, and enforcing LLM prompt behavior. Similar to how OpenAPI defines REST API contracts or JSON Schema defines data contracts, PCSL defines:

What a prompt expects as input
How the LLM should respond (structure, semantics, performance)
Where these expectations should hold (which models, providers, parameters)

Common Problems Solved

JSON Breakage: Responses become invalid or wrapped in markdown code fences
Missing Fields: Required fields disappear from structured outputs
Enum Drift: Values drift from expected enums ("urgent" instead of "high")
Performance Regression: Latency and token budgets exceed acceptable limits
Model Switching: Behavior changes when switching between providers or model versions

Key Features

PCSL v0.1 Implementation

Specification & Validation

Formal PCSL specification with JSON Schema validation
Three artefact types: Prompt Definition (PD), Expectation Suite (ES), Evaluation Profile (EP)
Progressive conformance levels (L1-L3)

Execution Modes

observe: Validation-only mode with no modifications
assist: Prompt augmentation with auto-generated constraints
enforce: Schema-guided JSON generation (OpenAI structured outputs)
auto: Adaptive mode with intelligent fallback chain

Auto-Repair & Retries

Bounded retry mechanism with configurable limits
Automatic output normalization (strip markdown fences, lowercase fields)
Detailed repair tracking and status reporting

Schema-Guided Enforcement

Automatic JSON Schema derivation from expectation suites
OpenAI structured output integration via response_format
Capability negotiation for provider-specific features

Full IO Transparency

Complete artifact saving with --save-io flag
Per-fixture storage of inputs, outputs, and metadata
Cryptographic prompt hashing for reproducibility
Timestamped execution traces

Multi-Provider Support

OpenAI adapter with schema enforcement capabilities
Ollama adapter for local model execution
Extensible adapter architecture

Comprehensive Reporting

CLI reporter with rich formatting
JSON reporter for machine-readable output
JUnit XML for CI/CD integration

Quick Start

Prerequisites

Python 3.10 or higher
Ollama (for local models) or OpenAI API key

Installation

# Install dependencies
pip install -r requirements.txt

# Install package
pip install -e .

Setup Ollama (Optional)

# Install Ollama
brew install ollama

# Start server
ollama serve

# Pull model
ollama pull mistral

Run Example Contract

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --report cli

Expected Output:

TARGET ollama:mistral
  mode: assist

Fixture: pwd_reset (latency: 2314ms, status: REPAIRED, retries: 0)
  Repairs applied: lowercased $.priority
  PASS | pc.check.json_valid
         Response is valid JSON
  PASS | pc.check.json_required
         All required fields present: ['category', 'priority', 'reason']
  PASS | pc.check.enum
         Value 'high' is in allowed values ['low', 'medium', 'high']
  ...

Summary: 11/11 checks passed (1 PASS, 1 REPAIRED) — status: YELLOW

Core Concepts

Artefact Types

Prompt Definition (PD)

Describes the canonical prompt and I/O expectations.

{
  "pcsl": "0.1.0",
  "id": "support.ticket.classify.v1",
  "io": {
    "channel": "text",
    "expects": "structured/json"
  },
  "prompt": "You are a support classifier. Reply ONLY with strict JSON."
}

Expectation Suite (ES)

Declares validation checks as properties that must hold for every execution.

{
  "pcsl": "0.1.0",
  "checks": [
    { "type": "pc.check.json_valid" },
    { 
      "type": "pc.check.json_required", 
      "fields": ["category", "priority", "reason"] 
    },
    { 
      "type": "pc.check.enum", 
      "field": "$.priority", 
      "allowed": ["low", "medium", "high"] 
    },
    { "type": "pc.check.regex_absent", "pattern": "```" },
    { "type": "pc.check.token_budget", "max_out": 200 },
    { "type": "pc.check.latency_budget", "p95_ms": 5000 }
  ]
}

Evaluation Profile (EP)

Defines execution context: models, test fixtures, and tolerance thresholds.

{
  "pcsl": "0.1.0",
  "targets": [
    {
      "type": "ollama",
      "model": "mistral",
      "params": { "temperature": 0 }
    }
  ],
  "fixtures": [
    { "id": "pwd_reset", "input": "User: My password doesn't work." },
    { "id": "billing", "input": "User: I was double charged." }
  ],
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority"],
      "strip_markdown_fences": true
    }
  },
  "tolerances": {
    "pc.check.json_valid": { "max_fail_rate": 0.0 },
    "pc.check.enum": { "max_fail_rate": 0.01 }
  }
}

Execution Modes

observe (Validation Only)

No modifications to prompts or outputs
Pure validation against expectation suite
Status: PASS or FAIL only

assist (Prompt Augmentation)

Automatically augments prompts with constraint blocks
Example: enum check generates "priority MUST be one of: low, medium, high"
Supports bounded retries with auto-repair
Status: PASS, REPAIRED, or FAIL

enforce (Schema-Guided JSON)

Uses adapter capabilities for schema-guided generation
Derives JSON Schema from expectation suite
OpenAI: Uses response_format with structured outputs
Falls back to assist if adapter doesn't support enforcement
Status: PASS, REPAIRED, FAIL, or NONENFORCEABLE

auto (Adaptive)

Intelligently selects best mode based on adapter capabilities
Fallback chain: enforce → assist → observe
Default mode for maximum compatibility
Maximizes enforcement while maintaining broad support

Status Codes

Per-Fixture Status

PASS: Validation succeeded on first attempt
REPAIRED: Validation succeeded after auto-repair application
FAIL: Validation failed after exhausting all retries
NONENFORCEABLE: Enforcement requested but adapter lacks capability

Per-Target Status

GREEN: All fixtures passed without repairs
YELLOW: Some fixtures repaired or marked nonenforceable
RED: One or more fixtures failed validation

Installation

From Source

git clone https://github.com/promptcontracts/prompt-contracts.git
cd prompt-contracts
pip install -r requirements.txt
pip install -e .

Verify Installation

prompt-contracts --help

Usage

CLI Commands

Validate Artefacts

Validate artefacts against PCSL schemas:

prompt-contracts validate pd examples/support_ticket/pd.json
prompt-contracts validate es examples/support_ticket/es.json
prompt-contracts validate ep examples/support_ticket/ep.json

Run Contract

Execute a complete contract with validation:

prompt-contracts run \
  --pd <path-to-pd> \
  --es <path-to-es> \
  --ep <path-to-ep> \
  [--report cli|json|junit] \
  [--out <output-path>] \
  [--save-io <artifacts-directory>]

Arguments:

--pd: Path to Prompt Definition (required)
--es: Path to Expectation Suite (required)
--ep: Path to Evaluation Profile (required)
--report: Report format - cli (default), json, or junit
--out: Output path for report file (optional)
--save-io: Directory to save execution artifacts (optional)

Execution Configuration

Configure execution behavior in the Evaluation Profile:

{
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority", "$.status"],
      "strip_markdown_fences": true
    }
  }
}

Configuration Options:

mode: Execution mode (auto, enforce, assist, observe)
max_retries: Maximum retry attempts on validation failure (default: 1)
auto_repair.lowercase_fields: JSONPath fields to lowercase
auto_repair.strip_markdown_fences: Remove code fence markers (default: true)

Artifact Saving

Enable comprehensive artifact saving with --save-io:

prompt-contracts run \
  --pd pd.json --es es.json --ep ep.json \
  --save-io artifacts/

Directory Structure:

artifacts/
  <target-id>/
    <fixture-id>/
      input_final.txt      # Final prompt with augmentations
      output_raw.txt       # Raw model response
      output_norm.txt      # Normalized output after auto-repair
      run.json             # Complete execution metadata

run.json Contents:

{
  "pcsl": "0.1.0",
  "target": "ollama:mistral",
  "params": { "temperature": 0 },
  "execution": {
    "mode": "assist",
    "effective_mode": "assist",
    "max_retries": 1
  },
  "latency_ms": 2314,
  "retries_used": 0,
  "status": "REPAIRED",
  "repaired_details": {
    "stripped_fences": true,
    "lowercased_fields": ["$.priority"]
  },
  "checks": [...],
  "prompt_hash": "a1b2c3...",
  "timestamp": "2025-10-07T12:34:56Z"
}

PCSL Specification

Conformance Levels

PCSL defines progressive conformance levels:

L1 - Structural Conformance

JSON validity validation
Required field presence checking
Token budget enforcement
Basic structural guarantees

L2 - Semantic Conformance

Includes L1 plus:

Enum value validation with JSONPath
Regex pattern assertions (presence/absence)
Advanced field-level checks
Semantic property validation

L3 - Differential Conformance

Includes L2 plus:

Multi-target execution and comparison
Pass-rate validation across models
Latency budget enforcement (p95)
Tolerance-based acceptance criteria

L4 - Security Conformance (Planned)

Includes L3 plus:

Jailbreak escape-rate metrics
PII leakage detection
Adversarial robustness testing
Security property validation

Built-in Checks

pc.check.json_valid

Validates response is parseable JSON.

Parameters: None

{ "type": "pc.check.json_valid" }

pc.check.json_required

Validates presence of required fields at root level.

Parameters:

fields (array): Required field names

{ 
  "type": "pc.check.json_required", 
  "fields": ["category", "priority", "reason"] 
}

pc.check.enum

Validates field value against allowed enumeration.

Parameters:

field (string): JSONPath to field
allowed (array): Allowed values
case_insensitive (boolean, optional): Case-insensitive comparison

{ 
  "type": "pc.check.enum", 
  "field": "$.priority", 
  "allowed": ["low", "medium", "high"],
  "case_insensitive": false
}

pc.check.regex_absent

Validates regex pattern is NOT present in response.

Parameters:

pattern (string): Regex pattern

{ "type": "pc.check.regex_absent", "pattern": "```" }

pc.check.token_budget

Validates response length stays within token budget.

Parameters:

max_out (integer): Maximum output tokens

{ "type": "pc.check.token_budget", "max_out": 200 }

Note: Current implementation approximates tokens by word count.

pc.check.latency_budget

Validates p95 latency across all fixtures.

Parameters:

p95_ms (integer): p95 latency threshold in milliseconds

{ "type": "pc.check.latency_budget", "p95_ms": 5000 }

Adapters

OpenAI Adapter

Uses OpenAI SDK with full schema enforcement support.

Capabilities:

schema_guided_json: True
tool_calling: True
function_call_json: False

Features:

Structured output via response_format with JSON Schema
Enables enforce mode for guaranteed structure
Parameter support: temperature, max_tokens

Configuration:

{
  "type": "openai",
  "model": "gpt-4o-mini",
  "params": {
    "temperature": 0,
    "max_tokens": 500
  }
}

Ollama Adapter

Supports local model execution via Ollama API.

Capabilities:

schema_guided_json: False
tool_calling: False
function_call_json: False

Features:

Local model execution
HTTP API integration
Falls back to assist mode in auto/enforce
Parameter support: temperature

Configuration:

{
  "type": "ollama",
  "model": "mistral",
  "params": {
    "temperature": 0
  }
}

Custom Adapters

Implement custom adapters by subclassing AbstractAdapter:

from promptcontracts.core.adapters import AbstractAdapter, Capability

class CustomAdapter(AbstractAdapter):
    def capabilities(self) -> Capability:
        return Capability(
            schema_guided_json=True,
            tool_calling=False,
            function_call_json=False
        )
    
    def generate(self, prompt: str, schema=None):
        # Implementation
        return response_text, latency_ms

Reporters

CLI Reporter

Rich-formatted terminal output with color coding and hierarchical structure.

Usage:

prompt-contracts run --report cli [--out output.txt]

Features:

Color-coded status indicators
Hierarchical fixture/check display
Repair detail tracking
Artifact path display
Summary statistics

JSON Reporter

Machine-readable JSON output for programmatic consumption.

Usage:

prompt-contracts run --report json [--out results.json]

Features:

Complete result serialization
Artifact path inclusion
Metadata enrichment
Timestamping

JUnit Reporter

JUnit XML format for CI/CD integration.

Usage:

prompt-contracts run --report junit [--out junit.xml]

Features:

Standard JUnit XML format
Test case per check
Failure detail capture
CI/CD pipeline integration

Architecture

Project Structure

src/promptcontracts/
  cli.py                    # CLI entry points
  core/
    loader.py               # Artefact loading and schema validation
    validator.py            # Check registry and execution
    runner.py               # Contract orchestration
    checks/                 # Built-in check implementations
      json_valid.py
      json_required.py
      enum_value.py
      regex_absent.py
      token_budget.py
      latency_budget.py
    adapters/               # LLM provider adapters
      base.py
      openai_adapter.py
      ollama_adapter.py
    reporters/              # Output formatters
      cli_reporter.py
      json_reporter.py
      junit_reporter.py
  spec/                     # PCSL specification
    pcsl-v0.1.md
    schema/
      pcsl-pd.schema.json
      pcsl-es.schema.json
      pcsl-ep.schema.json
examples/                   # Example contracts
tests/                      # Test suite

Dependencies

Core:

pyyaml: YAML parsing
jsonschema: Schema validation
jsonpath-ng: JSONPath evaluation
httpx: HTTP client for Ollama
numpy: Statistical calculations

Provider SDKs:

openai: OpenAI API integration

CLI:

rich: Terminal formatting

Testing

Run Test Suite

# All tests
pytest tests/ -v

# Specific test module
pytest tests/test_enforcement.py -v

# With coverage
pytest tests/ --cov=promptcontracts --cov-report=html

Test Categories

Loader Tests: Schema validation, file parsing
Check Tests: Individual check logic
Enforcement Tests: Normalization, schema derivation, retries
Integration Tests: End-to-end contract execution

Current Coverage

17 tests passing
Core functionality: 100%
Enforcement features: 100%
Edge cases: Ongoing

Roadmap

Completed (v0.1)

PCSL specification v0.1 with JSON Schemas
Execution modes (observe, assist, enforce, auto)
Auto-repair and bounded retries
Schema-guided JSON (OpenAI structured outputs)
Artifact saving with full IO transparency
OpenAI and Ollama adapters
CLI, JSON, and JUnit reporters
Conformance levels L1-L3 (scaffold)

Planned (v0.2)

L3 Differential runner enhancements
- Statistical significance testing
- Drift detection algorithms
- A/B testing support
HTML reporter with visualization
- Trend charts
- Diff views
- Interactive filtering
Additional check types
- JSON Schema field validation
- Numeric range checks
- Cross-field dependencies
- String length validation

Planned (v0.3)

L4 Security conformance
- Jailbreak escape-rate metrics
- PII leakage detection
- Prompt injection testing
- Adversarial robustness
Additional adapters
- Anthropic Claude
- Google Gemini
- Azure OpenAI
- Hugging Face
Observability integration
- OpenTelemetry export
- Prometheus metrics
- Grafana dashboards

Planned (Future)

Multi-modal support (images, audio)
GitHub Action and GitLab CI templates
VS Code extension
Pre-commit hooks
Fine-tuning contract integration
Production monitoring integration

Contributing

Spec Governance

The PCSL specification lives under src/promptcontracts/spec/. Changes to the specification follow an RFC process:

Open a GitHub Issue describing the proposed change
Label as spec-rfc
Community discussion and feedback
Approval by maintainers
Implementation and documentation

Development Setup

# Clone repository
git clone https://github.com/promptcontracts/prompt-contracts.git
cd prompt-contracts

# Install development dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest tests/ -v

Contribution Guidelines

Follow existing code style and patterns
Add tests for new features
Update documentation
Ensure all tests pass
Write clear commit messages

Versioning

PCSL and prompt-contracts follow Semantic Versioning:

Patch (0.1.x): Bug fixes, clarifications
Minor (0.x.0): New features, backward-compatible additions
Major (x.0.0): Breaking changes to artefact structure or behavior

License

Code: MIT License
Documentation: CC-BY 4.0

See LICENSE file for details.

Support

Documentation: See QUICKSTART.md for getting started guide
Specification: Read src/promptcontracts/spec/pcsl-v0.1.md for detailed spec
Issues: Report bugs and request features via GitHub Issues
Discussions: Join community discussions on GitHub Discussions

Citation

If you use Prompt-Contracts in your research or production systems, please cite:

@software{promptcontracts2025,
  title = {Prompt-Contracts: Contract Testing for LLM Prompts},
  author = {Prompt-Contracts Contributors},
  year = {2025},
  url = {https://github.com/promptcontracts/prompt-contracts},
  version = {0.1.0}
}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.4.0

Oct 14, 2025

0.3.2

Oct 10, 2025

0.3.0

Oct 9, 2025

0.2.3

Oct 9, 2025

0.2.2

Oct 9, 2025

0.2.1

Oct 8, 2025

This version

0.2.0

Oct 8, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prompt_contracts-0.2.0.tar.gz (51.8 kB view details)

Uploaded Oct 8, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

prompt_contracts-0.2.0-py3-none-any.whl (43.0 kB view details)

Uploaded Oct 8, 2025 Python 3

File details

Details for the file prompt_contracts-0.2.0.tar.gz.

File metadata

Download URL: prompt_contracts-0.2.0.tar.gz
Upload date: Oct 8, 2025
Size: 51.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prompt_contracts-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`be256a0c7db1ae760b925608a0ad595affb00ab3c1f35e5a821ec299f5c15d5a`
MD5	`c932c89ee7b33e7fd65293252ea3da5a`
BLAKE2b-256	`bbae24a8ca9793900daf3a330d34ee7ef0be502b4208b09c28a87da5051d163c`

See more details on using hashes here.

Provenance

The following attestation bundles were made for prompt_contracts-0.2.0.tar.gz:

Publisher: publish-pypi.yml on philippmelikidis/prompt-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prompt_contracts-0.2.0.tar.gz
- Subject digest: be256a0c7db1ae760b925608a0ad595affb00ab3c1f35e5a821ec299f5c15d5a
- Sigstore transparency entry: 592502663
- Sigstore integration time: Oct 8, 2025
Source repository:
- Permalink: philippmelikidis/prompt-contracts@42784cfe1249f52a0375be20fabedd2c89bbab8a
- Branch / Tag: refs/heads/main
- Owner: https://github.com/philippmelikidis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@42784cfe1249f52a0375be20fabedd2c89bbab8a
- Trigger Event: workflow_dispatch

File details

Details for the file prompt_contracts-0.2.0-py3-none-any.whl.

File metadata

Download URL: prompt_contracts-0.2.0-py3-none-any.whl
Upload date: Oct 8, 2025
Size: 43.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prompt_contracts-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`abcbb2cafb0b29c4cc85768c45df6203d4c11647c552be97198dd16459ecd877`
MD5	`faccaa88fe5e03d9140ee33b61bb5631`
BLAKE2b-256	`147786d6d74069dc086461f96e019fe8bc720efe35703cf6b3c758cc1b17868d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for prompt_contracts-0.2.0-py3-none-any.whl:

Publisher: publish-pypi.yml on philippmelikidis/prompt-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prompt_contracts-0.2.0-py3-none-any.whl
- Subject digest: abcbb2cafb0b29c4cc85768c45df6203d4c11647c552be97198dd16459ecd877
- Sigstore transparency entry: 592502666
- Sigstore integration time: Oct 8, 2025
Source repository:
- Permalink: philippmelikidis/prompt-contracts@42784cfe1249f52a0375be20fabedd2c89bbab8a
- Branch / Tag: refs/heads/main
- Owner: https://github.com/philippmelikidis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@42784cfe1249f52a0375be20fabedd2c89bbab8a
- Trigger Event: workflow_dispatch

prompt-contracts 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Prompt Contracts

Table of Contents

Overview

Common Problems Solved

Key Features

PCSL v0.1 Implementation

Quick Start

Prerequisites

Installation

Setup Ollama (Optional)

Run Example Contract

Core Concepts

Artefact Types

Prompt Definition (PD)

Expectation Suite (ES)

Evaluation Profile (EP)

Execution Modes

observe (Validation Only)

assist (Prompt Augmentation)

enforce (Schema-Guided JSON)

auto (Adaptive)

Status Codes

Per-Fixture Status

Per-Target Status

Installation

From Source

Verify Installation

Usage

CLI Commands

Validate Artefacts

Run Contract

Execution Configuration

Artifact Saving

PCSL Specification

Conformance Levels

L1 - Structural Conformance

L2 - Semantic Conformance

L3 - Differential Conformance

L4 - Security Conformance (Planned)

Built-in Checks

pc.check.json_valid

pc.check.json_required

pc.check.enum

pc.check.regex_absent

pc.check.token_budget

pc.check.latency_budget

Adapters

OpenAI Adapter

Ollama Adapter

Custom Adapters

Reporters

CLI Reporter

JSON Reporter

JUnit Reporter

Architecture

Project Structure

Dependencies

Testing

Run Test Suite

Test Categories

Current Coverage

Roadmap

Completed (v0.1)

Planned (v0.2)

Planned (v0.3)

Planned (Future)

Contributing

Spec Governance

Development Setup

Contribution Guidelines

Versioning