Contract testing framework for LLM prompts - PCSL (Prompt Contract Specification Language)

Prompt Contracts

Test your LLM prompts like code.

Prompt-Contracts is a specification and toolkit that brings contract testing to LLM prompt interactions. When models drift due to provider updates, parameter changes, or switches to local models, integrations can silently break. This framework enables structural, semantic, and behavioral validation of LLM responses.

What's New in v0.3.2

Statistical Rigor & Fair Comparison Release:

Wilson/Jeffreys Intervals: Default CI methods with solid statistical foundations (n ≥ 10: Wilson; n < 10: Jeffreys)
McNemar Test: Paired binary comparison for system evaluations
Block Bootstrap: Handle dependencies from repairs or batching
Cross-Family Judge Validation: Bias-controlled semantic evaluation with κ reliability metrics
Fair Comparison Protocol: Standardized baseline comparisons (CheckList, Guidance, OpenAI Structured)
Repair Risk Analysis: Semantic change detection and sensitivity reporting
Audit Harness: Tamper-evident bundles with SHA-256 hashes and GPG signatures
Contract Composition: Formal semantics for sequential/parallel contract aggregation
Expanded Compliance: Enhanced ISO/EU AI Act mapping with statistical methods
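
To make two of the statistical terms above concrete, here is a minimal standalone sketch of a Wilson score interval and an exact two-sided McNemar test. It uses only the standard library and is not the package's own implementation; the function names `wilson_interval` and `mcnemar_exact` are hypothetical, and the Jeffreys interval is omitted because it needs a Beta quantile function.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z ~ 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: fixtures where system A passed and system B failed
    c: fixtures where system A failed and system B passed
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


low, high = wilson_interval(successes=8, n=10)
p_value = mcnemar_exact(b=1, c=9)
print(f"pass rate 8/10, 95% Wilson CI: [{low:.3f}, {high:.3f}]")
print(f"McNemar exact p-value for b=1, c=9: {p_value:.4f}")
```

The interval is computed on the pass rate of a single system, while the McNemar test compares two systems evaluated on the same fixtures, which is why it only looks at the pairs where they disagree.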

See CHANGELOG.md for complete v0.3.2 details.

Table of Contents

Overview
Key Features
Quick Start
Core Concepts
Examples
Installation
Usage
Best Practices
PCSL Specification
- Conformance Levels
- Built-in Checks
Adapters
Reporters
Architecture
Testing
Roadmap
Contributing
License

Overview

Prompt-Contracts implements the Prompt Contract Specification Language (PCSL), a formal specification for defining, validating, and enforcing LLM prompt behavior. Similar to how OpenAPI defines REST API contracts or JSON Schema defines data contracts, PCSL defines:

What a prompt expects as input
How the LLM should respond (structure, semantics, performance)
Where these expectations should hold (which models, providers, parameters)

Common Problems Solved

JSON Breakage: Responses become invalid or wrapped in markdown code fences
Missing Fields: Required fields disappear from structured outputs
Enum Drift: Values drift from expected enums ("urgent" instead of "high")
Performance Regression: Latency and token budgets exceed acceptable limits
Model Switching: Behavior changes when switching between providers or model versions

Key Features

PCSL v0.1 Implementation

Specification & Validation

Formal PCSL specification with JSON Schema validation
Three artefact types: Prompt Definition (PD), Expectation Suite (ES), Evaluation Profile (EP)
Progressive conformance levels (L1-L3)

Execution Modes

observe: Validation-only mode with no modifications
assist: Prompt augmentation with auto-generated constraints
enforce: Schema-guided JSON generation (OpenAI structured outputs)
auto: Adaptive mode with a capability-based fallback chain (enforce → assist → observe)

Auto-Repair & Retries

Bounded retry mechanism with configurable limits
Automatic output normalization (strip markdown fences, lowercase fields)
Detailed repair tracking and status reporting

Schema-Guided Enforcement

Automatic JSON Schema derivation from expectation suites
OpenAI structured output integration via response_format
Capability negotiation for provider-specific features

Full IO Transparency

Complete artifact saving with --save-io flag
Per-fixture storage of inputs, outputs, and metadata
Cryptographic prompt hashing for reproducibility
Timestamped execution traces

Multi-Provider Support

OpenAI adapter with schema enforcement capabilities
Ollama adapter for local model execution
Extensible adapter architecture

Comprehensive Reporting

CLI reporter with rich formatting
JSON reporter for machine-readable output
JUnit XML for CI/CD integration

Quick Start

Prerequisites

Python 3.10 or higher
Ollama (for local models) or OpenAI API key

Installation

From PyPI (recommended):

pip install prompt-contracts

From source (for development):

git clone https://github.com/philippmelikidis/prompt-contracts.git
cd prompt-contracts
pip install -e .

Setup Ollama (Optional)

# Install Ollama (macOS with Homebrew)
brew install ollama

# Start server
ollama serve

# Pull model
ollama pull mistral

Run Example Contract

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --report cli

Expected Output:

TARGET ollama:mistral
  mode: assist

Fixture: pwd_reset (latency: 2314ms, status: REPAIRED, retries: 0)
  Repairs applied: lowercased $.priority
  PASS | pc.check.json_valid
         Response is valid JSON
  PASS | pc.check.json_required
         All required fields present: ['category', 'priority', 'reason']
  PASS | pc.check.enum
         Value 'high' is in allowed values ['low', 'medium', 'high']
  ...

Summary: 11/11 checks passed (1 PASS, 1 REPAIRED) — status: YELLOW

Core Concepts

Artefact Types

Prompt Definition (PD)

Describes the canonical prompt and I/O expectations.

{
  "pcsl": "0.1.0",
  "id": "support.ticket.classify.v1",
  "io": {
    "channel": "text",
    "expects": "structured/json"
  },
  "prompt": "You are a support classifier. Reply ONLY with strict JSON."
}

Expectation Suite (ES)

Declares validation checks as properties that must hold for every execution.

{
  "pcsl": "0.1.0",
  "checks": [
    { "type": "pc.check.json_valid" },
    {
      "type": "pc.check.json_required",
      "fields": ["category", "priority", "reason"]
    },
    {
      "type": "pc.check.enum",
      "field": "$.priority",
      "allowed": ["low", "medium", "high"]
    },
    { "type": "pc.check.regex_absent", "pattern": "```" },
    { "type": "pc.check.token_budget", "max_out": 200 },
    { "type": "pc.check.latency_budget", "p95_ms": 5000 }
  ]
}
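
As a rough illustration of what these checks assert, the following sketch evaluates the structural checks from the suite above against one raw response. It is a simplified stand-in, not the library's validator: `run_checks` is a hypothetical name, only top-level `$.name` paths are handled, and the backtick pattern is written as the equivalent regex `{3}` quantifier.

```python
import json
import re


def run_checks(raw: str, checks: list[dict]) -> list[tuple[str, bool]]:
    """Evaluate a few ES-style checks against one raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None

    results = []
    for check in checks:
        kind = check["type"]
        if kind == "pc.check.json_valid":
            ok = data is not None
        elif kind == "pc.check.json_required":
            ok = isinstance(data, dict) and all(f in data for f in check["fields"])
        elif kind == "pc.check.enum":
            # Only simple "$.name" paths are handled in this sketch.
            key = check["field"].removeprefix("$.")
            ok = isinstance(data, dict) and data.get(key) in check["allowed"]
        elif kind == "pc.check.regex_absent":
            ok = re.search(check["pattern"], raw) is None
        else:
            continue  # budget checks need timing/token data, skipped here
        results.append((kind, ok))
    return results


checks = [
    {"type": "pc.check.json_valid"},
    {"type": "pc.check.json_required", "fields": ["category", "priority", "reason"]},
    {"type": "pc.check.enum", "field": "$.priority", "allowed": ["low", "medium", "high"]},
    {"type": "pc.check.regex_absent", "pattern": "`{3}"},
]
good = '{"category": "account", "priority": "high", "reason": "login failure"}'
drifted = '{"category": "account", "priority": "urgent", "reason": "login failure"}'

print(run_checks(good, checks))
print(run_checks(drifted, checks))
```

The second response is still valid JSON with all required fields, so only the enum check fails, which is the "enum drift" case described earlier.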

Evaluation Profile (EP)

Defines execution context: models, test fixtures, and tolerance thresholds.

{
  "pcsl": "0.1.0",
  "targets": [
    {
      "type": "ollama",
      "model": "mistral",
      "params": { "temperature": 0 }
    }
  ],
  "fixtures": [
    { "id": "pwd_reset", "input": "User: My password doesn't work." },
    { "id": "billing", "input": "User: I was double charged." }
  ],
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority"],
      "strip_markdown_fences": true
    }
  },
  "tolerances": {
    "pc.check.json_valid": { "max_fail_rate": 0.0 },
    "pc.check.enum": { "max_fail_rate": 0.01 }
  }
}
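
The `tolerances` block can be read as a per-check ceiling on the observed failure rate. The sketch below shows that reading under the assumption that `max_fail_rate` is the fraction of runs allowed to fail; the helper `within_tolerance` and the sample outcomes are hypothetical and only illustrate the arithmetic.

```python
def within_tolerance(outcomes: list[bool], max_fail_rate: float) -> bool:
    """Return True when the observed fail rate does not exceed the tolerance."""
    if not outcomes:
        return True  # nothing was run, so nothing failed
    fail_rate = outcomes.count(False) / len(outcomes)
    return fail_rate <= max_fail_rate


tolerances = {
    "pc.check.json_valid": {"max_fail_rate": 0.0},
    "pc.check.enum": {"max_fail_rate": 0.01},
}
# Hypothetical per-check outcomes across 200 fixture runs.
observed = {
    "pc.check.json_valid": [True] * 200,
    "pc.check.enum": [True] * 199 + [False],
}

for name, outcomes in observed.items():
    limit = tolerances[name]["max_fail_rate"]
    verdict = "ok" if within_tolerance(outcomes, limit) else "exceeded"
    print(f"{name}: {verdict}")
```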

Execution Modes

Prompt-Contracts provides four execution modes with different strategies for ensuring LLM output quality:

observe (Validation Only)

Purpose: Pure validation without any modifications
Behavior: No changes to prompt or output
Status Codes: PASS or FAIL
Use Case: Monitoring, testing, baseline measurements

{
  "execution": {
    "mode": "observe",
    "max_retries": 0
  }
}

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_observe.json

assist (Prompt Augmentation)

Purpose: Automatic prompt enhancement with constraints
Behavior: Adds auto-generated constraint blocks to prompt
Status Codes: PASS, REPAIRED, or FAIL
Use Case: Production systems with retry logic

The assist mode automatically enriches the prompt with structural requirements:

Original Prompt:

You are a support classifier. Reply with JSON containing category, priority, reason.

Augmented Prompt (automatic):

You are a support classifier. Reply with JSON containing category, priority, reason.

CONSTRAINTS:
- Response MUST be valid JSON
- Required fields: category, priority, reason
- Field "priority" MUST be one of: low, medium, high
- Do NOT use markdown code fences (```)
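
One way such a constraint block could be generated from an expectation suite is sketched below. This is an illustration only: `build_constraints` and `augment` are hypothetical names, the wording of the regex line is an assumption, and the real augmentation logic may differ.

```python
def build_constraints(checks: list[dict]) -> str:
    """Render ES-style checks as a plain-text CONSTRAINTS block."""
    lines = ["CONSTRAINTS:"]
    for check in checks:
        kind = check["type"]
        if kind == "pc.check.json_valid":
            lines.append("- Response MUST be valid JSON")
        elif kind == "pc.check.json_required":
            lines.append("- Required fields: " + ", ".join(check["fields"]))
        elif kind == "pc.check.enum":
            name = check["field"].removeprefix("$.")
            allowed = ", ".join(check["allowed"])
            lines.append(f'- Field "{name}" MUST be one of: {allowed}')
        elif kind == "pc.check.regex_absent":
            lines.append(f"- Output MUST NOT match pattern: {check['pattern']}")
    return "\n".join(lines)


def augment(prompt: str, checks: list[dict]) -> str:
    """Append the generated constraints to the original prompt."""
    return prompt + "\n\n" + build_constraints(checks)


checks = [
    {"type": "pc.check.json_valid"},
    {"type": "pc.check.json_required", "fields": ["category", "priority", "reason"]},
    {"type": "pc.check.enum", "field": "$.priority", "allowed": ["low", "medium", "high"]},
]
prompt = "You are a support classifier. Reply with JSON containing category, priority, reason."
print(augment(prompt, checks))
```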

Configuration:

{
  "execution": {
    "mode": "assist",
    "max_retries": 2,
    "auto_repair": {
      "lowercase_fields": ["$.priority", "$.category"],
      "strip_markdown_fences": true
    }
  }
}

Auto-Repair Capabilities:

strip_markdown_fences: Removes ```json code fences from responses
lowercase_fields: Normalizes fields to lowercase (e.g., "High" → "high")
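
The two repairs can be pictured with the following standalone sketch. It assumes a single fence pair wrapping the whole response and top-level `$.name` paths; the function names mirror the configuration keys but are otherwise hypothetical and not taken from the package.

```python
import json
import re

FENCE = "`" * 3  # three backticks


def strip_markdown_fences(text: str) -> str:
    """Remove a leading fence line (with optional language tag) and a trailing fence."""
    text = text.strip()
    text = re.sub(r"^" + FENCE + r"[A-Za-z0-9_-]*\s*\n", "", text)
    text = re.sub(r"\n?" + FENCE + r"\s*$", "", text)
    return text.strip()


def lowercase_fields(data: dict, paths: list[str]) -> list[str]:
    """Lowercase top-level string fields named by '$.name' paths; return changed paths."""
    changed = []
    for path in paths:
        key = path.removeprefix("$.")
        value = data.get(key)
        if isinstance(value, str) and value != value.lower():
            data[key] = value.lower()
            changed.append(path)
    return changed


raw = FENCE + 'json\n{"category": "Technical", "priority": "High"}\n' + FENCE
data = json.loads(strip_markdown_fences(raw))
changed = lowercase_fields(data, ["$.priority", "$.category"])
print(data, changed)
```

Returning the list of changed paths makes it possible to tell a response that passed untouched from one that only passed after normalization.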

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_assist.json \
  --save-io artifacts/

enforce (Schema-Guided JSON)

Purpose: Leverages provider capabilities for guaranteed JSON structure
Behavior: Generates JSON Schema from ES and uses response_format (OpenAI)
Status Codes: PASS, REPAIRED, FAIL, or NONENFORCEABLE
Use Case: Maximum structural guarantee with supporting providers

The enforce mode uses native provider features like OpenAI's Structured Outputs:

Auto-generated JSON Schema:

{
  "type": "object",
  "properties": {
    "category": { "type": "string", "enum": ["business", "personal", "spam", "support", "marketing"] },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "reason": { "type": "string" }
  },
  "required": ["category", "priority", "reason"],
  "additionalProperties": false
}
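
A schema of this shape could be derived from `json_required` and `enum` checks roughly as follows. This is a sketch under stated assumptions: fields without an enum are typed as strings, only top-level fields are considered, and `derive_schema` is a hypothetical name rather than the package's function.

```python
import json


def derive_schema(checks: list[dict]) -> dict:
    """Build an object schema from json_required and enum checks (top-level fields only)."""
    properties: dict[str, dict] = {}
    required: list[str] = []
    for check in checks:
        if check["type"] == "pc.check.json_required":
            for name in check["fields"]:
                required.append(name)
                # Assumption: fields with no further constraint are plain strings.
                properties.setdefault(name, {"type": "string"})
        elif check["type"] == "pc.check.enum":
            name = check["field"].removeprefix("$.")
            properties[name] = {"type": "string", "enum": list(check["allowed"])}
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


checks = [
    {"type": "pc.check.json_required", "fields": ["category", "priority", "reason"]},
    {"type": "pc.check.enum", "field": "$.priority", "allowed": ["low", "medium", "high"]},
]
schema = derive_schema(checks)
print(json.dumps(schema, indent=2))
```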

Configuration:

{
  "execution": {
    "mode": "enforce",
    "max_retries": 1,
    "strict_enforce": false
  }
}

Adapter Support:

✅ OpenAI: Full support via response_format
⚠️ Ollama: Falls back to assist (no schema enforcement)
⚠️ Others: Capability-based fallback

strict_enforce Flag:

false (default): Silent fallback to assist when schema not supported
true: Returns NONENFORCEABLE status instead of fallback

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_enforce.json

auto (Adaptive)

Purpose: Intelligent mode selection based on capabilities
Behavior: Fallback chain: enforce → assist → observe
Status Codes: Depends on selected mode
Use Case: Default mode for maximum compatibility

The auto mode automatically selects the best available mode:

Fallback Logic:

Checks adapter capabilities
If schema_guided_json=true → uses enforce
Otherwise → uses assist
On errors → fallback to observe

Configuration:

{
  "execution": {
    "mode": "auto",
    "max_retries": 2,
    "auto_repair": {
      "lowercase_fields": ["$.priority"],
      "strip_markdown_fences": true
    }
  }
}

Multi-Provider Example:

{
  "targets": [
    { "type": "openai", "model": "gpt-4o-mini" },
    { "type": "ollama", "model": "mistral" }
  ],
  "execution": { "mode": "auto" }
}

Result:

OpenAI → uses enforce (has schema_guided_json)
Ollama → uses assist (no schema_guided_json)

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_auto.json \
  --report cli

Status Codes

Per-Fixture Status

PASS: Validation succeeded on first attempt
REPAIRED: Validation succeeded after auto-repair application
FAIL: Validation failed after exhausting all retries
NONENFORCEABLE: Enforcement requested but adapter lacks capability

Per-Target Status

GREEN: All fixtures passed without repairs
YELLOW: Some fixtures repaired or marked nonenforceable
RED: One or more fixtures failed validation
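
The roll-up from per-fixture to per-target status follows directly from the definitions above and can be sketched in a few lines; `target_status` is a hypothetical helper name.

```python
def target_status(fixture_statuses: list[str]) -> str:
    """Aggregate per-fixture statuses into GREEN, YELLOW, or RED."""
    if "FAIL" in fixture_statuses:
        return "RED"
    if any(s in ("REPAIRED", "NONENFORCEABLE") for s in fixture_statuses):
        return "YELLOW"
    return "GREEN"


print(target_status(["PASS", "PASS"]))
print(target_status(["PASS", "REPAIRED"]))
print(target_status(["REPAIRED", "FAIL"]))
```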

Examples

The repository contains several complete examples demonstrating various use cases and execution modes:

Support Ticket Classification

Directory: examples/support_ticket/
Use Case: Support request classification
Mode: assist
Provider: Ollama (Mistral)

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --report cli

Email Classification

Directory: examples/email_classification/
Use Case: Email categorization with sentiment analysis
Modes: All four modes (observe, assist, enforce, auto)
Provider: Ollama / OpenAI

Testing with different modes:

# Observe Mode - Validation only
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_observe.json

# Assist Mode - With prompt augmentation
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_assist.json

# Enforce Mode - Schema-guided (OpenAI)
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_enforce.json

# Auto Mode - Adaptive
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_auto.json

Product Recommendation

Directory: examples/product_recommendation/
Use Case: Personalized product recommendations
Mode: assist
Provider: Ollama (Mistral)

prompt-contracts run \
  --pd examples/product_recommendation/pd.json \
  --es examples/product_recommendation/es.json \
  --ep examples/product_recommendation/ep.json \
  --save-io artifacts/product_recs/

Simple YAML Example

Directory: examples/simple_yaml/
Use Case: Minimal example in YAML format
Format: YAML (converted to JSON)

Test Auto-Repair

Directory: examples/test_repair/
Use Case: Demonstrates auto-repair functionality
Mode: assist with forced bad output
Provider: Ollama (Mistral)

This example intentionally prompts the LLM to produce output that violates constraints (capitalized enums, markdown fences), then demonstrates how auto-repair fixes it:

prompt-contracts run \
  --pd examples/test_repair/pd_force_bad.json \
  --es examples/test_repair/es.json \
  --ep examples/test_repair/ep_assist_force.json \
  --save-io artifacts/repair_test \
  --verbose

Example Output:

TARGET ollama:mistral
  mode: assist

Fixture: password_issue (latency: 7909ms, status: REPAIRED, retries: 1)
  Repairs applied: stripped fences, lowercased $.category, $.priority
  ✓ PASS | pc.check.json_valid
         Response is valid JSON
  ✓ PASS | pc.check.json_required
         All required fields present: ['category', 'priority', 'reason']
  ✓ PASS | pc.check.enum
         Value 'technical' is in allowed values ['technical', 'billing', 'other']
  ✓ PASS | pc.check.enum
         Value 'high' is in allowed values ['low', 'medium', 'high']
  ✓ PASS | pc.check.regex_absent
         Pattern '```' not found (as expected)
  ✓ PASS | pc.check.token_budget
         Token count ~6 <= 200

============================================================
Summary: 6/6 checks passed (1 REPAIRED) — status: YELLOW
============================================================

📁 Artifacts saved to: artifacts/repair_test

What happened:

LLM produced: {"category": "Technical", "priority": "High", ...}
Auto-repair: stripped fences, lowercased fields
Final output: {"category": "technical", "priority": "high", ...}
Status: REPAIRED (all checks passed after repair)

Installation

From Source

git clone https://github.com/philippmelikidis/prompt-contracts.git
cd prompt-contracts
pip install -r requirements.txt
pip install -e .

Verify Installation

prompt-contracts --help

Usage

CLI Commands

Validate Artefacts

Validate artefacts against PCSL schemas:

prompt-contracts validate pd examples/support_ticket/pd.json
prompt-contracts validate es examples/support_ticket/es.json
prompt-contracts validate ep examples/support_ticket/ep.json

Run Contract

Execute a complete contract with validation:

prompt-contracts run \
  --pd <path-to-pd> \
  --es <path-to-es> \
  --ep <path-to-ep> \
  [--report cli|json|junit] \
  [--out <output-path>] \
  [--save-io <artifacts-directory>] \
  [-v|--verbose]

Arguments:

--pd: Path to Prompt Definition (JSON/YAML, required)
--es: Path to Expectation Suite (JSON/YAML, required)
--ep: Path to Evaluation Profile (JSON/YAML, required)
--report: Report format - cli (default), json, or junit
--out: Output path for report file (optional)
--save-io: Directory to save execution artifacts (input_final.txt, output_raw.txt, output_norm.txt, run.json)
-v, --verbose: Enable verbose output

Exit Codes:

0: All fixtures passed or were repaired successfully
1: One or more fixtures failed or marked NONENFORCEABLE
2: PD/ES/EP validation error (schema mismatch)
3: Runtime/adapter error
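
In a CI script, these exit codes can be turned into readable messages with a thin wrapper around the CLI. The sketch below only uses the flags documented above; the helper names are hypothetical, and `run_contract` is defined but not called so the example works without the CLI installed.

```python
import subprocess

EXIT_MEANINGS = {
    0: "all fixtures passed or were repaired",
    1: "at least one fixture failed or was NONENFORCEABLE",
    2: "PD/ES/EP validation error",
    3: "runtime or adapter error",
}


def build_command(pd: str, es: str, ep: str, report: str = "junit", out: str = "junit.xml") -> list[str]:
    """Assemble the documented `prompt-contracts run` invocation."""
    return ["prompt-contracts", "run", "--pd", pd, "--es", es, "--ep", ep,
            "--report", report, "--out", out]


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_contract(cmd: list[str]) -> int:
    """Run the CLI and return its exit code (requires prompt-contracts on PATH)."""
    return subprocess.run(cmd).returncode


cmd = build_command(
    "examples/support_ticket/pd.json",
    "examples/support_ticket/es.json",
    "examples/support_ticket/ep.json",
)
print(" ".join(cmd))
print(describe_exit(1))
```

A pipeline step would call `run_contract(cmd)`, log `describe_exit(code)`, and pass the code through so that the job fails on any non-zero result.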

Example with artifacts:

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --save-io artifacts/ \
  --report json --out results.json \
  --verbose

Execution Configuration

Configure execution behavior in the Evaluation Profile:

{
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority", "$.status"],
      "strip_markdown_fences": true
    }
  }
}

Configuration Options:

mode: Execution mode (auto, enforce, assist, observe)
max_retries: Maximum retry attempts on validation failure (default: 1)
auto_repair.lowercase_fields: JSONPath fields to lowercase
auto_repair.strip_markdown_fences: Remove code fence markers (default: true)

Artifact Saving

Enable comprehensive artifact saving with --save-io:

prompt-contracts run \
  --pd pd.json --es es.json --ep ep.json \
  --save-io artifacts/

Directory Structure:

artifacts/
  <target-id>/
    <fixture-id>/
      input_final.txt      # Final prompt with augmentations
      output_raw.txt       # Raw model response
      output_norm.txt      # Normalized output after auto-repair
      run.json             # Complete execution metadata

run.json Contents:

{
  "pcsl": "0.1.0",
  "target": "ollama:mistral",
  "params": { "temperature": 0 },
  "execution": {
    "mode": "assist",
    "effective_mode": "assist",
    "max_retries": 1
  },
  "latency_ms": 2314,
  "retries_used": 0,
  "status": "REPAIRED",
  "repaired_details": {
    "stripped_fences": true,
    "lowercased_fields": ["$.priority"]
  },
  "checks": [...],
  "prompt_hash": "a1b2c3...",
  "timestamp": "2025-10-07T12:34:56Z"
}
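
Because every fixture directory contains a run.json, saved artifacts can be summarized with a short script. The sketch below assumes the directory layout and the `target` and `status` keys shown above; `summarize_runs` and the `ollama_mistral` directory name are hypothetical, and the demo writes to a temporary directory rather than real artifacts.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_runs(artifacts_dir: str) -> Counter:
    """Count fixture statuses across <artifacts>/<target-id>/<fixture-id>/run.json."""
    counts: Counter = Counter()
    for run_file in sorted(Path(artifacts_dir).glob("*/*/run.json")):
        run = json.loads(run_file.read_text(encoding="utf-8"))
        counts[(run["target"], run["status"])] += 1
    return counts


# Demo on a throwaway directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    for fixture, status in [("pwd_reset", "REPAIRED"), ("billing", "PASS")]:
        fixture_dir = Path(tmp) / "ollama_mistral" / fixture
        fixture_dir.mkdir(parents=True)
        record = {"target": "ollama:mistral", "status": status}
        (fixture_dir / "run.json").write_text(json.dumps(record), encoding="utf-8")
    summary = summarize_runs(tmp)

for (target, status), count in sorted(summary.items()):
    print(f"{target} {status}: {count}")
```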

PCSL Specification

Conformance Levels

PCSL defines progressive conformance levels:

L1 - Structural Conformance

JSON validity validation
Required field presence checking
Token budget enforcement
Basic structural guarantees

L2 - Semantic Conformance

Includes L1 plus:

Enum value validation with JSONPath
Regex pattern assertions (presence/absence)
Advanced field-level checks
Semantic property validation

L3 - Differential Conformance

Includes L2 plus:

Multi-target execution and comparison
Pass-rate validation across models
Latency budget enforcement (p95)
Tolerance-based acceptance criteria

L4 - Security Conformance (Planned)

Includes L3 plus:

Jailbreak escape-rate metrics
PII leakage detection
Adversarial robustness testing
Security property validation

Built-in Checks

pc.check.json_valid

Validates response is parseable JSON.

Parameters: None

{ "type": "pc.check.json_valid" }

pc.check.json_required

Validates presence of required fields at root level.

Parameters:

fields (array): Required field names

{
  "type": "pc.check.json_required",
  "fields": ["category", "priority", "reason"]
}

pc.check.enum

Validates field value against allowed enumeration.

Parameters:

field (string): JSONPath to field
allowed (array): Allowed values
case_insensitive (boolean, optional): Case-insensitive comparison

{
  "type": "pc.check.enum",
  "field": "$.priority",
  "allowed": ["low", "medium", "high"],
  "case_insensitive": false
}

pc.check.regex_absent

Validates regex pattern is NOT present in response.

Parameters:

pattern (string): Regex pattern

{ "type": "pc.check.regex_absent", "pattern": "```" }

pc.check.token_budget

Validates response length stays within token budget.

Parameters:

max_out (integer): Maximum output tokens

{ "type": "pc.check.token_budget", "max_out": 200 }

Note: Current implementation approximates tokens by word count.
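
A word-count approximation of this kind can be as simple as splitting on whitespace. The sketch below assumes that reading; the exact tokenization rule used by the package is not stated here, and both function names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Approximate the token count by counting whitespace-separated words."""
    return len(text.split())


def within_token_budget(text: str, max_out: int) -> bool:
    return approx_tokens(text) <= max_out


response = '{"category": "technical", "priority": "high", "reason": "login"}'
print(approx_tokens(response), within_token_budget(response, max_out=200))
```

Word counts usually undercount relative to a model tokenizer, especially for JSON with many punctuation characters, so budgets set this way are best treated as coarse limits.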

pc.check.latency_budget

Validates p95 latency across all fixtures.

Parameters:

p95_ms (integer): p95 latency threshold in milliseconds

{ "type": "pc.check.latency_budget", "p95_ms": 5000 }
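
For reference, a p95 over fixture latencies can be computed with the nearest-rank method as sketched below. Which percentile definition the package uses is not specified here, so the method and the `p95` helper are assumptions for illustration.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest value with >= 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


samples = [2314, 1980, 2105, 7909, 2250]
observed = p95(samples)
print(observed, observed <= 5000)
```

With only a handful of fixtures the p95 is effectively the slowest run, so a single slow response is enough to exceed the budget.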

Adapters

Provider Support Matrix

Provider	Schema Enforcement	Mode Support	Status
✅ OpenAI	Full support via `response_format`	All modes (observe, assist, enforce, auto)	Production-ready
⚠️ Ollama	Falls back to assist (no schema enforcement)	observe, assist, auto	Recommended for local models
⚠️ Others	Use capability-based fallback adapters	observe, assist, auto	Extensible via custom adapters

Important: For providers without schema enforcement (such as Ollama and most local models), falling back to assist is the recommended behavior, not an error. Assist mode adds constraints to the prompt and applies auto-repair, so outputs are still validated without native schema support.

OpenAI Adapter

Uses OpenAI SDK with full schema enforcement support.

Capabilities:

schema_guided_json: True (via response_format)
tool_calling: True
function_call_json: False

Features:

Structured output via response_format with JSON Schema
Enables enforce mode for guaranteed structure
Automatic fallback to assist when enforce unavailable
Parameter support: temperature, max_tokens

Configuration:

{
  "type": "openai",
  "model": "gpt-4o-mini",
  "params": {
    "temperature": 0,
    "max_tokens": 500
  }
}

Ollama Adapter

Supports local model execution via Ollama API.

Capabilities:

schema_guided_json: False
tool_calling: False
function_call_json: False

Features:

Local model execution (privacy-first, cost-effective)
HTTP API integration
Automatically uses assist mode with constraint augmentation
Auto-repair handles common issues (markdown fences, casing)
Parameter support: temperature

Configuration:

{
  "type": "ollama",
  "model": "mistral",
  "params": {
    "temperature": 0
  }
}

Note: Ollama works best with mode: assist or mode: auto in your EP. The framework will automatically add constraints to prompts and apply normalization to ensure reliable structured outputs.

Custom Adapters

Implement custom adapters by subclassing AbstractAdapter:

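import time

from promptcontracts.core.adapters import AbstractAdapter, Capability


class CustomAdapter(AbstractAdapter):
    def capabilities(self) -> Capability:
        # Advertise which provider features this adapter supports.
        return Capability(
            schema_guided_json=True,
            tool_calling=False,
            function_call_json=False,
        )

    def generate(self, prompt: str, schema=None):
        # Placeholder body: replace the stub below with a call to your provider.
        start = time.perf_counter()
        response_text = "{}"  # stand-in for the provider's response text
        latency_ms = int((time.perf_counter() - start) * 1000)
        return response_text, latency_ms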

Reporters

CLI Reporter

Rich-formatted terminal output with color coding and hierarchical structure.

Usage:

prompt-contracts run --report cli [--out output.txt]

Features:

Color-coded status indicators
Hierarchical fixture/check display
Repair detail tracking
Artifact path display
Summary statistics

JSON Reporter

Machine-readable JSON output for programmatic consumption.

Usage:

prompt-contracts run --report json [--out results.json]

Features:

Complete result serialization
Artifact path inclusion
Metadata enrichment
Timestamping

JUnit Reporter

JUnit XML format for CI/CD integration.

Usage:

prompt-contracts run --report junit [--out junit.xml]

Features:

Standard JUnit XML format
Test case per check
Failure detail capture
CI/CD pipeline integration

Architecture

Project Structure

src/promptcontracts/
  cli.py                    # CLI entry points
  core/
    loader.py               # Artefact loading and schema validation
    validator.py            # Check registry and execution
    runner.py               # Contract orchestration
    checks/                 # Built-in check implementations
      json_valid.py
      json_required.py
      enum_value.py
      regex_absent.py
      token_budget.py
      latency_budget.py
    adapters/               # LLM provider adapters
      base.py
      openai_adapter.py
      ollama_adapter.py
    reporters/              # Output formatters
      cli_reporter.py
      json_reporter.py
      junit_reporter.py
  spec/                     # PCSL specification
    pcsl-v0.1.md
    schema/
      pcsl-pd.schema.json
      pcsl-es.schema.json
      pcsl-ep.schema.json
examples/                   # Example contracts
tests/                      # Test suite

Dependencies

Core:

pyyaml: YAML parsing
jsonschema: Schema validation
jsonpath-ng: JSONPath evaluation
httpx: HTTP client for Ollama
numpy: Statistical calculations

Provider SDKs:

openai: OpenAI API integration

CLI:

rich: Terminal formatting

Testing

Run Test Suite

# All tests
pytest tests/ -v

# Specific test module
pytest tests/test_enforcement.py -v

# With coverage
pytest tests/ --cov=promptcontracts --cov-report=html

Test Categories

Loader Tests: Schema validation, file parsing
Check Tests: Individual check logic
Enforcement Tests: Normalization, schema derivation, retries
Integration Tests: End-to-end contract execution

Current Coverage

17 tests passing
Core functionality: 100%
Enforcement features: 100%
Edge cases: Ongoing

Roadmap

Completed (v0.1)

PCSL specification v0.1 with JSON Schemas
Execution modes (observe, assist, enforce, auto)
Auto-repair and bounded retries
Schema-guided JSON (OpenAI structured outputs)
Artifact saving with full IO transparency
OpenAI and Ollama adapters
CLI, JSON, and JUnit reporters
Conformance levels L1-L3 (scaffold)

Planned (v0.2)

L3 Differential runner enhancements
- Statistical significance testing
- Drift detection algorithms
- A/B testing support
HTML reporter with visualization
- Trend charts
- Diff views
- Interactive filtering
Additional check types
- JSON Schema field validation
- Numeric range checks
- Cross-field dependencies
- String length validation

Planned (v0.3)

L4 Security conformance
- Jailbreak escape-rate metrics
- PII leakage detection
- Prompt injection testing
- Adversarial robustness
Additional adapters
- Anthropic Claude
- Google Gemini
- Azure OpenAI
- Hugging Face
Observability integration
- OpenTelemetry export
- Prometheus metrics
- Grafana dashboards

Planned (Future)

Multi-modal support (images, audio)
GitHub Action and GitLab CI templates
VS Code extension
Pre-commit hooks
Fine-tuning contract integration
Production monitoring integration

Contributing

Spec Governance

The PCSL specification lives under src/promptcontracts/spec/. Changes to the specification follow an RFC process:

Open a GitHub Issue describing the proposed change
Label as spec-rfc
Community discussion and feedback
Approval by maintainers
Implementation and documentation

Development Setup

# Clone repository
git clone https://github.com/philippmelikidis/prompt-contracts.git
cd prompt-contracts

# Install development dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest tests/ -v

Contribution Guidelines

Follow existing code style and patterns
Add tests for new features
Update documentation
Ensure all tests pass
Write clear commit messages

Versioning

PCSL and prompt-contracts follow Semantic Versioning:

Patch (0.1.x): Bug fixes, clarifications
Minor (0.x.0): New features, backward-compatible additions
Major (x.0.0): Breaking changes to artefact structure or behavior

License

Code: MIT License
Documentation: CC-BY 4.0

See LICENSE file for details.

Support

Documentation: See QUICKSTART.md for getting started guide
Best Practices: Read BEST_PRACTICES.md for production guidance
Troubleshooting: Check TROUBLESHOOTING.md for common issues and solutions
Specification: Read src/promptcontracts/spec/pcsl-v0.1.md for detailed spec
Examples: Explore examples/ for real-world use cases
Issues: Report bugs and request features via GitHub Issues
Discussions: Join community discussions on GitHub Discussions

Citation

If you use Prompt-Contracts in your research or production systems, please cite:

@software{promptcontracts2025,
  title = {Prompt-Contracts: Contract Testing for LLM Prompts},
  author = {Prompt-Contracts Contributors},
  year = {2025},
  url = {https://github.com/promptcontracts/prompt-contracts},
  version = {0.1.0}
}
