Skip to main content

Contract testing framework for LLM prompts - PCSL (Prompt Contract Specification Language)

Project description

Prompt Contracts

CI PyPI version Python 3.10+ License: MIT Code style: black

Test your LLM prompts like code.

Prompt-Contracts is a specification and toolkit that brings contract testing to LLM prompt interactions. When models drift due to provider updates, parameter changes, or switches to local models, integrations can silently break. This framework enables structural, semantic, and behavioral validation of LLM responses.


What's New in v0.3.2

Statistical Rigor & Fair Comparison Release:

  • Wilson/Jeffreys Intervals: Default CI methods with solid statistical foundations (n ≥ 10: Wilson; n < 10: Jeffreys)
  • McNemar Test: Paired binary comparison for system evaluations
  • Block Bootstrap: Handle dependencies from repairs or batching
  • Cross-Family Judge Validation: Bias-controlled semantic evaluation with κ reliability metrics
  • Fair Comparison Protocol: Standardized baseline comparisons (CheckList, Guidance, OpenAI Structured)
  • Repair Risk Analysis: Semantic change detection and sensitivity reporting
  • Audit Harness: Tamper-evident bundles with SHA-256 hashes and GPG signatures
  • Contract Composition: Formal semantics for sequential/parallel contract aggregation
  • Expanded Compliance: Enhanced ISO/EU AI Act mapping with statistical methods

See CHANGELOG.md for complete v0.3.2 details.


Table of Contents


Overview

Prompt-Contracts implements the Prompt Contract Specification Language (PCSL), a formal specification for defining, validating, and enforcing LLM prompt behavior. Similar to how OpenAPI defines REST API contracts or JSON Schema defines data contracts, PCSL defines:

  • What a prompt expects as input
  • How the LLM should respond (structure, semantics, performance)
  • Where these expectations should hold (which models, providers, parameters)

Common Problems Solved

  • JSON Breakage: Responses become invalid or wrapped in markdown code fences
  • Missing Fields: Required fields disappear from structured outputs
  • Enum Drift: Values drift from expected enums ("urgent" instead of "high")
  • Performance Regression: Latency and token budgets exceed acceptable limits
  • Model Switching: Behavior changes when switching between providers or model versions

Key Features

PCSL v0.1 Implementation

Specification & Validation

  • Formal PCSL specification with JSON Schema validation
  • Three artefact types: Prompt Definition (PD), Expectation Suite (ES), Evaluation Profile (EP)
  • Progressive conformance levels (L1-L3)

Execution Modes

  • observe: Validation-only mode with no modifications
  • assist: Prompt augmentation with auto-generated constraints
  • enforce: Schema-guided JSON generation (OpenAI structured outputs)
  • auto: Adaptive mode with intelligent fallback chain

Auto-Repair & Retries

  • Bounded retry mechanism with configurable limits
  • Automatic output normalization (strip markdown fences, lowercase fields)
  • Detailed repair tracking and status reporting

Schema-Guided Enforcement

  • Automatic JSON Schema derivation from expectation suites
  • OpenAI structured output integration via response_format
  • Capability negotiation for provider-specific features

Full IO Transparency

  • Complete artifact saving with --save-io flag
  • Per-fixture storage of inputs, outputs, and metadata
  • Cryptographic prompt hashing for reproducibility
  • Timestamped execution traces

Multi-Provider Support

  • OpenAI adapter with schema enforcement capabilities
  • Ollama adapter for local model execution
  • Extensible adapter architecture

Comprehensive Reporting

  • CLI reporter with rich formatting
  • JSON reporter for machine-readable output
  • JUnit XML for CI/CD integration

Quick Start

Prerequisites

  • Python 3.10 or higher
  • Ollama (for local models) or OpenAI API key

Installation

From PyPI (recommended):

pip install prompt-contracts

From source (for development):

git clone https://github.com/philippmelikidis/prompt-contracts.git
cd prompt-contracts
pip install -e .

Setup Ollama (Optional)

# Install Ollama
brew install ollama

# Start server
ollama serve

# Pull model
ollama pull mistral

Run Example Contract

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --report cli

Expected Output:

TARGET ollama:mistral
  mode: assist

Fixture: pwd_reset (latency: 2314ms, status: REPAIRED, retries: 0)
  Repairs applied: lowercased $.priority
  PASS | pc.check.json_valid
         Response is valid JSON
  PASS | pc.check.json_required
         All required fields present: ['category', 'priority', 'reason']
  PASS | pc.check.enum
         Value 'high' is in allowed values ['low', 'medium', 'high']
  ...

Summary: 11/11 checks passed (1 PASS, 1 REPAIRED) — status: YELLOW

Core Concepts

Artefact Types

Prompt Definition (PD)

Describes the canonical prompt and I/O expectations.

{
  "pcsl": "0.1.0",
  "id": "support.ticket.classify.v1",
  "io": {
    "channel": "text",
    "expects": "structured/json"
  },
  "prompt": "You are a support classifier. Reply ONLY with strict JSON."
}

Expectation Suite (ES)

Declares validation checks as properties that must hold for every execution.

{
  "pcsl": "0.1.0",
  "checks": [
    { "type": "pc.check.json_valid" },
    {
      "type": "pc.check.json_required",
      "fields": ["category", "priority", "reason"]
    },
    {
      "type": "pc.check.enum",
      "field": "$.priority",
      "allowed": ["low", "medium", "high"]
    },
    { "type": "pc.check.regex_absent", "pattern": "```" },
    { "type": "pc.check.token_budget", "max_out": 200 },
    { "type": "pc.check.latency_budget", "p95_ms": 5000 }
  ]
}

Evaluation Profile (EP)

Defines execution context: models, test fixtures, and tolerance thresholds.

{
  "pcsl": "0.1.0",
  "targets": [
    {
      "type": "ollama",
      "model": "mistral",
      "params": { "temperature": 0 }
    }
  ],
  "fixtures": [
    { "id": "pwd_reset", "input": "User: My password doesn't work." },
    { "id": "billing", "input": "User: I was double charged." }
  ],
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority"],
      "strip_markdown_fences": true
    }
  },
  "tolerances": {
    "pc.check.json_valid": { "max_fail_rate": 0.0 },
    "pc.check.enum": { "max_fail_rate": 0.01 }
  }
}

Execution Modes

Prompt-Contracts provides four execution modes with different strategies for ensuring LLM output quality:

observe (Validation Only)

Purpose: Pure validation without any modifications Behavior: No changes to prompt or output Status Codes: PASS or FAIL Use Case: Monitoring, testing, baseline measurements

{
  "execution": {
    "mode": "observe",
    "max_retries": 0
  }
}

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_observe.json

assist (Prompt Augmentation)

Purpose: Automatic prompt enhancement with constraints Behavior: Adds auto-generated constraint blocks to prompt Status Codes: PASS, REPAIRED, or FAIL Use Case: Production systems with retry logic

The assist mode automatically enriches the prompt with structural requirements:

Original Prompt:

You are a support classifier. Reply with JSON containing category, priority, reason.

Augmented Prompt (automatic):

You are a support classifier. Reply with JSON containing category, priority, reason.

CONSTRAINTS:
- Response MUST be valid JSON
- Required fields: category, priority, reason
- Field "priority" MUST be one of: low, medium, high
- Do NOT use markdown code fences (```)

Configuration:

{
  "execution": {
    "mode": "assist",
    "max_retries": 2,
    "auto_repair": {
      "lowercase_fields": ["$.priority", "$.category"],
      "strip_markdown_fences": true
    }
  }
}

Auto-Repair Capabilities:

  • strip_markdown_fences: Removes ```json code fences from responses
  • lowercase_fields: Normalizes fields to lowercase (e.g., "High" → "high")

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_assist.json \
  --save-io artifacts/

enforce (Schema-Guided JSON)

Purpose: Leverages provider capabilities for guaranteed JSON structure Behavior: Generates JSON Schema from ES and uses response_format (OpenAI) Status Codes: PASS, REPAIRED, FAIL, or NONENFORCEABLE Use Case: Maximum structural guarantee with supporting providers

The enforce mode uses native provider features like OpenAI's Structured Outputs:

Auto-generated JSON Schema:

{
  "type": "object",
  "properties": {
    "category": { "type": "string", "enum": ["business", "personal", "spam", "support", "marketing"] },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "reason": { "type": "string" }
  },
  "required": ["category", "priority", "reason"],
  "additionalProperties": false
}

Configuration:

{
  "execution": {
    "mode": "enforce",
    "max_retries": 1,
    "strict_enforce": false
  }
}

Adapter Support:

  • OpenAI: Full support via response_format
  • ⚠️ Ollama: Falls back to assist (no schema enforcement)
  • ⚠️ Others: Capability-based fallback

strict_enforce Flag:

  • false (default): Silent fallback to assist when schema not supported
  • true: Returns NONENFORCEABLE status instead of fallback

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_enforce.json

auto (Adaptive)

Purpose: Intelligent mode selection based on capabilities Behavior: Fallback chain: enforce → assist → observe Status Codes: Depends on selected mode Use Case: Default mode for maximum compatibility

The auto mode automatically selects the best available mode:

Fallback Logic:

  1. Checks adapter capabilities
  2. If schema_guided_json=true → uses enforce
  3. Otherwise → uses assist
  4. On errors → fallback to observe

Configuration:

{
  "execution": {
    "mode": "auto",
    "max_retries": 2,
    "auto_repair": {
      "lowercase_fields": ["$.priority"],
      "strip_markdown_fences": true
    }
  }
}

Multi-Provider Example:

{
  "targets": [
    { "type": "openai", "model": "gpt-4o-mini" },
    { "type": "ollama", "model": "mistral" }
  ],
  "execution": { "mode": "auto" }
}

Result:

  • OpenAI → uses enforce (has schema_guided_json)
  • Ollama → uses assist (no schema_guided_json)

Example:

prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_auto.json \
  --report cli

Status Codes

Per-Fixture Status

  • PASS: Validation succeeded on first attempt
  • REPAIRED: Validation succeeded after auto-repair application
  • FAIL: Validation failed after exhausting all retries
  • NONENFORCEABLE: Enforcement requested but adapter lacks capability

Per-Target Status

  • GREEN: All fixtures passed without repairs
  • YELLOW: Some fixtures repaired or marked nonenforceable
  • RED: One or more fixtures failed validation

Examples

The repository contains several complete examples demonstrating various use cases and execution modes:

Support Ticket Classification

Directory: examples/support_ticket/ Use Case: Support request classification Mode: assist Provider: Ollama (Mistral)

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --report cli

Email Classification

Directory: examples/email_classification/ Use Case: Email categorization with sentiment analysis Modes: All four modes (observe, assist, enforce, auto) Provider: Ollama / OpenAI

Testing with different modes:

# Observe Mode - Validation only
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_observe.json

# Assist Mode - With prompt augmentation
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_assist.json

# Enforce Mode - Schema-guided (OpenAI)
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_enforce.json

# Auto Mode - Adaptive
prompt-contracts run \
  --pd examples/email_classification/pd.json \
  --es examples/email_classification/es.json \
  --ep examples/email_classification/ep_auto.json

Product Recommendation

Directory: examples/product_recommendation/ Use Case: Personalized product recommendations Mode: assist Provider: Ollama (Mistral)

prompt-contracts run \
  --pd examples/product_recommendation/pd.json \
  --es examples/product_recommendation/es.json \
  --ep examples/product_recommendation/ep.json \
  --save-io artifacts/product_recs/

Simple YAML Example

Directory: examples/simple_yaml/ Use Case: Minimal example in YAML format Format: YAML (converted to JSON)

Test Auto-Repair

Directory: examples/test_repair/ Use Case: Demonstrates auto-repair functionality Mode: assist with forced bad output Provider: Ollama (Mistral)

This example intentionally prompts the LLM to produce output that violates constraints (capitalized enums, markdown fences), then demonstrates how auto-repair fixes it:

prompt-contracts run \
  --pd examples/test_repair/pd_force_bad.json \
  --es examples/test_repair/es.json \
  --ep examples/test_repair/ep_assist_force.json \
  --save-io artifacts/repair_test \
  --verbose

Example Output:

TARGET ollama:mistral
  mode: assist

Fixture: password_issue (latency: 7909ms, status: REPAIRED, retries: 1)
  Repairs applied: stripped fences, lowercased $.category, $.priority
  ✓ PASS | pc.check.json_valid
         Response is valid JSON
  ✓ PASS | pc.check.json_required
         All required fields present: ['category', 'priority', 'reason']
  ✓ PASS | pc.check.enum
         Value 'technical' is in allowed values ['technical', 'billing', 'other']
  ✓ PASS | pc.check.enum
         Value 'high' is in allowed values ['low', 'medium', 'high']
  ✓ PASS | pc.check.regex_absent
         Pattern '```' not found (as expected)
  ✓ PASS | pc.check.token_budget
         Token count ~6 <= 200

============================================================
Summary: 6/6 checks passed (1 REPAIRED) — status: YELLOW
============================================================

📁 Artifacts saved to: artifacts/repair_test

What happened:

  • LLM produced: {"category": "Technical", "priority": "High", ...}
  • Auto-repair: stripped fences, lowercased fields
  • Final output: {"category": "technical", "priority": "high", ...}
  • Status: REPAIRED (all checks passed after repair)

Installation

From Source

git clone https://github.com/promptcontracts/prompt-contracts.git
cd prompt-contracts
pip install -r requirements.txt
pip install -e .

Verify Installation

prompt-contracts --help

Usage

CLI Commands

Validate Artefacts

Validate artefacts against PCSL schemas:

prompt-contracts validate pd examples/support_ticket/pd.json
prompt-contracts validate es examples/support_ticket/es.json
prompt-contracts validate ep examples/support_ticket/ep.json

Run Contract

Execute a complete contract with validation:

prompt-contracts run \
  --pd <path-to-pd> \
  --es <path-to-es> \
  --ep <path-to-ep> \
  [--report cli|json|junit] \
  [--out <output-path>] \
  [--save-io <artifacts-directory>] \
  [-v|--verbose]

Arguments:

  • --pd: Path to Prompt Definition (JSON/YAML, required)
  • --es: Path to Expectation Suite (JSON/YAML, required)
  • --ep: Path to Evaluation Profile (JSON/YAML, required)
  • --report: Report format - cli (default), json, or junit
  • --out: Output path for report file (optional)
  • --save-io: Directory to save execution artifacts (input_final.txt, output_raw.txt, output_norm.txt, run.json)
  • -v, --verbose: Enable verbose output

Exit Codes:

  • 0: All fixtures passed or were repaired successfully
  • 1: One or more fixtures failed or marked NONENFORCEABLE
  • 2: PD/ES/EP validation error (schema mismatch)
  • 3: Runtime/adapter error

Example with artifacts:

prompt-contracts run \
  --pd examples/support_ticket/pd.json \
  --es examples/support_ticket/es.json \
  --ep examples/support_ticket/ep.json \
  --save-io artifacts/ \
  --report json --out results.json \
  --verbose

Execution Configuration

Configure execution behavior in the Evaluation Profile:

{
  "execution": {
    "mode": "assist",
    "max_retries": 1,
    "auto_repair": {
      "lowercase_fields": ["$.priority", "$.status"],
      "strip_markdown_fences": true
    }
  }
}

Configuration Options:

  • mode: Execution mode (auto, enforce, assist, observe)
  • max_retries: Maximum retry attempts on validation failure (default: 1)
  • auto_repair.lowercase_fields: JSONPath fields to lowercase
  • auto_repair.strip_markdown_fences: Remove code fence markers (default: true)

Artifact Saving

Enable comprehensive artifact saving with --save-io:

prompt-contracts run \
  --pd pd.json --es es.json --ep ep.json \
  --save-io artifacts/

Directory Structure:

artifacts/
  <target-id>/
    <fixture-id>/
      input_final.txt      # Final prompt with augmentations
      output_raw.txt       # Raw model response
      output_norm.txt      # Normalized output after auto-repair
      run.json             # Complete execution metadata

run.json Contents:

{
  "pcsl": "0.1.0",
  "target": "ollama:mistral",
  "params": { "temperature": 0 },
  "execution": {
    "mode": "assist",
    "effective_mode": "assist",
    "max_retries": 1
  },
  "latency_ms": 2314,
  "retries_used": 0,
  "status": "REPAIRED",
  "repaired_details": {
    "stripped_fences": true,
    "lowercased_fields": ["$.priority"]
  },
  "checks": [...],
  "prompt_hash": "a1b2c3...",
  "timestamp": "2025-10-07T12:34:56Z"
}

PCSL Specification

Conformance Levels

PCSL defines progressive conformance levels:

L1 - Structural Conformance

  • JSON validity validation
  • Required field presence checking
  • Token budget enforcement
  • Basic structural guarantees

L2 - Semantic Conformance

Includes L1 plus:

  • Enum value validation with JSONPath
  • Regex pattern assertions (presence/absence)
  • Advanced field-level checks
  • Semantic property validation

L3 - Differential Conformance

Includes L2 plus:

  • Multi-target execution and comparison
  • Pass-rate validation across models
  • Latency budget enforcement (p95)
  • Tolerance-based acceptance criteria

L4 - Security Conformance (Planned)

Includes L3 plus:

  • Jailbreak escape-rate metrics
  • PII leakage detection
  • Adversarial robustness testing
  • Security property validation

Built-in Checks

pc.check.json_valid

Validates response is parseable JSON.

Parameters: None

{ "type": "pc.check.json_valid" }

pc.check.json_required

Validates presence of required fields at root level.

Parameters:

  • fields (array): Required field names
{
  "type": "pc.check.json_required",
  "fields": ["category", "priority", "reason"]
}

pc.check.enum

Validates field value against allowed enumeration.

Parameters:

  • field (string): JSONPath to field
  • allowed (array): Allowed values
  • case_insensitive (boolean, optional): Case-insensitive comparison
{
  "type": "pc.check.enum",
  "field": "$.priority",
  "allowed": ["low", "medium", "high"],
  "case_insensitive": false
}

pc.check.regex_absent

Validates regex pattern is NOT present in response.

Parameters:

  • pattern (string): Regex pattern
{ "type": "pc.check.regex_absent", "pattern": "```" }

pc.check.token_budget

Validates response length stays within token budget.

Parameters:

  • max_out (integer): Maximum output tokens
{ "type": "pc.check.token_budget", "max_out": 200 }

Note: Current implementation approximates tokens by word count.

pc.check.latency_budget

Validates p95 latency across all fixtures.

Parameters:

  • p95_ms (integer): p95 latency threshold in milliseconds
{ "type": "pc.check.latency_budget", "p95_ms": 5000 }

Adapters

Provider Support Matrix

Provider Schema Enforcement Mode Support Status
OpenAI Full support via response_format All modes (observe, assist, enforce, auto) Production-ready
⚠️ Ollama Falls back to assist (no schema enforcement) observe, assist, auto Recommended for local models
⚠️ Others Use capability-based fallback adapters observe, assist, auto Extensible via custom adapters

Important: Assist fallback is the recommended mode for providers without schema enforcement (like Ollama and most local models). This is not an error — assist mode adds intelligent constraints to prompts and applies auto-repair, providing robust output validation without requiring native schema support.

OpenAI Adapter

Uses OpenAI SDK with full schema enforcement support.

Capabilities:

  • schema_guided_json: True (via response_format)
  • tool_calling: True
  • function_call_json: False

Features:

  • Structured output via response_format with JSON Schema
  • Enables enforce mode for guaranteed structure
  • Automatic fallback to assist when enforce unavailable
  • Parameter support: temperature, max_tokens

Configuration:

{
  "type": "openai",
  "model": "gpt-4o-mini",
  "params": {
    "temperature": 0,
    "max_tokens": 500
  }
}

Ollama Adapter

Supports local model execution via Ollama API.

Capabilities:

  • schema_guided_json: False
  • tool_calling: False
  • function_call_json: False

Features:

  • Local model execution (privacy-first, cost-effective)
  • HTTP API integration
  • Automatically uses assist mode with constraint augmentation
  • Auto-repair handles common issues (markdown fences, casing)
  • Parameter support: temperature

Configuration:

{
  "type": "ollama",
  "model": "mistral",
  "params": {
    "temperature": 0
  }
}

Note: Ollama works best with mode: assist or mode: auto in your EP. The framework will automatically add constraints to prompts and apply normalization to ensure reliable structured outputs.

Custom Adapters

Implement custom adapters by subclassing AbstractAdapter:

from promptcontracts.core.adapters import AbstractAdapter, Capability

class CustomAdapter(AbstractAdapter):
    def capabilities(self) -> Capability:
        return Capability(
            schema_guided_json=True,
            tool_calling=False,
            function_call_json=False
        )

    def generate(self, prompt: str, schema=None):
        # Implementation
        return response_text, latency_ms

Reporters

CLI Reporter

Rich-formatted terminal output with color coding and hierarchical structure.

Usage:

prompt-contracts run --report cli [--out output.txt]

Features:

  • Color-coded status indicators
  • Hierarchical fixture/check display
  • Repair detail tracking
  • Artifact path display
  • Summary statistics

JSON Reporter

Machine-readable JSON output for programmatic consumption.

Usage:

prompt-contracts run --report json [--out results.json]

Features:

  • Complete result serialization
  • Artifact path inclusion
  • Metadata enrichment
  • Timestamping

JUnit Reporter

JUnit XML format for CI/CD integration.

Usage:

prompt-contracts run --report junit [--out junit.xml]

Features:

  • Standard JUnit XML format
  • Test case per check
  • Failure detail capture
  • CI/CD pipeline integration

Architecture

Project Structure

src/promptcontracts/
  cli.py                    # CLI entry points
  core/
    loader.py               # Artefact loading and schema validation
    validator.py            # Check registry and execution
    runner.py               # Contract orchestration
    checks/                 # Built-in check implementations
      json_valid.py
      json_required.py
      enum_value.py
      regex_absent.py
      token_budget.py
      latency_budget.py
    adapters/               # LLM provider adapters
      base.py
      openai_adapter.py
      ollama_adapter.py
    reporters/              # Output formatters
      cli_reporter.py
      json_reporter.py
      junit_reporter.py
  spec/                     # PCSL specification
    pcsl-v0.1.md
    schema/
      pcsl-pd.schema.json
      pcsl-es.schema.json
      pcsl-ep.schema.json
examples/                   # Example contracts
tests/                      # Test suite

Dependencies

Core:

  • pyyaml: YAML parsing
  • jsonschema: Schema validation
  • jsonpath-ng: JSONPath evaluation
  • httpx: HTTP client for Ollama
  • numpy: Statistical calculations

Provider SDKs:

  • openai: OpenAI API integration

CLI:

  • rich: Terminal formatting

Testing

Run Test Suite

# All tests
pytest tests/ -v

# Specific test module
pytest tests/test_enforcement.py -v

# With coverage
pytest tests/ --cov=promptcontracts --cov-report=html

Test Categories

  • Loader Tests: Schema validation, file parsing
  • Check Tests: Individual check logic
  • Enforcement Tests: Normalization, schema derivation, retries
  • Integration Tests: End-to-end contract execution

Current Coverage

  • 17 tests passing
  • Core functionality: 100%
  • Enforcement features: 100%
  • Edge cases: Ongoing

Roadmap

Completed (v0.1)

  • PCSL specification v0.1 with JSON Schemas
  • Execution modes (observe, assist, enforce, auto)
  • Auto-repair and bounded retries
  • Schema-guided JSON (OpenAI structured outputs)
  • Artifact saving with full IO transparency
  • OpenAI and Ollama adapters
  • CLI, JSON, and JUnit reporters
  • Conformance levels L1-L3 (scaffold)

Planned (v0.2)

  • L3 Differential runner enhancements
    • Statistical significance testing
    • Drift detection algorithms
    • A/B testing support
  • HTML reporter with visualization
    • Trend charts
    • Diff views
    • Interactive filtering
  • Additional check types
    • JSON Schema field validation
    • Numeric range checks
    • Cross-field dependencies
    • String length validation

Planned (v0.3)

  • L4 Security conformance
    • Jailbreak escape-rate metrics
    • PII leakage detection
    • Prompt injection testing
    • Adversarial robustness
  • Additional adapters
    • Anthropic Claude
    • Google Gemini
    • Azure OpenAI
    • Hugging Face
  • Observability integration
    • OpenTelemetry export
    • Prometheus metrics
    • Grafana dashboards

Planned (Future)

  • Multi-modal support (images, audio)
  • GitHub Action and GitLab CI templates
  • VS Code extension
  • Pre-commit hooks
  • Fine-tuning contract integration
  • Production monitoring integration

Contributing

Spec Governance

The PCSL specification lives under src/promptcontracts/spec/. Changes to the specification follow an RFC process:

  1. Open a GitHub Issue describing the proposed change
  2. Label as spec-rfc
  3. Community discussion and feedback
  4. Approval by maintainers
  5. Implementation and documentation

Development Setup

# Clone repository
git clone https://github.com/promptcontracts/prompt-contracts.git
cd prompt-contracts

# Install development dependencies
pip install -r requirements.txt
pip install -e .

# Run tests
pytest tests/ -v

Contribution Guidelines

  • Follow existing code style and patterns
  • Add tests for new features
  • Update documentation
  • Ensure all tests pass
  • Write clear commit messages

Versioning

PCSL and prompt-contracts follow Semantic Versioning:

  • Patch (0.1.x): Bug fixes, clarifications
  • Minor (0.x.0): New features, backward-compatible additions
  • Major (x.0.0): Breaking changes to artefact structure or behavior

License

Code: MIT License Documentation: CC-BY 4.0

See LICENSE file for details.


Support

  • Documentation: See QUICKSTART.md for getting started guide
  • Best Practices: Read BEST_PRACTICES.md for production guidance
  • Troubleshooting: Check TROUBLESHOOTING.md for common issues and solutions
  • Specification: Read src/promptcontracts/spec/pcsl-v0.1.md for detailed spec
  • Examples: Explore examples/ for real-world use cases
  • Issues: Report bugs and request features via GitHub Issues
  • Discussions: Join community discussions on GitHub Discussions

Citation

If you use Prompt-Contracts in your research or production systems, please cite:

@software{promptcontracts2025,
  title = {Prompt-Contracts: Contract Testing for LLM Prompts},
  author = {Prompt-Contracts Contributors},
  year = {2025},
  url = {https://github.com/promptcontracts/prompt-contracts},
  version = {0.1.0}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prompt_contracts-0.3.2.tar.gz (101.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

prompt_contracts-0.3.2-py3-none-any.whl (85.4 kB view details)

Uploaded Python 3

File details

Details for the file prompt_contracts-0.3.2.tar.gz.

File metadata

  • Download URL: prompt_contracts-0.3.2.tar.gz
  • Upload date:
  • Size: 101.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prompt_contracts-0.3.2.tar.gz
Algorithm Hash digest
SHA256 4787ed528d705b2fe2088a52ef76133b6bc4a37c478f92647a7a802040083e7f
MD5 9c4f6ff7fcccfa0fbd87c94d39e48598
BLAKE2b-256 8ec94606884b6c7d3f9660b4bca5f48f08e1b3ddfef579a2a0a9051d4d9713d9

See more details on using hashes here.

Provenance

The following attestation bundles were made for prompt_contracts-0.3.2.tar.gz:

Publisher: publish-pypi.yml on philippmelikidis/prompt-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file prompt_contracts-0.3.2-py3-none-any.whl.

File metadata

File hashes

Hashes for prompt_contracts-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 fd14b1ac79eec047299deac90d72f74596e00a4b9933de47db05fe042fce1fb1
MD5 db5dc3b52d4df42abd7c414cd5b4c990
BLAKE2b-256 e3b1df8f324124b9b823b4164a631c9c0eb3429179b26c79dbe24c4e003f866b

See more details on using hashes here.

Provenance

The following attestation bundles were made for prompt_contracts-0.3.2-py3-none-any.whl:

Publisher: publish-pypi.yml on philippmelikidis/prompt-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page