CI/CD for AI Prompts - Test, lint, version, and deploy prompts with confidence

PromptOps

CI/CD for AI prompts.

PromptOps helps teams version, test, lint, cache, and safely deploy AI prompts — just like modern DevOps, but for LLM behavior.

If prompts can break production, they deserve:

  • Tests — Rule-based and semantic assertions
  • Linting — Best practices and security checks
  • Caching — Fast, cost-effective responses
  • Approvals — Gated deployments with audit trails
  • Rollbacks — Automatic recovery with circuit breakers
  • Cost & Safety Controls — Budget limits and content scanning
  • Beautiful CLI — Rich terminal output for developer joy

✨ Features

🎨 Beautiful CLI with Rich Output

Colorful, informative terminal output powered by the Rich library:

  • Progress bars and spinners
  • Syntax-highlighted YAML
  • Beautiful tables and panels
  • Interactive project trees

📁 Project Scaffolding

Initialize new projects with best-practice structure:

promptops init my-ai-app

🔍 Prompt Linting

11+ built-in rules checking for:

  • Missing templates or tests
  • Security vulnerabilities
  • Cost optimization opportunities
  • Best practice violations

⚡ Response Caching

Three caching backends for faster, cheaper operations:

  • Memory Cache — Fast in-process caching
  • File Cache — Persistent file-based storage
  • SQLite Cache — Production-ready with TTL support

🧪 Prompt Testing

  • Rule-based assertions — Word count, JSON validation, regex
  • Semantic tests — LLM-as-Judge evaluation
  • CI-friendly output — GitHub Actions integration

🔒 Safety Scanning

  • PII detection (SSN, credit cards, emails)
  • Prompt injection detection
  • Sensitive keyword filtering
  • Risk scoring

✅ Approval Gates

  • Workflow management with audit trail
  • Environment enforcement
  • Gated production deployments

⏪ Rollback Engine

  • Circuit breaker pattern
  • Automatic failure recovery
  • Health monitoring

💰 Budget Management

  • Per-model cost tracking
  • Budget periods and alerts
  • Usage analytics

📦 Installation

pip install promptops

Install with all extras:

pip install "promptops[all]"

Install for development:

git clone https://github.com/promptops/promptops.git
cd promptops
pip install -e ".[dev]"

Requirements

  • Python 3.9+
  • OpenAI API key (for LLM features)

export OPENAI_API_KEY=your_key_here

🚀 Quick Start

1. Initialize a project

promptops init my-ai-app
cd my-ai-app

This creates:

my-ai-app/
├── prompts/
│   └── example/
│       └── v1.yaml
├── promptops.yaml
├── .gitignore
└── README.md

2. Create a prompt

promptops create email_summary v1

Edit prompts/email_summary/v1.yaml:

template: |
  Summarize the following email politely and concisely:
  
  {email}

approved: false
provider: openai

tests:
  - name: polite_summary
    input:
      email: "This is a long email about a delayed shipment..."
    assert:
      max_words: 60
      min_words: 10
      must_exclude: ["hate", "stupid"]

3. Lint your prompts

promptops lint --all

4. Run tests

promptops test email_summary v1

5. Run the prompt

promptops run email_summary v1

📖 CLI Reference

Global Options

promptops --help
promptops --version
promptops --env prod     # Set environment
promptops --verbose      # Enable debug output

Commands

Command                              Description
init <name>                          Create a new PromptOps project
create <name> <version>              Create a new prompt file
run <name> <version>                 Execute a prompt
test <name> <version>                Run prompt tests
lint [--all]                         Lint prompts for issues
list                                 List all available prompts
show <name> <version>                Show prompt details
check-safety [--all]                 Run safety scans
cache --stats                        Show cache statistics
cache-clear                          Clear response cache
request-approval <name> <version>    Request approval for a prompt
approve <name> <version>             Approve a prompt
approval-status <name> <version>     Show the approval status of a prompt
allocate-budget <name> <version>     Allocate a budget to a prompt
rollback <name> <version>            Rollback to previous version

📁 Project Initialization

Basic Setup

promptops init my-project

With Options

# Use full template with more examples
promptops init my-project --template full

# Skip GitHub Actions setup
promptops init my-project --no-github-actions

# Specify provider
promptops init my-project --provider anthropic

# Dry run - see what would be created
promptops init my-project --dry-run

Templates

Template   Contents
minimal    Just config and one prompt
basic      Config, examples, and tests
full       Everything including CI/CD

🔍 Prompt Linting

Lint All Prompts

promptops lint --all

Lint Single Prompt

promptops lint email_summary v1

Filter by Severity

promptops lint --all --severity warning  # Only warnings and errors
promptops lint --all --severity error    # Only errors

Output Formats

promptops lint --all --format text     # Human-readable (default)
promptops lint --all --format json     # JSON output
promptops lint --all --format github   # GitHub Actions annotations
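
To make the difference between the three formats concrete, here is a minimal sketch of a formatter. It is not PromptOps's own code: the `Issue` dataclass and `format_issues` helper are hypothetical names, and the text layout is an assumption. The `github` branch uses the GitHub Actions workflow command syntax (`::error file=...,line=...::message`), which is how a job step generally emits annotations.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Issue:
    # Hypothetical shape of a lint finding; not the library's LintIssue.
    rule: str
    severity: str  # "error", "warning", or "info"
    message: str
    file: str
    line: int


# GitHub Actions annotation levels are "error", "warning", and "notice".
_GITHUB_LEVEL = {"error": "error", "warning": "warning", "info": "notice"}


def format_issues(issues, fmt="text"):
    """Render issues as plain text, JSON, or GitHub Actions workflow commands."""
    if fmt == "json":
        return json.dumps([asdict(i) for i in issues], indent=2)
    if fmt == "github":
        return "\n".join(
            f"::{_GITHUB_LEVEL[i.severity]} file={i.file},line={i.line}::{i.rule}: {i.message}"
            for i in issues
        )
    if fmt == "text":
        return "\n".join(
            f"{i.file}:{i.line}: {i.severity.upper()} [{i.rule}] {i.message}"
            for i in issues
        )
    raise ValueError(f"unknown format: {fmt}")
```

The JSON form suits scripts that post-process results, while the GitHub form lets findings show up inline on a pull request diff.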

Built-in Rules

Rule                Severity   Description
template-required   ERROR      Template must be defined
tests-required      WARNING    Tests should be defined
security-patterns   ERROR      No hardcoded secrets
prompt-length       WARNING    Reasonable token count
cache-config        INFO       Caching recommended
provider-valid      ERROR      Valid provider specified
jinja-syntax        ERROR      Valid Jinja2 syntax
variable-naming     WARNING    Consistent variable names
test-coverage       WARNING    Test all assertions
model-specified     INFO       Explicit model version
metadata-complete   INFO       Description and tags

⚡ Response Caching

Enable Caching

Caching is enabled by default. To bypass it for a single run, pass --no-cache:

promptops run email_summary v1 --no-cache

Cache Configuration

In promptops.yaml:

cache:
  backend: sqlite      # memory, file, or sqlite
  ttl: 3600           # Time-to-live in seconds
  max_size: 1000      # Maximum entries
  path: .promptops/cache  # Cache directory
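
The `ttl` and `max_size` settings can be pictured with a small in-memory cache. This is only a sketch under stated assumptions: the `TTLCache` class is hypothetical, it evicts the oldest inserted entry when full, and PromptOps's real backends may expire and evict differently.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Illustrative in-memory cache with a per-entry TTL and a size cap."""

    def __init__(self, ttl=3600, max_size=1000, clock=time.monotonic):
        self.ttl = ttl
        self.max_size = max_size
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = OrderedDict()  # key -> (expires_at, value)

    def set(self, key, value, ttl=None):
        expires_at = self._clock() + (self.ttl if ttl is None else ttl)
        self._data.pop(key, None)  # re-inserting moves the key to the end
        self._data[key] = (expires_at, value)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the oldest inserted entry

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        return value
```

A short TTL keeps responses fresh at the cost of more provider calls; a larger `max_size` trades memory for hit rate.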

Cache Management

# View cache statistics
promptops cache --stats

# Clear all cached responses
promptops cache-clear

Python API

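from promptops import Prompt
from promptops.cache import get_cache, configure_cache, cache_prompt

# Configure cache
configure_cache(
    backend="sqlite",
    ttl=3600,
    max_size=1000
)

# Load the prompt once so the cached function can reuse it
prompt = Prompt.load("email_summary", "v1")

# Use decorator
@cache_prompt(ttl=1800)
def get_summary(text: str) -> str:
    return prompt.run({"email": text})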

# Manual cache access
cache = get_cache()
cache.set("key", "value", ttl=3600)
value = cache.get("key")

Cache Backends

Backend   Use Case                    Persistence
memory    Development, testing        No
file      Single-machine production   Yes
sqlite    Production, shared access   Yes
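
What a persistent backend with TTL support involves can be sketched with the standard library's `sqlite3` module. The `SQLiteTTLCache` class, its table layout, and the `now` parameter are assumptions made for illustration; they are not taken from PromptOps's implementation.

```python
import sqlite3
import time


class SQLiteTTLCache:
    """Illustrative persistent cache: one table, with the expiry stored per row."""

    def __init__(self, path=":memory:", ttl=3600):
        self.ttl = ttl
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def set(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        expires_at = now + (self.ttl if ttl is None else ttl)
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
                (key, value, expires_at),
            )

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT value FROM cache WHERE key = ? AND expires_at > ?", (key, now)
        ).fetchone()
        return row[0] if row else None

    def purge_expired(self, now=None):
        """Delete expired rows and return how many were removed."""
        now = time.time() if now is None else now
        with self._db:
            cur = self._db.execute("DELETE FROM cache WHERE expires_at <= ?", (now,))
        return cur.rowcount
```

Passing a file path instead of `":memory:"` is what makes entries survive a restart, which is the practical difference from the memory backend.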

🧪 Testing

Run All Tests

promptops test --all

Test Single Prompt

promptops test email_summary v1

Test Assertions

tests:
  - name: basic_test
    input:
      email: "Test email content..."
    assert:
      # Length assertions
      max_words: 100
      min_words: 10
      max_chars: 500
      
      # Content assertions
      must_include: ["summary", "regards"]
      must_exclude: ["error", "fail"]
      matches_pattern: "^Dear.*"
      
      # Format assertions
      is_json: true
      
      # Semantic assertions (LLM-based)
      semantic:
        - is_polite
        - summary_present
        - professional_tone
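
The rule-based assertions above could be evaluated roughly as follows. This is a sketch, not the library's runner: `check_assertions` is a hypothetical helper, and it assumes whitespace-based word counts, case-insensitive substring matching for `must_include` and `must_exclude`, and `re.search` semantics for `matches_pattern`. Semantic assertions are left out because they need an LLM call.

```python
import json
import re


def check_assertions(output, spec):
    """Return a list of failure messages; an empty list means the output passed."""
    failures = []
    words = len(output.split())
    lowered = output.lower()

    # Length assertions
    if "max_words" in spec and words > spec["max_words"]:
        failures.append(f"max_words: {words} > {spec['max_words']}")
    if "min_words" in spec and words < spec["min_words"]:
        failures.append(f"min_words: {words} < {spec['min_words']}")
    if "max_chars" in spec and len(output) > spec["max_chars"]:
        failures.append(f"max_chars: {len(output)} > {spec['max_chars']}")

    # Content assertions
    for term in spec.get("must_include", []):
        if term.lower() not in lowered:
            failures.append(f"must_include: missing {term!r}")
    for term in spec.get("must_exclude", []):
        if term.lower() in lowered:
            failures.append(f"must_exclude: found {term!r}")
    pattern = spec.get("matches_pattern")
    if pattern is not None and re.search(pattern, output) is None:
        failures.append(f"matches_pattern: no match for {pattern!r}")

    # Format assertions
    if spec.get("is_json"):
        try:
            json.loads(output)
        except ValueError:
            failures.append("is_json: output is not valid JSON")
    return failures
```

Returning every failure rather than stopping at the first one makes CI logs more useful when a prompt change breaks several expectations at once.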

Semantic Testing

Use LLM-as-Judge for meaning-based evaluation:

tests:
  - name: semantic_test
    input:
      text: "Angry customer complaint..."
    assert:
      semantic:
        - response_is_empathetic
        - offers_solution
        - maintains_brand_voice
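
One way to picture LLM-as-Judge evaluation: each criterion is turned into a grading prompt, sent to a model, and the reply is reduced to pass or fail. The sketch below assumes a PASS/FAIL reply convention and takes the model as a plain callable (`ask_llm`), so it runs without any provider. Both function names and the prompt wording are hypothetical, not PromptOps's internals.

```python
def build_judge_prompt(criterion, response):
    """Compose a grading prompt for a single criterion (wording is illustrative)."""
    return (
        "You are grading an AI response against one criterion.\n"
        f"Criterion: {criterion}\n"
        f"Response:\n{response}\n"
        "Answer with PASS or FAIL, followed by a short reason."
    )


def judge_semantic(response, criteria, ask_llm):
    """Evaluate each criterion with ask_llm, a callable mapping prompt text to reply text."""
    results = {}
    for criterion in criteria:
        reply = ask_llm(build_judge_prompt(criterion, response))
        # Only a reply that starts with PASS counts as a pass.
        results[criterion] = reply.strip().upper().startswith("PASS")
    return results
```

In a unit test the judge can be replaced by a stub, for example `judge_semantic(text, ["is_polite"], lambda p: "PASS")`, which keeps the test suite deterministic and free of API costs.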

🔒 Safety Scanning

Scan All Prompts

promptops check-safety --all

Strict Mode

promptops check-safety --all --strict

Detection Capabilities

  • PII Detection: SSN, credit cards, emails, phone numbers
  • Injection Detection: Jailbreak attempts, system overrides
  • Sensitive Keywords: Customizable patterns

Configuration

policies:
  safety:
    block_pii: true
    strict_mode: true
    custom_patterns:
      - "CONFIDENTIAL"
      - "password.*="
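
A regex-based scanner is one plausible way to combine PII detection, injection detection, and `custom_patterns` into a single risk score. Everything in the sketch below is an assumption for illustration: the patterns are deliberately simple, the `scan` function is hypothetical, and the integer weights (capped at 100) are invented rather than taken from PromptOps.

```python
import re

# Deliberately simple patterns; real detectors need more care (checksums, locales).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]
# Hypothetical weights per finding category, on a 0-100 scale.
WEIGHTS = {"pii": 40, "injection": 50, "keyword": 20}


def scan(text, custom_patterns=()):
    """Return the findings and a capped risk score for a piece of prompt text."""
    findings = []
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(("pii", name))
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            findings.append(("injection", pattern.pattern))
    for raw in custom_patterns:
        if re.search(raw, text):
            findings.append(("keyword", raw))
    risk = min(100, sum(WEIGHTS[kind] for kind, _ in findings))
    return {"findings": findings, "risk": risk}
```

With a policy like `block_pii: true`, a caller would reject any text whose findings include a `"pii"` entry, while the score can drive softer warnings.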

✅ Approval Workflow

Request Approval

promptops request-approval email_summary v1 --user alice

Approve Prompt

promptops approve email_summary v1 --approver bob --reason "Reviewed and tested"

Check Status

promptops approval-status email_summary v1

Python API

from promptops import ApprovalManager

manager = ApprovalManager()
manager.request_approval("email_summary:v1", "alice")
manager.approve("email_summary:v1", "bob", reason="LGTM")
status = manager.status("email_summary:v1")
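
To show what an audit trail behind this workflow might record, here is a small in-memory stand-in. `ApprovalLog` is a hypothetical class, not `ApprovalManager`; the rules that an approval needs a prior request and that requesters cannot approve their own prompt are assumed policies chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ApprovalLog:
    """Hypothetical in-memory approval store that keeps every event."""

    events: list = field(default_factory=list)

    def _record(self, prompt_id, action, user, reason=""):
        self.events.append({
            "prompt": prompt_id,
            "action": action,
            "user": user,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def request_approval(self, prompt_id, user):
        self._record(prompt_id, "requested", user)

    def approve(self, prompt_id, approver, reason=""):
        requesters = {
            e["user"] for e in self.events
            if e["prompt"] == prompt_id and e["action"] == "requested"
        }
        if not requesters:
            raise ValueError(f"no approval request for {prompt_id}")
        if approver in requesters:
            # Assumed policy: a second person must sign off.
            raise PermissionError("requesters cannot approve their own prompt")
        self._record(prompt_id, "approved", approver, reason)

    def status(self, prompt_id):
        actions = [e["action"] for e in self.events if e["prompt"] == prompt_id]
        return actions[-1] if actions else "none"
```

Because events are only ever appended, the list doubles as the audit trail: who asked, who approved, when, and why.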

💰 Cost Management

Allocate Budget

promptops allocate-budget email_summary v1 --amount 10.00

Configuration

policies:
  cost:
    max_daily_spend: 100.0
    alerts:
      - threshold: 0.5
        action: warn
      - threshold: 0.9
        action: alert
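
Assuming each `threshold` is a fraction of `max_daily_spend` (the config does not say so explicitly), the alert logic can be sketched as below. `triggered_actions` is a hypothetical helper, not part of the library.

```python
def triggered_actions(spent, max_daily_spend, alerts):
    """Return the actions whose threshold (a fraction of the limit) has been reached."""
    if max_daily_spend <= 0:
        raise ValueError("max_daily_spend must be positive")
    used = spent / max_daily_spend
    ordered = sorted(alerts, key=lambda a: a["threshold"])
    return [a["action"] for a in ordered if used >= a["threshold"]]
```

With the configuration above, spending 50.0 of a 100.0 limit would trigger `warn`, and spending 95.0 would trigger both `warn` and `alert`.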

Python API

from promptops.cost import BudgetPool

pool = BudgetPool()
pool.allocate("email_summary:v1", 10.0)
pool.consume("email_summary:v1", 0.05)
balance = pool.balance("email_summary:v1")
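
The allocate, consume, and balance calls suggest simple ledger bookkeeping. The sketch below is a hypothetical stand-in (`SketchBudgetPool`), not the real `BudgetPool`; it uses `Decimal` so that small per-call costs do not accumulate floating-point error, and it assumes that overspending should raise an error.

```python
from decimal import Decimal


class SketchBudgetPool:
    """Hypothetical per-prompt budget ledger."""

    def __init__(self):
        self._balances = {}

    def allocate(self, key, amount):
        # Convert through str so 0.05 becomes exactly Decimal("0.05").
        self._balances[key] = self.balance(key) + Decimal(str(amount))

    def consume(self, key, amount):
        remaining = self.balance(key) - Decimal(str(amount))
        if remaining < 0:
            raise RuntimeError(f"budget exceeded for {key}")
        self._balances[key] = remaining

    def balance(self, key):
        return self._balances.get(key, Decimal("0"))
```

Refusing the call that would overdraw the budget, rather than recording a negative balance, is what turns cost tracking into an actual spending limit.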

⏪ Rollback Engine

Manual Rollback

promptops rollback email_summary v1

Circuit Breaker

Roll back automatically after repeated failures. Configure the breaker under policies in promptops.yaml:

policies:
  rollback:
    circuit_breaker:
      failure_threshold: 5
      recovery_timeout: 60
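
For readers new to the circuit breaker pattern, the two settings map onto a small state machine: the breaker opens after `failure_threshold` failures and allows a trial request once `recovery_timeout` has elapsed. The sketch below is a generic version of the pattern, not PromptOps's engine; the `CircuitBreaker` class is hypothetical and it assumes the timeout is in seconds.

```python
import time


class CircuitBreaker:
    """Generic circuit breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold=5, recovery_timeout=60, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # assumed to be seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed trial

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"  # one trial request may go through
        return "open"

    def allow_request(self):
        return self.state != "open"
```

In a rollback setting, the transition to `open` is the natural point to switch traffic back to the previous prompt version.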

Python API

from promptops.rollback import RollbackEngine

engine = RollbackEngine()
engine.record_failure("email_summary:v1", Exception("API error"))

if engine.should_circuit_break("email_summary:v1"):
    engine.rollback("email_summary:v1")

🔄 GitHub Actions

PromptOps includes a ready-to-use GitHub Actions workflow.

Setup

When initializing a project:

promptops init my-project  # Includes .github/workflows/promptops.yml

Or copy the workflow manually:

cp .github/workflows/promptops.yml your-repo/.github/workflows/

Workflow Features

  • Lint on Push: Validate prompts on every push
  • Safety Scan: Automatic security checks
  • Test Suite: Run all prompt tests
  • Approval Gates: Enforce approvals for production
  • Deployment: Automated production deployment

Required Secrets

Add these to your repository secrets:

Secret              Description
OPENAI_API_KEY      OpenAI API key for tests
ANTHROPIC_API_KEY   (Optional) Anthropic key
DEPLOY_TOKEN        Deployment credentials

Workflow Jobs

jobs:
  lint:      # 🔍 Lint all prompts
  safety:    # 🔒 Security scan
  test:      # 🧪 Run tests
  approval:  # ✅ Check approvals
  deploy:    # 🚀 Deploy to production
  rollback:  # ⏪ Manual rollback trigger

🐍 Python API

Basic Usage

from promptops import Prompt

# Load and run a prompt
prompt = Prompt.load("email_summary", "v1")
result = prompt.run({"email": "..."})

With Caching

from promptops import Prompt
from promptops.cache import cache_prompt

@cache_prompt(ttl=3600)
def summarize(email: str) -> str:
    prompt = Prompt.load("email_summary", "v1")
    return prompt.run({"email": email})

Run Tests

from promptops import Prompt
from promptops.testing import run_tests

prompt = Prompt.load("email_summary", "v1")
report = run_tests(prompt, prompt.provider, prompt.config["tests"])

if not report.passed:
    for failure in report.failures:
        print(f"Failed: {failure.name} - {failure.reason}")

Lint Prompts

from promptops.lint import lint_prompt, lint_all_prompts

# Single prompt
result = lint_prompt("email_summary", "v1")
print(f"Passed: {result.passed}")
for issue in result.issues:
    print(f"  {issue.severity}: {issue.message}")

# All prompts
report = lint_all_prompts("prompts/")
print(report.summary())

Custom Lint Rules

from promptops.lint import LintRule, LintIssue, LintSeverity

class CustomRule(LintRule):
    id = "custom-rule"
    name = "Custom Rule"
    description = "Check for custom requirements"
    severity = LintSeverity.WARNING
    
    def check(self, config: dict, file_path: str) -> list[LintIssue]:
        issues = []
        if "custom_field" not in config:
            issues.append(self.create_issue(
                message="Missing custom_field",
                line=1
            ))
        return issues

⚙️ Configuration

Prompt YAML Schema

# Required
template: |
  Your prompt with {variables}

# Optional
approved: false           # Approval status
provider: openai          # Provider name
model: gpt-4             # Specific model
description: "..."        # Human description
tags: [summarization]     # Categorization

# Caching
cache:
  enabled: true
  ttl: 3600

# Tests
tests:
  - name: test_name
    input:
      variable: "value"
    assert:
      max_words: 100
      must_include: ["word"]
      semantic:
        - is_coherent

Global Config (promptops.yaml)

# Default provider
provider: openai

# Environment settings
environments:
  dev:
    require_approval: false
    strict_safety: false
  staging:
    require_approval: false
    strict_safety: true
  prod:
    require_approval: true
    strict_safety: true

# Caching
cache:
  backend: sqlite
  ttl: 3600
  max_size: 1000

# Policies
policies:
  safety:
    block_pii: true
    strict_mode: true
  cost:
    max_daily_spend: 100.0
  rollback:
    failure_threshold: 5
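
How the `environments` section might gate a run can be sketched in a few lines. The `deployment_allowed` function is hypothetical and works on plain dictionaries shaped like the YAML above; the actual enforcement logic in PromptOps may consider more than the `approved` flag.

```python
def deployment_allowed(prompt_config, env_name, global_config):
    """Return (allowed, reason) for running a prompt in the given environment."""
    environments = global_config.get("environments", {})
    if env_name not in environments:
        return False, f"unknown environment: {env_name}"
    env = environments[env_name]
    if env.get("require_approval", False) and not prompt_config.get("approved", False):
        return False, "prompt is not approved for this environment"
    return True, "ok"
```

With the configuration above, an unapproved prompt would pass in dev and staging but be refused in prod.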

📁 Project Structure

promptops/
├── __init__.py              # Package exports
├── prompt.py                # Core Prompt class
├── loader.py                # YAML/remote loading
├── renderer.py              # Template rendering
├── guard.py                 # Safety guard
├── approval.py              # Approval workflow
├── policies.py              # Global policies
├── env.py                   # Environment detection
├── diff.py                  # Prompt diffing
├── exceptions.py            # Exception hierarchy
├── utils.py                 # Utility functions
├── promptops.yaml           # Default policies
├── pyproject.toml           # Package config
├── cli/
│   ├── __init__.py
│   ├── main.py              # CLI commands
│   └── console.py           # Rich output helpers
├── cache/
│   ├── __init__.py
│   └── manager.py           # Cache backends
├── lint/
│   ├── __init__.py
│   ├── rules.py             # Lint rules
│   └── linter.py            # Linter engine
├── scaffold/
│   ├── __init__.py
│   └── generator.py         # Project scaffolding
├── cost/
│   ├── __init__.py
│   └── budget.py            # Budget management
├── providers/
│   ├── __init__.py
│   └── openai_provider.py   # OpenAI integration
├── rollback/
│   ├── __init__.py
│   ├── engine.py            # Rollback logic
│   └── store.py             # Failure tracking
├── safety/
│   ├── __init__.py
│   └── scanner.py           # Safety scanning
└── testing/
    ├── __init__.py
    ├── assertions.py        # Rule assertions
    ├── llm_judge.py         # Semantic tests
    ├── results.py           # Test results
    └── runner.py            # Test runner

🗺️ Roadmap

Completed ✅

  • Prompt versioning and loading
  • Rule-based testing
  • Semantic testing (LLM-as-Judge)
  • Safety scanning
  • Approval workflow
  • Rollback engine
  • Budget management
  • Rich CLI output
  • Project scaffolding (promptops init)
  • Prompt linting (11+ rules)
  • Response caching (3 backends)
  • GitHub Actions workflow

Coming Soon 🔜

  • VS Code extension
  • Web dashboard
  • Prompt playground
  • A/B testing framework
  • Multi-provider support (Anthropic, Cohere)
  • Prompt embeddings and search
  • Team collaboration features

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide for details.

# Setup development environment
git clone https://github.com/promptops/promptops.git
cd promptops
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check promptops
mypy promptops

📄 License

MIT License - see LICENSE for details.


Made with ❤️ for the AI engineering community
