FlowPrompt

Type-safe prompt management with automatic optimization for LLMs.

Stop guessing which prompt works. Measure it.

The only LLM framework with built-in A/B testing for prompts.



Why FlowPrompt?

Every LLM framework gives you structured outputs. Only FlowPrompt tells you which prompt actually works better.

  • A/B Testing - Statistical significance testing for prompt variants
  • Type safety - Define prompts as Python classes with full IDE support
  • Structured outputs - Automatic validation with Pydantic models
  • Multi-provider - OpenAI, Anthropic, Google, or local models via LiteLLM
  • Production-ready - Caching, tracing, cost tracking built-in
A minimal example:

from flowprompt import Prompt
from pydantic import BaseModel

class ExtractUser(Prompt):
    system: str = "Extract user info from text."
    user: str = "Text: {text}"

    class Output(BaseModel):
        name: str
        age: int

result = ExtractUser(text="John is 25").run(model="gpt-4o")
print(result.name)  # "John"
print(result.age)   # 25

Installation

pip install flowprompt-ai

Note: the package is installed as flowprompt-ai but imported as flowprompt.
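A quick way to confirm the import name after installing:

python -c "import flowprompt"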

Optional extras:

pip install flowprompt-ai[all]        # Everything
pip install flowprompt-ai[cli]        # CLI tools
pip install flowprompt-ai[tracing]    # OpenTelemetry support
pip install flowprompt-ai[multimodal] # Images, PDFs, audio, video

Features at a Glance

Feature             What it does
A/B Testing         Statistical significance testing for prompts
Structured Outputs  Type-safe responses with Pydantic validation
Multi-Provider      OpenAI, Anthropic, Google, Ollama via LiteLLM
Optimization        DSPy-style automatic prompt improvement
Caching             Reduce costs 50-90% with built-in caching
Observability       Track costs, tokens, and latency
Streaming           Real-time responses with stream() and astream()
Multimodal          Images, documents, audio, and video
YAML Prompts        Store prompts in version-controlled files

Structured Outputs

Define your expected output as a Pydantic model. FlowPrompt handles parsing and validation automatically.

from flowprompt import Prompt
from pydantic import BaseModel, Field

class SentimentAnalysis(Prompt):
    system: str = "Analyze the sentiment of the given text."
    user: str = "Text: {text}"

    class Output(BaseModel):
        sentiment: str = Field(description="positive, negative, or neutral")
        confidence: float = Field(ge=0.0, le=1.0)
        keywords: list[str]

result = SentimentAnalysis(text="I love this product!").run(model="gpt-4o")
print(result.sentiment)   # "positive"
print(result.confidence)  # 0.95
print(result.keywords)    # ["love", "product"]

Multi-Provider Support

Switch between providers with a single parameter. No code changes needed.

prompt = ExtractUser(text="John is 25")  # any Prompt instance works

# OpenAI
result = prompt.run(model="gpt-4o")

# Anthropic Claude
result = prompt.run(model="anthropic/claude-3-5-sonnet-20241022")

# Google Gemini
result = prompt.run(model="gemini/gemini-2.0-flash-exp")

# Local models via Ollama
result = prompt.run(model="ollama/llama3")

Streaming

Get real-time responses for better user experience.

prompt = ExtractUser(text="John is 25")

# Synchronous
for chunk in prompt.stream(model="gpt-4o"):
    print(chunk.delta, end="", flush=True)

# Asynchronous
async for chunk in prompt.astream(model="gpt-4o"):
    print(chunk.delta, end="", flush=True)
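The async variant has to run inside an event loop; a complete-script sketch, assuming the astream() interface shown above:

import asyncio

async def main() -> None:
    prompt = ExtractUser(text="John is 25")
    # astream() yields chunks as the model produces them
    async for chunk in prompt.astream(model="gpt-4o"):
        print(chunk.delta, end="", flush=True)

asyncio.run(main())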

Caching

Reduce API costs by caching identical requests.

from flowprompt import configure_cache, get_cache

# Enable caching with 1-hour TTL
configure_cache(enabled=True, default_ttl=3600)

# First call hits the API
result1 = MyPrompt(text="hello").run(model="gpt-4o")

# Second identical call uses cache (instant, free)
result2 = MyPrompt(text="hello").run(model="gpt-4o")

# Check performance
print(get_cache().stats)
# {'hits': 1, 'misses': 1, 'hit_rate': 0.5}

Observability

Track costs, tokens, and latency with OpenTelemetry integration.

from flowprompt import get_tracer

result = MyPrompt(text="hello").run(model="gpt-4o")

summary = get_tracer().get_summary()
print(f"Cost: ${summary['total_cost_usd']:.4f}")
print(f"Tokens: {summary['total_tokens']}")
print(f"Latency: {summary['avg_latency_ms']:.0f}ms")

Automatic Optimization

Improve prompts automatically using training data (inspired by DSPy).

from flowprompt.optimize import optimize, ExampleDataset, Example, ExactMatch

# Create training examples
dataset = ExampleDataset([
    Example(input={"text": "John is 25"}, output={"name": "John", "age": 25}),
    Example(input={"text": "Alice is 30"}, output={"name": "Alice", "age": 30}),
])

# Optimize with few-shot examples
result = optimize(
    ExtractUser,
    dataset=dataset,
    metric=ExactMatch(),
    strategy="fewshot",  # or "instruction", "optuna", "bootstrap"
)

print(f"Improved by: {result.best_score:.0%}")
OptimizedPrompt = result.best_prompt_class
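The returned class is meant as a drop-in replacement; a usage sketch, assuming it keeps ExtractUser's input fields and Output schema (an assumption, not stated above):

# Run the optimized prompt like any other Prompt subclass
user = OptimizedPrompt(text="Bob is 40").run(model="gpt-4o")
print(user.name, user.age)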

A/B Testing

Run controlled experiments to compare prompt variants with statistical significance.

from flowprompt.testing import create_simple_experiment

# Setup experiment
config, runner = create_simple_experiment(
    name="prompt_comparison",
    control_prompt=PromptV1,
    treatment_prompts=[("v2", PromptV2)],
    min_samples=100,
)

runner.start_experiment(config.id)

# Get variant for a user (sticky assignment)
variant = runner.get_variant(config.id, user_id="user123")
result = runner.run_prompt(config.id, variant.name, input_data={"text": "..."})

# Check results
summary = runner.get_summary(config.id)
if summary.winner:
    print(f"Winner: {summary.winner.name}")
    print(f"Effect: {summary.statistical_result.effect_size:+.1%}")

Multimodal Support

Work with images, documents, audio, and video.

from flowprompt.multimodal import VisionPrompt, DocumentPrompt

# Analyze images
class ImageAnalyzer(VisionPrompt):
    system: str = "Describe what you see in the image."
    user: str = "What's in this image?"

result = ImageAnalyzer().with_image("photo.jpg").run(model="gpt-4o")

# Summarize documents
class DocSummarizer(DocumentPrompt):
    system: str = "Summarize documents concisely."
    user: str = "Summarize the key points."

result = DocSummarizer().with_document("report.pdf").run(model="gpt-4o")

YAML Prompts

Store prompts in version-controlled files for team collaboration.

# prompts/extract_user.yaml
name: ExtractUser
version: "1.0.0"
system: You are a precise data extractor.
user: "Extract from: {{ text }}"
output_schema:
  type: object
  properties:
    name: { type: string }
    age: { type: integer }
  required: [name, age]
Load them in Python:

from flowprompt import load_prompt, load_prompts

# Load single prompt
ExtractUser = load_prompt("prompts/extract_user.yaml")

# Load all prompts from directory
prompts = load_prompts("prompts/")
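Loaded prompts run like class-based ones; a sketch assuming the YAML output_schema above maps to the same attribute-style results:

# A YAML-loaded prompt behaves like a Prompt subclass
result = ExtractUser(text="John is 25").run(model="gpt-4o")
print(result.name, result.age)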

CLI

Optimize prompts from the command line:

# Optimize a prompt with training examples
flowprompt optimize my_prompt.py examples.json --strategy fewshot

# Output:
# Loading prompt from my_prompt.py...
#   Found: ExtractUser
# Loading examples from examples.json...
#   Loaded 10 examples
# Evaluating baseline...
#   Baseline accuracy: 65.0%
# Optimizing with strategy='fewshot'...
# --------------------------------------------------
# OPTIMIZATION COMPLETE
# --------------------------------------------------
#   Before: 65.0% accuracy
#   After:  89.0% accuracy
#   Change: +24.0%
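The layout of examples.json is not documented here; a plausible shape, mirroring the Example(input=..., output=...) structure from the Python optimization API (an assumption):

[
  {"input": {"text": "John is 25"}, "output": {"name": "John", "age": 25}},
  {"input": {"text": "Alice is 30"}, "output": {"name": "Alice", "age": 30}}
]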

Other commands:

flowprompt init my-project       # Initialize new project
flowprompt run prompt.yaml       # Run a prompt
flowprompt test                  # Validate prompts
flowprompt stats                 # View usage statistics

Comparison

Feature             FlowPrompt   LangChain   Instructor   DSPy
A/B Testing         Yes          No          No           No
Type-safe prompts   Yes          No          Yes          No
Structured outputs  Yes          Partial     Yes          No
Auto-optimization   Yes          No          No           Yes
Multi-provider      Yes          Yes         Yes          Partial
Caching             Yes          Partial     No           No
Cost tracking       Yes          Partial     No           No
Streaming           Yes          Yes         No           No
YAML prompts        Yes          No          No           No
Import time         <100ms       ~2s         <100ms       ~6s

Documentation

Full documentation lives in the GitHub repository: https://github.com/yotambraun/flowprompt

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/yotambraun/flowprompt.git
cd flowprompt
uv venv && uv sync --all-extras
uv run pytest

License

MIT License - see LICENSE for details.


Made with care by Yotam Braun

GitHub | PyPI | Issues
