# FlowPrompt

Type-safe prompt management with automatic optimization for LLMs.

Stop guessing which prompt works. Measure it.
## 30-Second Quickstart

Define prompts as Python classes. No API key needed to preview messages:

```python
from flowprompt import Prompt
from pydantic import BaseModel


class ExtractUser(Prompt):
    system = "Extract user info from text."
    user = "Text: {text}"

    class Output(BaseModel):
        name: str
        age: int


# Preview messages -- works without an API key
print(ExtractUser(text="John is 25").to_messages())
# [{'role': 'system', 'content': 'Extract user info from text.'},
#  {'role': 'user', 'content': 'Text: John is 25'}]

# Run against any LLM
result = ExtractUser(text="John is 25").run(model="gpt-4o")
print(result.name)  # "John"
print(result.age)   # 25
```
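Conceptually, `to_messages()` is ordinary string templating over the class attributes. A minimal standalone sketch of that idea (illustrative only; `MiniPrompt` is hypothetical and FlowPrompt's real implementation adds validation and typing):

```python
# Minimal sketch of class-attribute prompt templating. MiniPrompt is a
# hypothetical stand-in, not FlowPrompt's actual base class.
class MiniPrompt:
    system = ""
    user = ""

    def __init__(self, **fields):
        self.fields = fields

    def to_messages(self):
        # Render each template with the supplied fields.
        return [
            {"role": "system", "content": self.system.format(**self.fields)},
            {"role": "user", "content": self.user.format(**self.fields)},
        ]


class ExtractUser(MiniPrompt):
    system = "Extract user info from text."
    user = "Text: {text}"


print(ExtractUser(text="John is 25").to_messages())
```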
## Compare Prompts in 5 Lines
The killer feature: find which prompt actually works better, with statistical significance.
```python
from flowprompt import Prompt, compare


class Concise(Prompt):
    system = "Be concise."
    user = "Summarize: {text}"


class Detailed(Prompt):
    system = "Be thorough and detailed."
    user = "Provide a comprehensive summary of: {text}"


result = compare(
    {"concise": Concise, "detailed": Detailed},
    inputs=[{"text": "Python is a programming language..."}, ...],
    expected=["Python is a versatile language", ...],
    model="gpt-4o-mini",
)
print(result)
# Comparison Results
# ========================================
# concise:  90% accuracy, 245ms avg, 50 runs  << WINNER
# detailed: 72% accuracy, 410ms avg, 50 runs
#
# p=0.0231 (SIGNIFICANT)
# effect size: -20.00%
```
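A significance line like the one above can be sanity-checked by hand with a plain two-proportion z-test. FlowPrompt's chosen test may differ, so the p-value only roughly matches the example output:

```python
import math

def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided z-test for a difference in success rates."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 90% of 50 runs vs 72% of 50 runs, as in the example above
z, p = two_proportion_z_test(45, 50, 36, 50)
print(f"z={z:.3f}, p={p:.4f}")  # p lands in the same ballpark as 0.0231
```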
## Test Prompts in CI
FlowPrompt includes a pytest plugin (auto-discovered, zero config):
```python
# test_prompts.py
import pytest


@pytest.mark.prompt_test
def test_sentiment(fp_compare):
    result = fp_compare(
        {"v1": PromptV1, "v2": PromptV2},
        inputs=[{"text": "I love this!"}],
        expected=["positive"],
        model="gpt-4o-mini",
    )
    result.assert_significant()
    result.assert_winner("v1")
    result.assert_no_errors()
```

```bash
pip install flowprompt-ai[pytest]
pytest --no-slow-prompts  # skip expensive tests
```
## Installation

```bash
pip install flowprompt-ai
```

> **Note:** The package is installed as `flowprompt-ai` but imported as `flowprompt`.

Optional extras:

```bash
pip install flowprompt-ai[all]         # Everything
pip install flowprompt-ai[pytest]      # Pytest fixtures & markers
pip install flowprompt-ai[cli]         # CLI tools
pip install flowprompt-ai[tracing]     # OpenTelemetry support
pip install flowprompt-ai[multimodal]  # Images, PDFs, audio, video
```
## A/B Testing
FlowPrompt is the only Python LLM framework with built-in A/B testing.
Quick comparison with `compare()`:

```python
from flowprompt import compare

result = compare(
    {"v1": PromptV1, "v2": PromptV2, "v3": PromptV3},
    inputs=test_data,
    model="gpt-4o-mini",
    confidence_level=0.95,
)
if result.winner:
    print(f"Winner: {result.winner} (p={result.statistical_result.p_value:.4f})")
```
Full experiment control when you need production traffic splitting, sticky user assignment, or multi-armed bandits:

```python
from flowprompt.testing import create_simple_experiment

config, runner = create_simple_experiment(
    name="prompt_comparison",
    control_prompt=PromptV1,
    treatment_prompts=[("v2", PromptV2)],
    min_samples=100,
)

runner.start_experiment(config.id)
variant = runner.get_variant(config.id, user_id="user123")
result = runner.run_prompt(config.id, variant.name, input_data={"text": "..."})

summary = runner.get_summary(config.id)
if summary.winner:
    print(f"Winner: {summary.winner.name}")
```
**Six allocation strategies:** Random, Round-Robin, Weighted, Epsilon-Greedy, UCB, Thompson Sampling.

**Four statistical tests:** Z-test, Chi-squared, Welch's t-test, Bayesian.
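To give a feel for one of the allocation strategies, here is a self-contained Thompson Sampling sketch. This is generic illustrative code, not FlowPrompt's API: each variant keeps a Beta posterior over its success rate, and traffic flows to whichever variant's sampled rate is highest.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over named variants (illustrative)."""

    def __init__(self, variants):
        # Beta(1, 1) prior: one pseudo-success and one pseudo-failure each.
        self.stats = {v: {"wins": 1, "losses": 1} for v in variants}

    def pick(self):
        # Sample a plausible success rate per variant; route to the best draw.
        draws = {
            v: random.betavariate(s["wins"], s["losses"])
            for v, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, variant, success):
        key = "wins" if success else "losses"
        self.stats[variant][key] += 1

random.seed(0)
sampler = ThompsonSampler(["v1", "v2"])
true_rates = {"v1": 0.9, "v2": 0.6}  # pretend v1 is the better prompt
for _ in range(500):
    v = sampler.pick()
    sampler.record(v, random.random() < true_rates[v])
# After enough traffic, most pulls should have gone to v1.
print(sampler.stats)
```

The appeal over a fixed 50/50 split is that a clearly losing variant stops receiving much traffic automatically.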
## Structured Outputs
Define expected output as a Pydantic model. Parsing and validation are automatic.
```python
from flowprompt import Prompt
from pydantic import BaseModel, Field


class Sentiment(Prompt):
    system = "Analyze the sentiment of the given text."
    user = "Text: {text}"

    class Output(BaseModel):
        sentiment: str = Field(description="positive, negative, or neutral")
        confidence: float = Field(ge=0.0, le=1.0)


result = Sentiment(text="I love this!").run(model="gpt-4o")
print(result.sentiment)   # "positive"
print(result.confidence)  # 0.95
```
Models with native JSON-schema support return guaranteed-valid output; others fall back to JSON mode with schema hints.
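Conceptually, the JSON-mode fallback means the schema is embedded in the prompt and the reply is parsed and checked client-side. A rough stdlib-only sketch of that flow (assumed behavior for illustration, not FlowPrompt's actual code):

```python
import json

# Hypothetical simplified schema for the fallback path.
schema = {"name": "string", "age": "integer"}

# The schema hint is appended to the system message so the model knows
# the shape to emit even without native JSON-schema support.
system_hint = (
    "Extract user info from text. "
    f"Reply with JSON matching: {json.dumps(schema)}"
)

# A typical model reply under JSON mode:
reply = '{"name": "John", "age": 25}'

# Parse and validate client-side before handing the result back.
data = json.loads(reply)
assert set(data) == set(schema), "missing or unexpected keys"
assert isinstance(data["name"], str) and isinstance(data["age"], int)
print(data)  # {'name': 'John', 'age': 25}
```

In the real library this validation step is what Pydantic's `Output` model performs automatically.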
## Multi-Provider Support
Switch between 100+ providers with a single parameter.
```python
result = prompt.run(model="gpt-4o")                               # OpenAI
result = prompt.run(model="anthropic/claude-3-5-sonnet-20241022") # Anthropic
result = prompt.run(model="gemini/gemini-2.0-flash-exp")          # Google
result = prompt.run(model="ollama/llama3")                        # Local
```
## More Features
| Feature | Example |
|---|---|
| Caching | `configure_cache(enabled=True, default_ttl=3600)` -- cut costs 50-90% |
| Optimization | DSPy-style auto-improvement with `flowprompt.optimize` |
| Streaming | `for chunk in prompt.stream(model="gpt-4o"): ...` |
| Observability | `get_tracer().get_summary()` -- costs, tokens, latency |
| YAML prompts | `load_prompt("prompts/my_prompt.yaml")` |
| Multimodal | Images, PDFs, audio via `flowprompt.multimodal` |
| CLI | `flowprompt optimize prompt.py examples.json` |
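The caching row is what cuts repeat cost: identical prompt-plus-model calls can be served from a store with a time-to-live. A generic TTL-cache sketch of that idea (`configure_cache` is the documented entry point; the internals below are assumed, not FlowPrompt's implementation):

```python
import time

class TTLCache:
    """Tiny in-memory TTL cache keyed by (prompt, model). Illustrative only."""

    def __init__(self, default_ttl=3600):
        self.default_ttl = default_ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key, value, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        self._store[key] = (value, time.monotonic() + ttl)

cache = TTLCache(default_ttl=3600)
key = ("Summarize: ...", "gpt-4o-mini")
if cache.get(key) is None:
    cache.put(key, "cached LLM response")  # first call: pay for the LLM
print(cache.get(key))  # "cached LLM response" -- the repeat call is free
```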
## Comparison
| Feature | FlowPrompt | LangChain | Instructor | DSPy |
|---|---|---|---|---|
| A/B testing | Built-in | No | No | No |
| Structured outputs | Yes | Partial | Best-in-class | Yes |
| Auto-optimization | Yes | No | No | Best-in-class |
| Multi-provider | Yes | Yes | Yes | Partial |
| Caching | Yes | Yes | Yes | Yes |
| Cost tracking | Yes | Partial | No | No |
| Streaming | Yes | Yes | Yes | Yes |
| Import time | <100ms | ~2s | <100ms | ~6s |
## Documentation
- Quick Start Guide -- Get started in 5 minutes
- A/B Testing Guide -- Run experiments
- API Reference -- Complete API documentation
- Optimization Guide -- Improve prompts automatically
- Examples -- Runnable example scripts
## Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
```bash
git clone https://github.com/yotambraun/flowprompt.git
cd flowprompt
uv venv && uv sync --all-extras
uv run pytest
```
## License
MIT License -- see LICENSE for details.
Made with care by Yotam Braun