
Type-safe prompt management with automatic optimization for LLMs


FlowPrompt

Stop guessing which prompt works. Measure it.



30-Second Quickstart

Define prompts as Python classes. No API key needed to preview messages:

from flowprompt import Prompt
from pydantic import BaseModel

class ExtractUser(Prompt):
    system = "Extract user info from text."
    user = "Text: {text}"

    class Output(BaseModel):
        name: str
        age: int

# Preview messages -- works without an API key
print(ExtractUser(text="John is 25").to_messages())
# [{'role': 'system', 'content': 'Extract user info from text.'},
#  {'role': 'user', 'content': 'Text: John is 25'}]

# Run against any LLM
result = ExtractUser(text="John is 25").run(model="gpt-4o")
print(result.name)  # "John"
print(result.age)   # 25
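
Under the hood, to_messages() simply renders the class attributes as chat messages. A minimal sketch of the idea using plain str.format (a hypothetical re-implementation for illustration, not FlowPrompt's actual code):

```python
def to_messages(system: str, user: str, **inputs) -> list[dict]:
    """Render a system/user template pair into chat messages.
    Hypothetical re-implementation for illustration -- not
    FlowPrompt's actual internals."""
    return [
        {"role": "system", "content": system.format(**inputs)},
        {"role": "user", "content": user.format(**inputs)},
    ]

to_messages("Extract user info from text.", "Text: {text}", text="John is 25")
# [{'role': 'system', 'content': 'Extract user info from text.'},
#  {'role': 'user', 'content': 'Text: John is 25'}]
```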

Compare Prompts in 5 Lines

The killer feature: find which prompt actually works better, with statistical significance.

from flowprompt import Prompt, compare

class Concise(Prompt):
    system = "Be concise."
    user = "Summarize: {text}"

class Detailed(Prompt):
    system = "Be thorough and detailed."
    user = "Provide a comprehensive summary of: {text}"

result = compare(
    {"concise": Concise, "detailed": Detailed},
    inputs=[{"text": "Python is a programming language..."}, ...],
    expected=["Python is a versatile language", ...],
    model="gpt-4o-mini",
)
print(result)
# Comparison Results
# ========================================
#   concise: 90% accuracy, 245ms avg, 50 runs << WINNER
#   detailed: 72% accuracy, 410ms avg, 50 runs
#
#   p=0.0231 (SIGNIFICANT)
#   effect size: -20.00%

Test Prompts in CI

FlowPrompt includes a pytest plugin (auto-discovered, zero config):

# test_prompts.py
import pytest

@pytest.mark.prompt_test
def test_sentiment(fp_compare):
    result = fp_compare(
        {"v1": PromptV1, "v2": PromptV2},
        inputs=[{"text": "I love this!"}],
        expected=["positive"],
        model="gpt-4o-mini",
    )
    result.assert_significant()
    result.assert_winner("v1")
    result.assert_no_errors()
Install the extra and run:

pip install flowprompt-ai[pytest]
pytest --no-slow-prompts  # skip expensive tests

Installation

pip install flowprompt-ai

Note: the package is installed as flowprompt-ai but imported as flowprompt.

Optional extras:

pip install flowprompt-ai[all]        # Everything
pip install flowprompt-ai[pytest]     # Pytest fixtures & markers
pip install flowprompt-ai[cli]        # CLI tools
pip install flowprompt-ai[tracing]    # OpenTelemetry support
pip install flowprompt-ai[multimodal] # Images, PDFs, audio, video

A/B Testing

FlowPrompt ships with built-in A/B testing -- a feature most Python LLM frameworks lack.

Quick comparison with compare():

from flowprompt import compare

result = compare(
    {"v1": PromptV1, "v2": PromptV2, "v3": PromptV3},
    inputs=test_data,
    model="gpt-4o-mini",
    confidence_level=0.95,
)

if result.winner:
    print(f"Winner: {result.winner} (p={result.statistical_result.p_value:.4f})")

Full experiment control when you need production traffic splitting, sticky user assignment, or multi-armed bandits:

from flowprompt.testing import create_simple_experiment

config, runner = create_simple_experiment(
    name="prompt_comparison",
    control_prompt=PromptV1,
    treatment_prompts=[("v2", PromptV2)],
    min_samples=100,
)

runner.start_experiment(config.id)
variant = runner.get_variant(config.id, user_id="user123")
result = runner.run_prompt(config.id, variant.name, input_data={"text": "..."})

summary = runner.get_summary(config.id)
if summary.winner:
    print(f"Winner: {summary.winner.name}")

Six allocation strategies: Random, Round-Robin, Weighted, Epsilon-Greedy, UCB, Thompson Sampling.
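
Epsilon-Greedy, for instance, boils down to a few lines. A generic sketch of the strategy (illustrative only, not FlowPrompt's internals):

```python
import random

def epsilon_greedy(stats: dict[str, tuple[int, int]], epsilon: float = 0.1) -> str:
    """Pick a variant name from stats ({name: (successes, trials)}):
    explore a random variant with probability epsilon, otherwise
    exploit the variant with the best observed success rate."""
    if random.random() < epsilon:
        return random.choice(list(stats))
    return max(stats, key=lambda v: stats[v][0] / max(stats[v][1], 1))

stats = {"v1": (45, 50), "v2": (36, 50)}
epsilon_greedy(stats, epsilon=0.0)  # always exploits -> "v1"
```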

Four statistical tests: Z-test, Chi-squared, Welch's t-test, Bayesian.
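
To make the p-values above concrete, here is a minimal two-proportion z-test in standard-library Python -- the textbook form of the first test listed, shown for illustration rather than as FlowPrompt's implementation:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(s1: int, n1: int, s2: int, n2: int) -> float:
    """Two-sided p-value for H0: both variants share one success rate.
    Textbook two-proportion z-test, for illustration only."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)                        # success rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 45/50 correct vs 36/50 correct -> p is about 0.02, significant at 0.05
p = two_proportion_z_test(45, 50, 36, 50)
```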


Structured Outputs

Define expected output as a Pydantic model. Parsing and validation are automatic.

from flowprompt import Prompt
from pydantic import BaseModel, Field

class Sentiment(Prompt):
    system = "Analyze the sentiment of the given text."
    user = "Text: {text}"

    class Output(BaseModel):
        sentiment: str = Field(description="positive, negative, or neutral")
        confidence: float = Field(ge=0.0, le=1.0)

result = Sentiment(text="I love this!").run(model="gpt-4o")
print(result.sentiment)   # "positive"
print(result.confidence)  # 0.95

Models that support native JSON schema get guaranteed valid output. Others fall back to JSON mode with schema hints.
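
Those schema hints come from the Pydantic model's JSON schema, which you can inspect directly. This is standard Pydantic v2 behavior, independent of FlowPrompt:

```python
from pydantic import BaseModel, Field

class Output(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0.0, le=1.0)

# The JSON schema carries descriptions and numeric bounds the model can see.
schema = Output.model_json_schema()
print(schema["properties"]["sentiment"]["description"])  # positive, negative, or neutral
print(schema["properties"]["confidence"]["minimum"])     # 0.0
print(schema["properties"]["confidence"]["maximum"])     # 1.0
```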


Multi-Provider Support

Switch between 100+ providers with a single parameter.

result = prompt.run(model="gpt-4o")                              # OpenAI
result = prompt.run(model="anthropic/claude-3-5-sonnet-20241022") # Anthropic
result = prompt.run(model="gemini/gemini-2.0-flash-exp")          # Google
result = prompt.run(model="ollama/llama3")                        # Local
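
The provider-prefix convention above can be parsed in one line. A sketch of the convention only (assuming bare names default to OpenAI, as in the first example), not FlowPrompt's actual routing code:

```python
def split_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split a 'provider/model' identifier; bare names are assumed
    to be OpenAI models. Illustrates the naming convention only."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (default_provider, model)

split_model("ollama/llama3")  # ('ollama', 'llama3')
split_model("gpt-4o")         # ('openai', 'gpt-4o')
```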

More Features

Feature        Example
Caching        configure_cache(enabled=True, default_ttl=3600) -- cut costs 50-90%
Optimization   DSPy-style auto-improvement with flowprompt.optimize
Streaming      for chunk in prompt.stream(model="gpt-4o"): ...
Observability  get_tracer().get_summary() -- costs, tokens, latency
YAML prompts   load_prompt("prompts/my_prompt.yaml")
Multimodal     Images, PDFs, audio via flowprompt.multimodal
CLI            flowprompt optimize prompt.py examples.json

Comparison

Feature             FlowPrompt   LangChain   Instructor      DSPy
A/B testing         Built-in     No          No              No
Structured outputs  Yes          Partial     Best-in-class   Yes
Auto-optimization   Yes          No          No              Best-in-class
Multi-provider      Yes          Yes         Yes             Partial
Caching             Yes          Yes         Yes             Yes
Cost tracking       Yes          Partial     No              No
Streaming           Yes          Yes         Yes             Yes
Import time         <100ms       ~2s         <100ms          ~6s

Documentation


Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/yotambraun/flowprompt.git
cd flowprompt
uv venv && uv sync --all-extras
uv run pytest

License

MIT License -- see LICENSE for details.


Made with care by Yotam Braun

GitHub | PyPI | Issues
