
Type-safe prompt management with automatic optimization for LLMs


FlowPrompt

Stop guessing which prompt works. Measure it.



30-Second Quickstart

Define prompts as Python classes. No API key needed to preview messages:

from flowprompt import Prompt
from pydantic import BaseModel

class ExtractUser(Prompt):
    system = "Extract user info from text."
    user = "Text: {text}"

    class Output(BaseModel):
        name: str
        age: int

# Preview messages -- works without an API key
print(ExtractUser(text="John is 25").to_messages())
# [{'role': 'system', 'content': 'Extract user info from text.'},
#  {'role': 'user', 'content': 'Text: John is 25'}]

# Run against any LLM
result = ExtractUser(text="John is 25").run(model="gpt-4o")
print(result.name)  # "John"
print(result.age)   # 25
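
Under the hood, to_messages() simply renders the class attributes as chat messages. A minimal sketch of the idea using plain str.format (a hypothetical re-implementation for illustration, not FlowPrompt's actual code):

```python
def to_messages(system: str, user: str, **inputs) -> list[dict]:
    """Render a system/user template pair into chat messages.
    Hypothetical re-implementation for illustration -- not
    FlowPrompt's actual internals."""
    return [
        {"role": "system", "content": system.format(**inputs)},
        {"role": "user", "content": user.format(**inputs)},
    ]

to_messages("Extract user info from text.", "Text: {text}", text="John is 25")
# [{'role': 'system', 'content': 'Extract user info from text.'},
#  {'role': 'user', 'content': 'Text: John is 25'}]
```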

Compare Prompts in 5 Lines

The killer feature: find which prompt actually works better, with statistical significance.

from flowprompt import Prompt, compare

class Concise(Prompt):
    system = "Be concise."
    user = "Summarize: {text}"

class Detailed(Prompt):
    system = "Be thorough and detailed."
    user = "Provide a comprehensive summary of: {text}"

result = compare(
    {"concise": Concise, "detailed": Detailed},
    inputs=[{"text": "Python is a programming language..."}, ...],
    expected=["Python is a versatile language", ...],
    model="gpt-4o-mini",
)
print(result)
# Comparison Results
# ========================================
#   concise: 90% accuracy, 245ms avg, 50 runs << WINNER
#   detailed: 72% accuracy, 410ms avg, 50 runs
#
#   p=0.0231 (SIGNIFICANT)
#   effect size: -20.00%

Test Prompts in CI

FlowPrompt includes a pytest plugin (auto-discovered, zero config):

# test_prompts.py
import pytest

@pytest.mark.prompt_test
def test_sentiment(fp_compare):
    result = fp_compare(
        {"v1": PromptV1, "v2": PromptV2},
        inputs=[{"text": "I love this!"}],
        expected=["positive"],
        model="gpt-4o-mini",
    )
    result.assert_significant()
    result.assert_winner("v1")
    result.assert_no_errors()
Install the extra and run:

pip install flowprompt-ai[pytest]
pytest --no-slow-prompts  # skip expensive tests

Installation

pip install flowprompt-ai

Note: the package is installed as flowprompt-ai but imported as flowprompt.

Optional extras:

pip install flowprompt-ai[all]        # Everything
pip install flowprompt-ai[pytest]     # Pytest fixtures & markers
pip install flowprompt-ai[cli]        # CLI tools
pip install flowprompt-ai[tracing]    # OpenTelemetry support
pip install flowprompt-ai[multimodal] # Images, PDFs, audio, video

A/B Testing

FlowPrompt ships with built-in A/B testing -- a feature most Python LLM frameworks lack.

Quick comparison with compare():

from flowprompt import compare

result = compare(
    {"v1": PromptV1, "v2": PromptV2, "v3": PromptV3},
    inputs=test_data,
    model="gpt-4o-mini",
    confidence_level=0.95,
)

if result.winner:
    print(f"Winner: {result.winner} (p={result.statistical_result.p_value:.4f})")

Full experiment control when you need production traffic splitting, sticky user assignment, or multi-armed bandits:

from flowprompt.testing import create_simple_experiment

config, runner = create_simple_experiment(
    name="prompt_comparison",
    control_prompt=PromptV1,
    treatment_prompts=[("v2", PromptV2)],
    min_samples=100,
)

runner.start_experiment(config.id)
variant = runner.get_variant(config.id, user_id="user123")
result = runner.run_prompt(config.id, variant.name, input_data={"text": "..."})

summary = runner.get_summary(config.id)
if summary.winner:
    print(f"Winner: {summary.winner.name}")

Six allocation strategies: Random, Round-Robin, Weighted, Epsilon-Greedy, UCB, Thompson Sampling.
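
Epsilon-Greedy, for instance, boils down to a few lines. A generic sketch of the strategy (illustrative only, not FlowPrompt's internals):

```python
import random

def epsilon_greedy(stats: dict[str, tuple[int, int]], epsilon: float = 0.1) -> str:
    """Pick a variant name from stats ({name: (successes, trials)}):
    explore a random variant with probability epsilon, otherwise
    exploit the variant with the best observed success rate."""
    if random.random() < epsilon:
        return random.choice(list(stats))
    return max(stats, key=lambda v: stats[v][0] / max(stats[v][1], 1))

stats = {"v1": (45, 50), "v2": (36, 50)}
epsilon_greedy(stats, epsilon=0.0)  # always exploits -> "v1"
```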

Four statistical tests: Z-test, Chi-squared, Welch's t-test, Bayesian.
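
To make the p-values above concrete, here is a minimal two-proportion z-test in standard-library Python -- the textbook form of the first test listed, shown for illustration rather than as FlowPrompt's implementation:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(s1: int, n1: int, s2: int, n2: int) -> float:
    """Two-sided p-value for H0: both variants share one success rate.
    Textbook two-proportion z-test, for illustration only."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)                        # success rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 45/50 correct vs 36/50 correct -> p is about 0.02, significant at 0.05
p = two_proportion_z_test(45, 50, 36, 50)
```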


Structured Outputs

Define expected output as a Pydantic model. Parsing and validation are automatic.

from flowprompt import Prompt
from pydantic import BaseModel, Field

class Sentiment(Prompt):
    system = "Analyze the sentiment of the given text."
    user = "Text: {text}"

    class Output(BaseModel):
        sentiment: str = Field(description="positive, negative, or neutral")
        confidence: float = Field(ge=0.0, le=1.0)

result = Sentiment(text="I love this!").run(model="gpt-4o")
print(result.sentiment)   # "positive"
print(result.confidence)  # 0.95

Models that support native JSON schema get guaranteed valid output. Others fall back to JSON mode with schema hints.
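
Those schema hints come from the Pydantic model's JSON schema, which you can inspect directly. This is standard Pydantic v2 behavior, independent of FlowPrompt:

```python
from pydantic import BaseModel, Field

class Output(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0.0, le=1.0)

# The JSON schema carries descriptions and numeric bounds the model can see.
schema = Output.model_json_schema()
print(schema["properties"]["sentiment"]["description"])  # positive, negative, or neutral
print(schema["properties"]["confidence"]["minimum"])     # 0.0
print(schema["properties"]["confidence"]["maximum"])     # 1.0
```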


Multi-Provider Support

Switch between 100+ providers with a single parameter.

result = prompt.run(model="gpt-4o")                              # OpenAI
result = prompt.run(model="anthropic/claude-3-5-sonnet-20241022") # Anthropic
result = prompt.run(model="gemini/gemini-2.0-flash-exp")          # Google
result = prompt.run(model="ollama/llama3")                        # Local
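
The provider-prefix convention above can be parsed in one line. A sketch of the convention only (assuming bare names default to OpenAI, as in the first example), not FlowPrompt's actual routing code:

```python
def split_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split a 'provider/model' identifier; bare names are assumed
    to be OpenAI models. Illustrates the naming convention only."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (default_provider, model)

split_model("ollama/llama3")  # ('ollama', 'llama3')
split_model("gpt-4o")         # ('openai', 'gpt-4o')
```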

More Features

Feature        Example
Caching        configure_cache(enabled=True, default_ttl=3600) -- cut costs 50-90%
Optimization   DSPy-style auto-improvement with flowprompt.optimize
Streaming      for chunk in prompt.stream(model="gpt-4o"): ...
Observability  get_tracer().get_summary() -- costs, tokens, latency
YAML prompts   load_prompt("prompts/my_prompt.yaml")
Multimodal     Images, PDFs, audio via flowprompt.multimodal
CLI            flowprompt optimize prompt.py examples.json

Comparison

Feature             FlowPrompt   LangChain   Instructor      DSPy
A/B testing         Built-in     No          No              No
Structured outputs  Yes          Partial     Best-in-class   Yes
Auto-optimization   Yes          No          No              Best-in-class
Multi-provider      Yes          Yes         Yes             Partial
Caching             Yes          Yes         Yes             Yes
Cost tracking       Yes          Partial     No              No
Streaming           Yes          Yes         Yes             Yes
Import time         <100ms       ~2s         <100ms          ~6s

Documentation


Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

git clone https://github.com/yotambraun/flowprompt.git
cd flowprompt
uv venv && uv sync --all-extras
uv run pytest

License

MIT License -- see LICENSE for details.


Made with care by Yotam Braun

GitHub | PyPI | Issues
