Prompt Optimizer
A comprehensive framework for systematic A/B testing, optimization, and performance analytics of LLM prompts across multiple providers (OpenAI, Anthropic, Google, HuggingFace, local models).
Author
Sherin Joseph Roy
- Email: sherin.joseph2217@gmail.com
- GitHub: @Sherin-SEF-AI
- LinkedIn: @sherin-roy-deepmost
Features
- Multi-Variant A/B Testing: Statistical rigor with early stopping and significance testing
- Prompt Version Control: Git-like branching and merging for prompt management
- Performance Analytics: Quality scoring, cost tracking, and comprehensive reporting
- Automated Optimization: Genetic algorithms and RLHF for prompt improvement
- Multi-Provider Support: OpenAI, Anthropic, Google, HuggingFace, local models
- Data Management: SQLAlchemy ORM, Redis caching, and efficient storage
- Visualization Dashboards: Interactive charts and real-time monitoring
- RESTful API: FastAPI-based server with comprehensive endpoints
- CLI Tools: Command-line interface for experiment management
- Framework Integrations: Easy integration with popular ML frameworks
Installation
pip install llm-prompt-optimizer
Or install from source:
git clone https://github.com/Sherin-SEF-AI/prompt-optimizer.git
cd prompt-optimizer
pip install -e .
Quick Start
Basic Usage
import asyncio

from prompt_optimizer import PromptOptimizer
from prompt_optimizer.types import OptimizerConfig, ExperimentConfig, ProviderType

# Initialize the optimizer
config = OptimizerConfig(
    database_url="sqlite:///prompt_optimizer.db",
    default_provider=ProviderType.OPENAI,
    api_keys={"openai": "your-api-key"}
)
optimizer = PromptOptimizer(config)

# Create an A/B test experiment
experiment_config = ExperimentConfig(
    name="email_subject_test",
    traffic_split={"control": 0.5, "variant": 0.5},
    provider=ProviderType.OPENAI,
    model="gpt-3.5-turbo"
)
experiment = optimizer.create_experiment(
    name="Email Subject Line Test",
    description="Testing different email subject line prompts",
    variants=[
        "Write an engaging subject line for: {topic}",
        "Create a compelling email subject about: {topic}"
    ],
    config=experiment_config
)

async def main():
    # Test prompts (test_prompt is awaited, so it must run inside a coroutine)
    result = await optimizer.test_prompt(
        experiment_id=experiment.id,
        user_id="user123",
        input_data={"topic": "AI in healthcare"}
    )
    # Analyze results
    analysis = optimizer.analyze_experiment(experiment.id)
    print(f"Best variant: {analysis.best_variant}")
    print(f"Confidence: {analysis.confidence_level:.2%}")

asyncio.run(main())
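The `traffic_split` mapping and the `user_id` argument suggest that each user is routed to one variant in proportion to the configured weights. How the library does this internally is not documented here. The following is a minimal sketch of one common approach, deterministic hash-based bucketing; `assign_variant` is a hypothetical helper and not part of the package.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, traffic_split: dict[str, float]) -> str:
    """Map a user to a variant deterministically, in proportion to traffic_split.

    Hypothetical helper for illustration; not part of prompt_optimizer.
    """
    total = sum(traffic_split.values())
    if total <= 0:
        raise ValueError("traffic_split weights must sum to a positive number")

    # Hash user and experiment together so the same user always lands in the
    # same bucket for a given experiment, but buckets differ across experiments.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).digest()
    # Keep the top 53 bits of the first 8 bytes to get an exact float in [0, 1).
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53

    cumulative = 0.0
    for variant, weight in traffic_split.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    # Floating-point rounding can leave the final edge slightly below 1.0.
    return list(traffic_split)[-1]
```

Because the assignment depends only on the two IDs, a returning user sees the same variant on every request without any stored state.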
CLI Usage
# List experiments
prompt-optimizer list-experiments
# Create experiment
prompt-optimizer create-experiment --name "Test" --variants "prompt1" "prompt2"
# Run analysis
prompt-optimizer analyze --experiment-id exp_123
# Optimize prompt
prompt-optimizer optimize --prompt "Your prompt here"
API Usage
Start the server:
uvicorn prompt_optimizer.api.server:app --reload
Access the API at http://localhost:8000 and interactive docs at http://localhost:8000/docs.
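This README does not list the available endpoints. FastAPI applications publish an OpenAPI schema at `/openapi.json` by default, so one way to discover the routes is to read that schema. The sketch below assumes the server is running locally and that the default schema path has not been changed or disabled; `list_endpoints` and `fetch_spec` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema dictionary."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda pair: (pair[1], pair[0]))


def fetch_spec(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema from a running server (assumes the default path)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the server started as shown above, `list_endpoints(fetch_spec())` returns the method and path of every route the API exposes.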
Architecture
prompt-optimizer/
├── core/ # Core optimization engine
├── testing/ # A/B testing framework
├── providers/ # LLM provider integrations
├── analytics/ # Performance analytics
├── optimization/ # Genetic algorithms, RLHF
├── storage/ # Database and caching
├── api/ # FastAPI server
├── cli/ # Command-line interface
├── visualization/ # Dashboards and charts
└── types.py # Type definitions
Configuration
Environment Variables
export PROMPT_OPTIMIZER_DATABASE_URL="postgresql://user:pass@localhost/prompt_opt"
export PROMPT_OPTIMIZER_REDIS_URL="redis://localhost:6379"
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"
Configuration File
Create config.yaml:
database:
  url: "sqlite:///prompt_optimizer.db"
  pool_size: 10
  max_overflow: 20
redis:
  url: "redis://localhost:6379"
  ttl: 3600
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    default_model: "gpt-3.5-turbo"
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    default_model: "claude-3-sonnet-20240229"
optimization:
  max_iterations: 50
  population_size: 20
  mutation_rate: 0.1
  crossover_rate: 0.8
testing:
  default_significance_level: 0.05
  min_sample_size: 100
  max_duration_days: 14
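The `${OPENAI_API_KEY}` and `${ANTHROPIC_API_KEY}` values imply that environment variables are substituted when the file is loaded. The README does not describe that mechanism, so the following is only a sketch of how such substitution could work on an already-parsed configuration dictionary. `expand_env` is a hypothetical helper, and leaving unset variables untouched is an assumption rather than documented behavior.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in strings, dicts, and lists.

    Hypothetical helper for illustration; unset variables are left as-is.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value  # numbers, booleans, None


# Same shape as part of config.yaml after YAML parsing.
config = {
    "providers": {
        "openai": {"api_key": "${OPENAI_API_KEY}", "default_model": "gpt-3.5-turbo"},
    },
    "redis": {"url": "redis://localhost:6379", "ttl": 3600},
}
resolved = expand_env(config, env={"OPENAI_API_KEY": "sk-example"})
print(resolved["providers"]["openai"]["api_key"])  # sk-example
```

Passing `env` explicitly keeps the example independent of the real environment; omitting it falls back to `os.environ`.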
Examples
A/B Testing Email Prompts
# Create experiment for email subject lines
experiment = optimizer.create_experiment(
    name="Email Subject Optimization",
    description="Testing different email subject line prompts",
    variants=[
        "Subject: {topic} - You won't believe what we found!",
        "Subject: Discover the latest in {topic}",
        "Subject: {topic} insights that will change everything"
    ],
    config=ExperimentConfig(
        traffic_split={"v1": 0.33, "v2": 0.33, "v3": 0.34},
        min_sample_size=50,
        significance_level=0.05
    )
)

# Run tests (await needs an async context, e.g. inside `async def main()`)
for i in range(100):
    result = await optimizer.test_prompt(
        experiment_id=experiment.id,
        user_id=f"user_{i}",
        input_data={"topic": "artificial intelligence"}
    )

# Analyze results
analysis = optimizer.analyze_experiment(experiment.id)
print(f"Best performing variant: {analysis.best_variant}")
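The example sets `significance_level=0.05` and `min_sample_size=50`, but the README does not say which statistical test `analyze_experiment` applies or what counts as a success. As an illustration only, the sketch below uses a two-proportion z-test on a binary success metric, which is one standard choice for comparing two variants. The function names and the binary metric are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Return the two-sided p-value for H0: both variants have the same success rate."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both samples must be non-empty")
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Every trial succeeded or every trial failed: no evidence of a difference.
        return 1.0
    z = (p_a - p_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(successes_a, n_a, successes_b, n_b,
                   significance_level=0.05, min_sample_size=50):
    """Report a difference only when both arms are large enough and p < alpha."""
    if min(n_a, n_b) < min_sample_size:
        return False
    return two_proportion_z_test(successes_a, n_a, successes_b, n_b) < significance_level
```

For instance, `is_significant(30, 100, 60, 100)` is true, while the same rates observed on 10 samples per arm are rejected by the sample-size guard. With three variants, as in the example above, pairwise comparisons would also need a multiple-comparison correction, which this sketch omits.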
Prompt Optimization
# Assumed import path: the other config types in this README live in prompt_optimizer.types
from prompt_optimizer.types import OptimizationConfig, MetricType

# Optimize a customer service prompt
optimized = await optimizer.optimize_prompt(
    base_prompt="Help the customer with their issue",
    optimization_config=OptimizationConfig(
        max_iterations=30,
        target_metrics=[MetricType.QUALITY, MetricType.COST],
        constraints={"max_tokens": 100}
    )
)
print(f"Original: {optimized.original_prompt}")
print(f"Optimized: {optimized.optimized_prompt}")
print(f"Improvement: {optimized.improvement_score:.2%}")
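The feature list names genetic algorithms as one optimization strategy, and `config.yaml` exposes `population_size`, `mutation_rate`, and `crossover_rate`. The library's actual search procedure is not shown in this README. The toy sketch below illustrates how those parameters typically interact. Representing a prompt as a set of on/off phrases, the `PHRASES` list, and the stand-in `fitness` function are all assumptions made for the example; a real optimizer would score model responses instead.

```python
import random

# Hypothetical building blocks that can be appended to a base prompt.
PHRASES = [
    "Be concise.",
    "Be polite.",
    "Ask a clarifying question if needed.",
    "Use bullet points.",
    "Apologize for the inconvenience.",
    "Offer a next step.",
]


def fitness(genome: list[bool]) -> float:
    """Stand-in score: rewards three chosen phrases and penalizes the others.

    A real optimizer would send the prompt to a model and score the response.
    """
    wanted = {0, 1, 5}
    reward = sum(1.0 for i, on in enumerate(genome) if on and i in wanted)
    penalty = 0.4 * sum(1 for i, on in enumerate(genome) if on and i not in wanted)
    return reward - penalty


def evolve(population_size=20, max_iterations=50, mutation_rate=0.1,
           crossover_rate=0.8, seed=0):
    rng = random.Random(seed)  # local generator keeps runs reproducible
    n = len(PHRASES)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(population_size)]

    def pick():
        # Tournament selection: the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(max_iterations):
        best = max(population, key=fitness)
        next_gen = [best[:]]  # elitism: carry the best genome over unchanged
        while len(next_gen) < population_size:
            parent_a, parent_b = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n)  # single-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Flip each gene independently with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


def render(base_prompt: str, genome: list[bool]) -> str:
    """Build the final prompt text from the base prompt and the enabled phrases."""
    return " ".join([base_prompt] + [p for p, on in zip(PHRASES, genome) if on])


best = evolve()
print(render("Help the customer with their issue.", best))
```

Elitism guarantees the best score never decreases between generations, while the mutation rate controls how much of the search space is still explored once the population starts to converge.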
Quality Scoring
from prompt_optimizer.analytics import QualityScorer

scorer = QualityScorer()
score = await scorer.score_response(
    prompt="Explain machine learning",
    response="Machine learning is a subset of AI that enables computers to learn from data."
)
print(f"Overall Score: {score.overall_score:.3f}")
print(f"Relevance: {score.relevance:.3f}")
print(f"Coherence: {score.coherence:.3f}")
print(f"Accuracy: {score.accuracy:.3f}")
Testing
Run the test suite:
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run with coverage
pytest --cov=prompt_optimizer tests/
# Run specific test
pytest tests/test_ab_testing.py::test_experiment_creation
Contributing
- Fork the repository: https://github.com/Sherin-SEF-AI/prompt-optimizer.git
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Roadmap
- Advanced prompt templates and variables
- Multi-modal prompt optimization
- Real-time streaming analytics
- Enterprise SSO integration
- Advanced cost optimization algorithms
- Prompt security and safety checks
- Integration with popular ML platforms
- Mobile app for experiment monitoring
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: sherin.joseph2217@gmail.com
- LinkedIn: Sherin Joseph Roy
Acknowledgments
- OpenAI, Anthropic, Google, and HuggingFace for their LLM APIs
- The open-source community for the excellent libraries used in this project
- All contributors and users of this framework
Made with ❤️ by Sherin Joseph Roy
File details
Details for the file llm_prompt_optimizer-0.1.0.tar.gz.
File metadata
- Download URL: llm_prompt_optimizer-0.1.0.tar.gz
- Upload date:
- Size: 48.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f08949996e69ce1640329287732ed78207c3dce1364d6ae56aa06345488fe33e |
| MD5 | d7f729b2340112d5660e4c87c3a5ccf7 |
| BLAKE2b-256 | 20728db43395b29dc8908a929cb589e8f92d01106ca9d9254fb7179f9287b7d1 |
File details
Details for the file llm_prompt_optimizer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_prompt_optimizer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 62.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8039c0d8488aee2e2849a5072686a2a423019b9b208c6bc062f40c208ada14a |
| MD5 | 0564e5677589f432543bc61d194688d6 |
| BLAKE2b-256 | 16cd038923ea30a4f9e28ace7d3b3515f41796f2cfbb957dc655321767cdb5f4 |