
PromptForge

The Swiss Army knife for prompt engineering -- build, test, version, and optimize prompts 10x faster.



Why PromptForge?

Writing a good prompt is half the battle. Managing hundreds of prompts across projects, tracking which version performs best, estimating token costs before you hit "send" -- that is the other half, and it is where most teams waste time.

PromptForge gives you a single toolkit to:

  • Build parameterized prompt templates with {{variable}} injection
  • Test prompts with built-in A/B testing and batch evaluation
  • Version every change with full diff and fork support
  • Score output quality on coherence, relevance, safety, completeness, and clarity
  • Compress prompts to use fewer tokens without losing meaning
  • Chain prompts into multi-step pipelines with automatic output passing
  • Count tokens and estimate costs across 16+ LLM models
  • Export to JSON, YAML, or LangChain format

All backed by a file-based store (no database required) with support for OpenAI, Anthropic, Google Gemini, and local models (Ollama, LM Studio).


Features

Prompt Templates with Variable Injection

Define templates with {{variable}} placeholders. PromptForge auto-detects variables, validates required ones, and fills defaults.
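As a rough illustration, placeholder auto-detection can be done with a regular expression. This is a sketch of the technique, not PromptForge's actual implementation:

```python
import re

def extract_variables(template: str) -> list[str]:
    """Return unique {{name}} placeholders in order of first appearance."""
    seen: list[str] = []
    for match in re.finditer(r"\{\{\s*(\w+)\s*\}\}", template):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

For example, `extract_variables("Review {{language}} code:\n{{code}}")` yields `["language", "code"]`.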

Full Version History

Every save creates a version snapshot. Diff any two versions, fork from a specific version, and track the evolution of your prompts over time.
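A line-level diff like this can be computed with Python's standard difflib; a minimal sketch (PromptForge's own diff logic may differ):

```python
import difflib

def diff_lines(old: str, new: str) -> tuple[list[str], list[str]]:
    """Return (additions, deletions) between two template bodies."""
    additions, deletions = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            additions.append(line[2:])
        elif line.startswith("- "):
            deletions.append(line[2:])
    return additions, deletions
```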

A/B Testing Framework

Compare two prompts head-to-head on the same inputs. PromptForge runs both, scores outputs on five quality dimensions, and declares a winner.

Batch Testing

Run a suite of inputs through a single prompt and get back quality scores, latencies, token usage, and estimated cost per run.

Chain-of-Thought Pipelines

Link prompts into multi-step pipelines where each step's output flows into the next as a {{step_N_output}} variable.
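The passing mechanism can be pictured with a small sketch (hypothetical names and simple string substitution; the library's internals may differ):

```python
from typing import Callable

def run_chain(step_templates: list[str], variables: dict[str, str],
              generate: Callable[[str], str]) -> str:
    """Run templates in order, exposing each output as {{step_N_output}}."""
    vars_ = dict(variables)
    output = ""
    for n, template in enumerate(step_templates, start=1):
        prompt = template
        for name, value in vars_.items():
            prompt = prompt.replace("{{" + name + "}}", value)
        output = generate(prompt)           # call the LLM for this step
        vars_[f"step_{n}_output"] = output  # available to all later steps
    return output
```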

Quality Scoring

Deterministic, heuristic-based scoring across five dimensions -- coherence, relevance, safety, completeness, and clarity -- with weighted overall score. No LLM call required.
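The weighted overall score is a plain weighted average of the five dimensions. For illustration (these weights are hypothetical, not necessarily the ones PromptForge uses):

```python
# Hypothetical weights; PromptForge's actual weighting may differ.
WEIGHTS = {
    "coherence": 0.25,
    "relevance": 0.25,
    "safety": 0.20,
    "completeness": 0.15,
    "clarity": 0.15,
}

def overall_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-1) into one weighted overall."""
    return sum(weight * scores[dim] for dim, weight in WEIGHTS.items())
```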

Prompt Compression

Reduce token count with configurable aggression (0.0 to 1.0) by removing filler phrases, collapsing whitespace, shortening verbose instructions, and compressing lists.
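One of those passes, filler-phrase removal, can be sketched as follows (hypothetical phrase list, not the library's actual table):

```python
import re

# Hypothetical filler list for illustration.
FILLERS = [
    "please note that ",
    "it is important to understand that ",
    "in order to ",
    "keep in mind that ",
    "make sure to ",
]

def strip_fillers(text: str) -> str:
    """Remove filler phrases, then collapse runs of whitespace."""
    for phrase in FILLERS:
        text = re.sub(re.escape(phrase), "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()
```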

Token Counting and Cost Estimation

Accurate token counts via tiktoken with pricing tables for GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, Llama 3.1, Mistral, and more. Compare costs across all models in one call.
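Cost estimation itself is simple arithmetic over per-token prices. A sketch with illustrative numbers (always check your provider's current pricing; the values here are examples only):

```python
# Example prices in USD per 1M tokens, as (input, output) -- illustrative only.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Return the estimated USD cost of a single request."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```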

Multi-Provider LLM Engine

Unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama / LM Studio). Switch providers with a single parameter.

Export to JSON, YAML, and LangChain

Export individual prompts or your entire library. LangChain export generates PromptTemplate-compatible JSON with input_variables, template_format, and metadata.


Installation

# Install from PyPI
pip install ai-prompt-forge

# Or from source
git clone https://github.com/theihtisham/ai-prompt-forge.git
cd ai-prompt-forge
pip install -e .

# With dev dependencies (pytest, coverage)
pip install -e ".[dev]"

Requirements

  • Python 3.10+
  • API keys (set as environment variables):
    • OPENAI_API_KEY for OpenAI
    • ANTHROPIC_API_KEY for Anthropic
    • GOOGLE_API_KEY for Google Gemini
    • Local models need no key (uses http://localhost:11434 by default)

Quick Start

Create and Render a Template

from promptforge.models import PromptTemplate, PromptCategory

template = PromptTemplate(
    name="Code Reviewer",
    description="Reviews code for bugs and style issues",
    category=PromptCategory.CODING,
    template="""Review the following {{language}} code for bugs, style issues,
and performance problems. Be specific and actionable.

Code:
```{{language}}
{{code}}
```

Focus on: {{focus_areas}}""",
)

# Render with variables
rendered = template.render({
    "language": "python",
    "code": "def add(a, b): return a + b",
    "focus_areas": "edge cases, type safety",
})
print(rendered)


Save, Version, and Fork
from promptforge.store import PromptStore

store = PromptStore()  # defaults to ~/.promptforge/data

# Save (creates version 1)
template = store.save_prompt(template)

# Update the template and save again (creates version 2)
template.template = "You are an expert {{language}} developer. Review:\n{{code}}"
template.version = 2
store.save_prompt(template)

# Diff versions
diff = store.diff_versions(template.id, old_ver=1, new_ver=2)
print(diff.summary)       # "+2 lines, -1 lines"
print(diff.additions)     # list of added lines
print(diff.deletions)     # list of removed lines

# Fork from version 1 into a new prompt
forked = store.fork_version(template.id, version=1, new_name="Simple Reviewer")

List and Search Prompts

# List all prompts
all_prompts = store.list_prompts()

# Filter by category
coding_prompts = store.list_prompts(category="coding")

# Filter by tag
tagged = store.list_prompts(tag="production")

# Full-text search
results = store.list_prompts(search="code review")

Generate with LLM Providers

from promptforge.engine import get_provider

# OpenAI
openai = get_provider("openai", model="gpt-4o")
response = openai.generate("Explain recursion in one paragraph.")
print(response.text)
print(f"Tokens: {response.total_tokens}, Latency: {response.latency_ms:.0f}ms")

# Anthropic
anthropic = get_provider("anthropic", model="claude-3-5-sonnet-20241022")
response = anthropic.generate("Explain recursion in one paragraph.", temperature=0.3)

# Google Gemini
google = get_provider("google", model="gemini-2.0-flash")
response = google.generate("Explain recursion in one paragraph.")

# Local (Ollama)
local = get_provider("local", model="llama3.1")
response = local.generate("Explain recursion in one paragraph.")

A/B Testing

from promptforge.testing import ABTester
from promptforge.models import ABTestConfig, ModelProvider

config = ABTestConfig(
    name="Review style comparison",
    prompt_a_id=prompt_a.id,
    prompt_b_id=prompt_b.id,
    test_inputs=[
        "Write a function to reverse a linked list",
        "Implement binary search in Python",
        "Create a REST API endpoint for user login",
    ],
    provider=ModelProvider.OPENAI,
    model="gpt-4o-mini",
)

tester = ABTester(store=store)
results, summary = tester.run_test(config)

print(f"Wins A: {summary.wins_a}, Wins B: {summary.wins_b}, Ties: {summary.ties}")
print(f"Recommendation: {summary.recommendation}")

Batch Testing

from promptforge.testing import BatchTester
from promptforge.models import BatchTestConfig, ModelProvider

config = BatchTestConfig(
    prompt_id=template.id,
    inputs=[
        "Write a sorting function",
        "Debug this null pointer exception",
        "Explain the difference between TCP and UDP",
    ],
    provider=ModelProvider.OPENAI,
    model="gpt-4o-mini",
)

tester = BatchTester(store=store)
result = tester.run_batch(config)

print(f"Success: {result.successful}/{result.total_inputs}")
print(f"Avg latency: {result.avg_latency_ms:.0f}ms")
print(f"Total tokens: {result.total_tokens_used}")
print(f"Estimated cost: ${result.estimated_cost_usd:.4f}")

Chain Pipelines

from promptforge.chain import ChainBuilder
from promptforge.models import ModelProvider

builder = ChainBuilder(store=store)

# Create a 3-step pipeline
pipeline = builder.create_pipeline(
    name="Research Pipeline",
    description="Research -> Summarize -> Format",
    step_prompt_ids=[research_prompt.id, summarize_prompt.id, format_prompt.id],
)

# Run it
result = builder.run_pipeline(
    pipeline_id=pipeline.id,
    initial_variables={"topic": "quantum computing"},
    provider=ModelProvider.OPENAI,
    model="gpt-4o-mini",
)

print(result.final_output)
print(f"Total tokens: {result.total_tokens_used}")
print(f"Steps completed: {len(result.step_results)}")

Quality Scoring

from promptforge.quality import QualityScorer

scorer = QualityScorer()

scores = scorer.score_all(
    output="To implement binary search, first sort the array...",
    prompt="Explain how binary search works",
)

print(f"Coherence:    {scores['coherence']:.2f}")
print(f"Relevance:    {scores['relevance']:.2f}")
print(f"Safety:       {scores['safety']:.2f}")
print(f"Completeness: {scores['completeness']:.2f}")
print(f"Clarity:      {scores['clarity']:.2f}")
print(f"Overall:      {scores['overall']:.2f}")

Prompt Compression

from promptforge.compression import PromptCompressor

compressor = PromptCompressor()

result = compressor.compress(
    """Please note that it is important to understand that in order to
    implement binary search, you should first make sure to sort the array
    prior to beginning the search algorithm. Keep in mind that the time
    complexity is O(log n).""",
    aggression=0.7,
)

print(result["compressed"])
print(f"Reduction: {result['reduction_pct']}%")
print(f"Steps applied: {', '.join(result['steps_applied'])}")

Token Counting and Cost Estimation

from promptforge.token_counter import count_tokens, estimate_cost, analyze_text, compare_models

# Count tokens
tokens = count_tokens("Explain quantum computing in detail.", model="gpt-4o")
print(f"Token count: {tokens}")

# Estimate cost
cost = estimate_cost(input_tokens=500, output_tokens=200, model="gpt-4o")
print(f"Estimated cost: ${cost:.6f}")

# Full analysis
analysis = analyze_text("Your long prompt here...", model="gpt-4o", estimated_output_tokens=500)
print(f"Input tokens: {analysis['input_tokens']}")
print(f"Context usage: {analysis['context_usage_pct']}%")
print(f"Total cost: ${analysis['total_cost_usd']:.6f}")

# Compare across all models
comparisons = compare_models("Your prompt text", estimated_output_tokens=500)
for c in comparisons:
    print(f"{c['model']:25s}  {c['total_tokens']:6d} tokens  ${c['total_cost_usd']:.6f}")

Export

from promptforge.exporter import PromptExporter
from promptforge.models import ExportFormat

exporter = PromptExporter(store=store)

# Export a single prompt
json_str = exporter.export_prompt(template, fmt=ExportFormat.JSON)
yaml_str = exporter.export_prompt(template, fmt=ExportFormat.YAML)
langchain_str = exporter.export_prompt(template, fmt=ExportFormat.LANGCHAIN)

# Export all prompts
all_json = exporter.export_all(fmt=ExportFormat.JSON)

# Export specific prompts by ID
selected = exporter.export_by_ids(["abc123", "def456"], fmt=ExportFormat.YAML)

Architecture

promptforge/
├── models.py          # Pydantic data models (PromptTemplate, ABTestConfig, etc.)
├── store.py           # File-based persistence layer (~/.promptforge/data/)
├── engine.py          # LLM provider integrations (OpenAI, Anthropic, Google, Local)
├── quality.py         # Output quality scorer (coherence, relevance, safety, clarity)
├── testing.py         # A/B testing and batch testing frameworks
├── chain.py           # Chain-of-thought pipeline builder
├── compression.py     # Prompt compression engine
├── token_counter.py   # Token counting and cost estimation (tiktoken)
├── exporter.py        # Export to JSON, YAML, and LangChain format
└── __init__.py        # Package metadata

Data Flow

PromptTemplate  ──render()──>  Rendered Prompt
       │                            │
       │                            ├──> engine.generate()  ──> LLMResponse
       │                            ├──> quality.score_all()──> QualityScore
       │                            ├──> compression.compress()
       │                            └──> token_counter.analyze_text()
       │
       ├──> store.save_prompt()    (creates version snapshot)
       ├──> store.diff_versions()  (compare two versions)
       ├──> store.fork_version()   (branch from a version)
       └──> exporter.export_prompt() (JSON / YAML / LangChain)

Storage Layout

All data is stored under ~/.promptforge/data/ (configurable via PROMPTFORGE_DATA_DIR):

~/.promptforge/data/
├── prompts/              # Prompt templates (one JSON file per prompt)
├── versions/             # Version snapshots ({id}_v{N}.json)
├── tests/                # A/B test configs and batch results
└── chains/               # Chain pipelines and run results
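Given that layout, locating a specific version snapshot is just path arithmetic. A hypothetical helper, shown for orientation only:

```python
import os
from pathlib import Path

def data_dir() -> Path:
    """Honor PROMPTFORGE_DATA_DIR, falling back to the default location."""
    return Path(os.environ.get("PROMPTFORGE_DATA_DIR",
                               Path.home() / ".promptforge" / "data"))

def version_path(prompt_id: str, version: int) -> Path:
    """Path of the snapshot file for one version of one prompt."""
    return data_dir() / "versions" / f"{prompt_id}_v{version}.json"
```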

API Reference

promptforge.models

| Class | Description |
|---|---|
| PromptTemplate | Core template model with render() and extract_variables() |
| VariableDefinition | Variable schema with name, default, required, and example |
| PromptCategory | Enum: coding, writing, analysis, creative, business, education, etc. |
| ModelProvider | Enum: openai, anthropic, google, local |
| ExportFormat | Enum: json, yaml, langchain |
| QualityMetric | Enum: coherence, relevance, safety, completeness, clarity |
| VersionDiff | Diff result between two prompt versions |
| ABTestConfig | A/B test configuration |
| ABTestResult | Single A/B test run result |
| ABTestSummary | Aggregated A/B test summary with recommendation |
| BatchTestConfig | Batch test configuration |
| BatchTestResult | Batch test result with cost and latency stats |
| ChainPipeline | Chain-of-thought pipeline definition |
| ChainStep | Single step in a chain pipeline |
| ChainRunResult | Result of running a chain pipeline |
| TokenCount | Token count result for a specific model |
| QualityScore | Quality scores with weighted overall |

promptforge.store.PromptStore

| Method | Description |
|---|---|
| save_prompt(prompt) | Save a template (creates version snapshot) |
| get_prompt(prompt_id) | Retrieve a template by ID |
| list_prompts(category, tag, search) | List prompts with optional filters |
| delete_prompt(prompt_id) | Delete a template and all versions |
| get_version(prompt_id, version) | Get a specific version |
| list_versions(prompt_id) | List all version numbers |
| diff_versions(prompt_id, old_ver, new_ver) | Diff two versions |
| fork_version(prompt_id, version, new_name) | Fork from a specific version |
| save_chain(chain) | Save a chain pipeline |
| get_chain(chain_id) | Retrieve a chain pipeline |
| list_chains() | List all chain pipelines |

promptforge.engine

| Function / Class | Description |
|---|---|
| get_provider(provider, model, api_key) | Factory function to create a provider |
| OpenAIProvider | OpenAI API integration |
| AnthropicProvider | Anthropic API integration |
| GoogleProvider | Google Gemini API integration |
| LocalProvider | Local model integration (Ollama, LM Studio) |
| LLMResponse | Standardized response with text, tokens, latency |

promptforge.quality.QualityScorer

| Method | Description |
|---|---|
| score_coherence(text) | Score internal consistency and structure (0-1) |
| score_relevance(output, prompt) | Score relevance to the input prompt (0-1) |
| score_safety(text) | Score safety (penalizes harmful content) (0-1) |
| score_completeness(output, prompt) | Score how completely the prompt is addressed (0-1) |
| score_clarity(text) | Score readability and understandability (0-1) |
| score_all(output, prompt) | All five scores plus weighted overall |

promptforge.testing

| Class | Method | Description |
|---|---|---|
| ABTester | run_test(config, variables) | Run A/B test, returns results and summary |
| BatchTester | run_batch(config, variables) | Run batch test, returns result with stats |

promptforge.chain.ChainBuilder

| Method | Description |
|---|---|
| create_pipeline(name, step_prompt_ids) | Create a new pipeline with ordered steps |
| add_step(pipeline_id, prompt_id) | Add a step to an existing pipeline |
| run_pipeline(pipeline_id, initial_variables) | Execute pipeline sequentially |
| list_pipelines() | List all saved pipelines |

promptforge.compression.PromptCompressor

| Method | Description |
|---|---|
| compress(text, aggression) | Compress prompt (aggression 0.0-1.0) |

promptforge.token_counter

| Function | Description |
|---|---|
| count_tokens(text, model) | Count tokens for a given model |
| estimate_cost(input_tokens, output_tokens, model) | Estimate cost in USD |
| analyze_text(text, model, estimated_output_tokens) | Full token and cost analysis |
| compare_models(text, estimated_output_tokens) | Compare cost across all models |
| get_supported_models() | List all supported model names |

promptforge.exporter.PromptExporter

| Method | Description |
|---|---|
| export_prompt(prompt, fmt) | Export a single prompt (json/yaml/langchain) |
| export_all(fmt) | Export all prompts |
| export_by_ids(prompt_ids, fmt) | Export selected prompts by ID |

Supported Models

| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-4-32k, gpt-3.5-turbo |
| Anthropic | claude-3.5-sonnet, claude-3-opus, claude-3-haiku, claude-3.5-haiku |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash |
| Local | llama-3.1-70b, llama-3.1-8b, mistral-large, mixtral-8x7b |

Development

# Clone the repository
git clone https://github.com/theihtisham/ai-prompt-forge.git
cd ai-prompt-forge

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=promptforge --cov-report=term-missing

Project Structure

12-prompt-forge/
├── pyproject.toml           # Build config, dependencies, and metadata
├── requirements.txt         # Pinned dependencies
├── LICENSE                  # MIT License
├── .gitignore
└── src/
    └── promptforge/
        ├── __init__.py      # Package metadata (v1.0.0)
        ├── models.py        # Pydantic data models and enums
        ├── store.py         # File-based storage layer
        ├── engine.py        # LLM provider integrations
        ├── quality.py       # Output quality scorer
        ├── testing.py       # A/B and batch testing
        ├── chain.py         # Chain-of-thought pipelines
        ├── compression.py   # Prompt compression engine
        ├── token_counter.py # Token counting and cost estimation
        └── exporter.py      # Multi-format export

Contributing

Contributions are welcome. To contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (pytest)
  5. Commit with a descriptive message
  6. Open a pull request

Guidelines

  • Follow the existing code style (type hints on all public APIs, Pydantic models for data)
  • Add tests for new functionality
  • Keep the public API backward-compatible
  • Document new parameters and return types

License

This project is licensed under the MIT License.
