# PromptForge
The Swiss Army knife for prompt engineering -- build, test, version, and optimize prompts 10x faster.
## Why PromptForge?
Writing a good prompt is half the battle. Managing hundreds of prompts across projects, tracking which version performs best, estimating token costs before you hit "send" -- that is the other half, and it is where most teams waste time.
PromptForge gives you a single toolkit to:
- Build parameterized prompt templates with `{{variable}}` injection
- Test prompts with built-in A/B testing and batch evaluation
- Version every change with full diff and fork support
- Score output quality on coherence, relevance, safety, completeness, and clarity
- Compress prompts to use fewer tokens without losing meaning
- Chain prompts into multi-step pipelines with automatic output passing
- Count tokens and estimate costs across 16+ LLM models
- Export to JSON, YAML, or LangChain format
All backed by a file-based store (no database required) with support for OpenAI, Anthropic, Google Gemini, and local models (Ollama, LM Studio).
## Features

### Prompt Templates with Variable Injection

Define templates with `{{variable}}` placeholders. PromptForge auto-detects variables, validates required ones, and fills defaults.

### Full Version History

Every save creates a version snapshot. Diff any two versions, fork from a specific version, and track the evolution of your prompts over time.

### A/B Testing Framework

Compare two prompts head-to-head on the same inputs. PromptForge runs both, scores outputs on five quality dimensions, and declares a winner.

### Batch Testing

Run a suite of inputs through a single prompt and get back quality scores, latencies, token usage, and estimated cost per run.

### Chain-of-Thought Pipelines

Link prompts into multi-step pipelines where each step's output flows into the next as a `{{step_N_output}}` variable.

### Quality Scoring

Deterministic, heuristic-based scoring across five dimensions -- coherence, relevance, safety, completeness, and clarity -- with a weighted overall score. No LLM call required.

### Prompt Compression

Reduce token count with configurable aggression (0.0 to 1.0) by removing filler phrases, collapsing whitespace, shortening verbose instructions, and compressing lists.

### Token Counting and Cost Estimation

Accurate token counts via tiktoken, with pricing tables for GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, Llama 3.1, Mistral, and more. Compare costs across all models in one call.

### Multi-Provider LLM Engine

Unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama / LM Studio). Switch providers with a single parameter.

### Export to JSON, YAML, and LangChain

Export individual prompts or your entire library. LangChain export generates PromptTemplate-compatible JSON with input_variables, template_format, and metadata.
## Installation

```bash
# Install from PyPI
pip install ai-prompt-forge

# Or from source
git clone https://github.com/theihtisham/ai-prompt-forge.git
cd ai-prompt-forge
pip install -e .

# With dev dependencies (pytest, coverage)
pip install -e ".[dev]"
```
### Requirements

- Python 3.10+
- API keys (set as environment variables):
  - `OPENAI_API_KEY` for OpenAI
  - `ANTHROPIC_API_KEY` for Anthropic
  - `GOOGLE_API_KEY` for Google Gemini
- Local models need no key (uses `http://localhost:11434` by default)
## Quick Start

### Create and Render a Template

````python
from promptforge.models import PromptTemplate, PromptCategory

template = PromptTemplate(
    name="Code Reviewer",
    description="Reviews code for bugs and style issues",
    category=PromptCategory.CODING,
    template="""Review the following {{language}} code for bugs, style issues,
and performance problems. Be specific and actionable.

Code:
```{{language}}
{{code}}
```

Focus on: {{focus_areas}}""",
)

# Render with variables
rendered = template.render({
    "language": "python",
    "code": "def add(a, b): return a + b",
    "focus_areas": "edge cases, type safety",
})
print(rendered)
````
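Variable auto-detection like this can be done with a regular expression over the template text. A minimal sketch of the idea (not PromptForge's actual implementation; `find_variables` is a hypothetical helper):

```python
import re

def find_variables(template: str) -> list[str]:
    """Return unique {{variable}} names in order of first appearance."""
    seen: dict[str, None] = {}
    for name in re.findall(r"\{\{(\w+)\}\}", template):
        seen.setdefault(name)
    return list(seen)

print(find_variables("Review this {{language}} code:\n{{code}}\nFocus: {{focus_areas}}"))
# → ['language', 'code', 'focus_areas']
```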
### Save, Version, and Fork
```python
from promptforge.store import PromptStore
store = PromptStore() # defaults to ~/.promptforge/data
# Save (creates version 1)
template = store.save_prompt(template)
# Update the template and save again (creates version 2)
template.template = "You are an expert {{language}} developer. Review:\n{{code}}"
template.version = 2
store.save_prompt(template)
# Diff versions
diff = store.diff_versions(template.id, old_ver=1, new_ver=2)
print(diff.summary) # "+2 lines, -1 lines"
print(diff.additions) # list of added lines
print(diff.deletions) # list of removed lines
# Fork from version 1 into a new prompt
forked = store.fork_version(template.id, version=1, new_name="Simple Reviewer")
```
### List and Search Prompts

```python
# List all prompts
all_prompts = store.list_prompts()

# Filter by category
coding_prompts = store.list_prompts(category="coding")

# Filter by tag
tagged = store.list_prompts(tag="production")

# Full-text search
results = store.list_prompts(search="code review")
```
### Generate with LLM Providers

```python
from promptforge.engine import get_provider

# OpenAI
openai = get_provider("openai", model="gpt-4o")
response = openai.generate("Explain recursion in one paragraph.")
print(response.text)
print(f"Tokens: {response.total_tokens}, Latency: {response.latency_ms:.0f}ms")

# Anthropic
anthropic = get_provider("anthropic", model="claude-3-5-sonnet-20241022")
response = anthropic.generate("Explain recursion in one paragraph.", temperature=0.3)

# Google Gemini
google = get_provider("google", model="gemini-2.0-flash")
response = google.generate("Explain recursion in one paragraph.")

# Local (Ollama)
local = get_provider("local", model="llama3.1")
response = local.generate("Explain recursion in one paragraph.")
```
### A/B Testing

```python
from promptforge.testing import ABTester
from promptforge.models import ABTestConfig, ModelProvider

config = ABTestConfig(
    name="Review style comparison",
    prompt_a_id=prompt_a.id,
    prompt_b_id=prompt_b.id,
    test_inputs=[
        "Write a function to reverse a linked list",
        "Implement binary search in Python",
        "Create a REST API endpoint for user login",
    ],
    provider=ModelProvider.OPENAI,
    model="gpt-4o-mini",
)

tester = ABTester(store=store)
results, summary = tester.run_test(config)
print(f"Wins A: {summary.wins_a}, Wins B: {summary.wins_b}, Ties: {summary.ties}")
print(f"Recommendation: {summary.recommendation}")
```
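The win/tie bookkeeping behind a summary like this can be sketched as a per-input comparison of overall quality scores (illustrative only; `tally` and the tie margin are assumptions, not the real `ABTester` logic):

```python
def tally(scores_a: list[float], scores_b: list[float], tie_margin: float = 0.05) -> dict:
    """Count per-input wins for two prompts given their overall quality scores."""
    wins_a = wins_b = ties = 0
    for a, b in zip(scores_a, scores_b):
        if abs(a - b) <= tie_margin:  # scores too close to call
            ties += 1
        elif a > b:
            wins_a += 1
        else:
            wins_b += 1
    return {"wins_a": wins_a, "wins_b": wins_b, "ties": ties}

print(tally([0.9, 0.7, 0.9], [0.7, 0.85, 0.6]))
# → {'wins_a': 2, 'wins_b': 1, 'ties': 0}
```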
### Batch Testing

```python
from promptforge.testing import BatchTester
from promptforge.models import BatchTestConfig, ModelProvider

config = BatchTestConfig(
    prompt_id=template.id,
    inputs=[
        "Write a sorting function",
        "Debug this null pointer exception",
        "Explain the difference between TCP and UDP",
    ],
    provider=ModelProvider.OPENAI,
    model="gpt-4o-mini",
)

tester = BatchTester(store=store)
result = tester.run_batch(config)
print(f"Success: {result.successful}/{result.total_inputs}")
print(f"Avg latency: {result.avg_latency_ms:.0f}ms")
print(f"Total tokens: {result.total_tokens_used}")
print(f"Estimated cost: ${result.estimated_cost_usd:.4f}")
```
### Chain Pipelines

```python
from promptforge.chain import ChainBuilder
from promptforge.models import ModelProvider

builder = ChainBuilder(store=store)

# Create a 3-step pipeline
pipeline = builder.create_pipeline(
    name="Research Pipeline",
    description="Research -> Summarize -> Format",
    step_prompt_ids=[research_prompt.id, summarize_prompt.id, format_prompt.id],
)

# Run it
result = builder.run_pipeline(
    pipeline_id=pipeline.id,
    initial_variables={"topic": "quantum computing"},
    provider=ModelProvider.OPENAI,
    model="gpt-4o-mini",
)
print(result.final_output)
print(f"Total tokens: {result.total_tokens_used}")
print(f"Steps completed: {len(result.step_results)}")
```
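To make the automatic output passing concrete: a downstream step's template references `{{step_1_output}}` like any other variable. A minimal substitution sketch (illustrative; the template text and the regex pass are assumptions, not `ChainBuilder` internals):

```python
import re

# Hypothetical step-2 template: it consumes the previous step's output.
summarize_template = "Summarize the following research notes:\n{{step_1_output}}"

# Variables available when step 2 runs: the initial variables plus step 1's output.
render_vars = {
    "topic": "quantum computing",
    "step_1_output": "Qubits exploit superposition and entanglement...",
}

# Substitute every {{name}} with its value.
rendered = re.sub(r"\{\{(\w+)\}\}", lambda m: render_vars[m.group(1)], summarize_template)
print(rendered)
```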
### Quality Scoring

```python
from promptforge.quality import QualityScorer

scorer = QualityScorer()
scores = scorer.score_all(
    output="To implement binary search, first sort the array...",
    prompt="Explain how binary search works",
)
print(f"Coherence: {scores['coherence']:.2f}")
print(f"Relevance: {scores['relevance']:.2f}")
print(f"Safety: {scores['safety']:.2f}")
print(f"Completeness: {scores['completeness']:.2f}")
print(f"Clarity: {scores['clarity']:.2f}")
print(f"Overall: {scores['overall']:.2f}")
```
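The weighted overall is, conceptually, a weighted average of the five dimension scores. A sketch with hypothetical weights (PromptForge's actual weighting may differ):

```python
# Hypothetical weights -- the library's real values may differ.
WEIGHTS = {"coherence": 0.25, "relevance": 0.25, "safety": 0.20,
           "completeness": 0.15, "clarity": 0.15}

def weighted_overall(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-1) into one overall score."""
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

print(weighted_overall({"coherence": 0.8, "relevance": 0.9, "safety": 1.0,
                        "completeness": 0.7, "clarity": 0.6}))
```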
### Prompt Compression

```python
from promptforge.compression import PromptCompressor

compressor = PromptCompressor()
result = compressor.compress(
    """Please note that it is important to understand that in order to
implement binary search, you should first make sure to sort the array
prior to beginning the search algorithm. Keep in mind that the time
complexity is O(log n).""",
    aggression=0.7,
)
print(result["compressed"])
print(f"Reduction: {result['reduction_pct']}%")
print(f"Steps applied: {', '.join(result['steps_applied'])}")
```
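A filler-phrase pass like the one the compressor applies can be sketched with regular expressions. The phrase list below is illustrative, not the library's actual rule set:

```python
import re

# Hypothetical filler phrases -- a real compressor would have a larger, tuned list.
FILLERS = [r"please note that\s*", r"it is important to understand that\s*",
           r"keep in mind that\s*", r"make sure to\s*", r"in order to\b"]

def strip_fillers(text: str) -> str:
    """Remove filler phrases, then collapse whitespace."""
    for pattern in FILLERS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(strip_fillers("Please note that in order to sort, make sure to compare pairs."))
# → "sort, compare pairs."
```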
### Token Counting and Cost Estimation

```python
from promptforge.token_counter import count_tokens, estimate_cost, analyze_text, compare_models

# Count tokens
tokens = count_tokens("Explain quantum computing in detail.", model="gpt-4o")
print(f"Token count: {tokens}")

# Estimate cost
cost = estimate_cost(input_tokens=500, output_tokens=200, model="gpt-4o")
print(f"Estimated cost: ${cost:.6f}")

# Full analysis
analysis = analyze_text("Your long prompt here...", model="gpt-4o", estimated_output_tokens=500)
print(f"Input tokens: {analysis['input_tokens']}")
print(f"Context usage: {analysis['context_usage_pct']}%")
print(f"Total cost: ${analysis['total_cost_usd']:.6f}")

# Compare across all models
comparisons = compare_models("Your prompt text", estimated_output_tokens=500)
for c in comparisons:
    print(f"{c['model']:25s} {c['total_tokens']:6d} tokens  ${c['total_cost_usd']:.6f}")
```
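The cost estimate itself is plain arithmetic over a per-million-token pricing table. A sketch of that arithmetic with hypothetical prices (real prices change over time; rely on the library's tables, not these numbers):

```python
# Hypothetical (input, output) prices in USD per million tokens -- illustration only.
PRICING = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

def estimate_cost_usd(input_tokens: int, output_tokens: int, model: str) -> float:
    """Estimate a request's cost from token counts and per-million-token prices."""
    in_price, out_price = PRICING[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price

print(f"${estimate_cost_usd(500, 200, 'gpt-4o'):.6f}")
```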
### Export

```python
from promptforge.exporter import PromptExporter
from promptforge.models import ExportFormat

exporter = PromptExporter(store=store)

# Export a single prompt
json_str = exporter.export_prompt(template, fmt=ExportFormat.JSON)
yaml_str = exporter.export_prompt(template, fmt=ExportFormat.YAML)
langchain_str = exporter.export_prompt(template, fmt=ExportFormat.LANGCHAIN)

# Export all prompts
all_json = exporter.export_all(fmt=ExportFormat.JSON)

# Export specific prompts by ID
selected = exporter.export_by_ids(["abc123", "def456"], fmt=ExportFormat.YAML)
```
## Architecture

```
promptforge/
├── models.py         # Pydantic data models (PromptTemplate, ABTestConfig, etc.)
├── store.py          # File-based persistence layer (~/.promptforge/data/)
├── engine.py         # LLM provider integrations (OpenAI, Anthropic, Google, Local)
├── quality.py        # Output quality scorer (coherence, relevance, safety, clarity)
├── testing.py        # A/B testing and batch testing frameworks
├── chain.py          # Chain-of-thought pipeline builder
├── compression.py    # Prompt compression engine
├── token_counter.py  # Token counting and cost estimation (tiktoken)
├── exporter.py       # Export to JSON, YAML, and LangChain format
└── __init__.py       # Package metadata
```
### Data Flow

```
PromptTemplate ──render()──> Rendered Prompt
      │                            │
      │                            ├──> engine.generate()        ──> LLMResponse
      │                            ├──> quality.score_all()      ──> QualityScore
      │                            ├──> compression.compress()
      │                            └──> token_counter.analyze_text()
      │
      ├──> store.save_prompt()      (creates version snapshot)
      ├──> store.diff_versions()    (compare two versions)
      ├──> store.fork_version()     (branch from a version)
      └──> exporter.export_prompt() (JSON / YAML / LangChain)
```
### Storage Layout

All data is stored under `~/.promptforge/data/` (configurable via `PROMPTFORGE_DATA_DIR`):

```
~/.promptforge/data/
├── prompts/     # Prompt templates (one JSON file per prompt)
├── versions/    # Version snapshots ({id}_v{N}.json)
├── tests/       # A/B test configs and batch results
└── chains/      # Chain pipelines and run results
```
## API Reference

### promptforge.models

| Class | Description |
|---|---|
| `PromptTemplate` | Core template model with `render()` and `extract_variables()` |
| `VariableDefinition` | Variable schema with name, default, required, and example |
| `PromptCategory` | Enum: coding, writing, analysis, creative, business, education, etc. |
| `ModelProvider` | Enum: openai, anthropic, google, local |
| `ExportFormat` | Enum: json, yaml, langchain |
| `QualityMetric` | Enum: coherence, relevance, safety, completeness, clarity |
| `VersionDiff` | Diff result between two prompt versions |
| `ABTestConfig` | A/B test configuration |
| `ABTestResult` | Single A/B test run result |
| `ABTestSummary` | Aggregated A/B test summary with recommendation |
| `BatchTestConfig` | Batch test configuration |
| `BatchTestResult` | Batch test result with cost and latency stats |
| `ChainPipeline` | Chain-of-thought pipeline definition |
| `ChainStep` | Single step in a chain pipeline |
| `ChainRunResult` | Result of running a chain pipeline |
| `TokenCount` | Token count result for a specific model |
| `QualityScore` | Quality scores with weighted overall |
### promptforge.store.PromptStore

| Method | Description |
|---|---|
| `save_prompt(prompt)` | Save a template (creates version snapshot) |
| `get_prompt(prompt_id)` | Retrieve a template by ID |
| `list_prompts(category, tag, search)` | List prompts with optional filters |
| `delete_prompt(prompt_id)` | Delete a template and all versions |
| `get_version(prompt_id, version)` | Get a specific version |
| `list_versions(prompt_id)` | List all version numbers |
| `diff_versions(prompt_id, old_ver, new_ver)` | Diff two versions |
| `fork_version(prompt_id, version, new_name)` | Fork from a specific version |
| `save_chain(chain)` | Save a chain pipeline |
| `get_chain(chain_id)` | Retrieve a chain pipeline |
| `list_chains()` | List all chain pipelines |
### promptforge.engine

| Function / Class | Description |
|---|---|
| `get_provider(provider, model, api_key)` | Factory function to create a provider |
| `OpenAIProvider` | OpenAI API integration |
| `AnthropicProvider` | Anthropic API integration |
| `GoogleProvider` | Google Gemini API integration |
| `LocalProvider` | Local model integration (Ollama, LM Studio) |
| `LLMResponse` | Standardized response with text, tokens, latency |
### promptforge.quality.QualityScorer

| Method | Description |
|---|---|
| `score_coherence(text)` | Score internal consistency and structure (0-1) |
| `score_relevance(output, prompt)` | Score relevance to the input prompt (0-1) |
| `score_safety(text)` | Score safety (penalizes harmful content) (0-1) |
| `score_completeness(output, prompt)` | Score how completely the prompt is addressed (0-1) |
| `score_clarity(text)` | Score readability and understandability (0-1) |
| `score_all(output, prompt)` | All five scores plus weighted overall |
### promptforge.testing

| Class | Method | Description |
|---|---|---|
| `ABTester` | `run_test(config, variables)` | Run A/B test, returns results and summary |
| `BatchTester` | `run_batch(config, variables)` | Run batch test, returns result with stats |
### promptforge.chain.ChainBuilder

| Method | Description |
|---|---|
| `create_pipeline(name, step_prompt_ids)` | Create a new pipeline with ordered steps |
| `add_step(pipeline_id, prompt_id)` | Add a step to an existing pipeline |
| `run_pipeline(pipeline_id, initial_variables)` | Execute pipeline steps sequentially |
| `list_pipelines()` | List all saved pipelines |
### promptforge.compression.PromptCompressor

| Method | Description |
|---|---|
| `compress(text, aggression)` | Compress a prompt (aggression 0.0-1.0) |
### promptforge.token_counter

| Function | Description |
|---|---|
| `count_tokens(text, model)` | Count tokens for a given model |
| `estimate_cost(input_tokens, output_tokens, model)` | Estimate cost in USD |
| `analyze_text(text, model, estimated_output_tokens)` | Full token and cost analysis |
| `compare_models(text, estimated_output_tokens)` | Compare cost across all models |
| `get_supported_models()` | List all supported model names |
### promptforge.exporter.PromptExporter

| Method | Description |
|---|---|
| `export_prompt(prompt, fmt)` | Export a single prompt (json/yaml/langchain) |
| `export_all(fmt)` | Export all prompts |
| `export_by_ids(prompt_ids, fmt)` | Export selected prompts by ID |
## Supported Models

| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-4-32k, gpt-3.5-turbo |
| Anthropic | claude-3.5-sonnet, claude-3-opus, claude-3-haiku, claude-3.5-haiku |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash |
| Local | llama-3.1-70b, llama-3.1-8b, mistral-large, mixtral-8x7b |
## Development

```bash
# Clone the repository
git clone https://github.com/theihtisham/ai-prompt-forge.git
cd ai-prompt-forge

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=promptforge --cov-report=term-missing
```
### Project Structure

```
12-prompt-forge/
├── pyproject.toml        # Build config, dependencies, and metadata
├── requirements.txt      # Pinned dependencies
├── LICENSE               # MIT License
├── .gitignore
└── src/
    └── promptforge/
        ├── __init__.py       # Package metadata (v1.0.0)
        ├── models.py         # Pydantic data models and enums
        ├── store.py          # File-based storage layer
        ├── engine.py         # LLM provider integrations
        ├── quality.py        # Output quality scorer
        ├── testing.py        # A/B and batch testing
        ├── chain.py          # Chain-of-thought pipelines
        ├── compression.py    # Prompt compression engine
        ├── token_counter.py  # Token counting and cost estimation
        └── exporter.py       # Multi-format export
```
## Contributing

Contributions are welcome. To contribute:

- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Write tests for your changes
- Ensure all tests pass (`pytest`)
- Commit with a descriptive message
- Open a pull request
### Guidelines
- Follow the existing code style (type hints on all public APIs, Pydantic models for data)
- Add tests for new functionality
- Keep the public API backward-compatible
- Document new parameters and return types
## License
This project is licensed under the MIT License.