
A lightweight Python library for optimizing and cleaning LLM inputs

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

Prompt Refiner


🚀 Lightweight Python library for AI Agents, RAG apps, and chatbots with smart context management and automatic token optimization. Save 10-20% on API costs while fitting RAG docs, chat history, and prompts into your token budget.


🎯 Perfect for:

  • RAG Applications
  • AI Agents
  • Chatbots
  • Document Processing
  • Cost Optimization


Why use Prompt Refiner?

Build AI agents, RAG applications, and chatbots with automatic token optimization and smart context management. Here's a complete example (see examples/quickstart.py for full code):

import json

from prompt_refiner import MessagesPacker, SchemaCompressor, ResponseCompressor, StripHTML
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field

# 1. Pack messages with token budget and track savings
packer = MessagesPacker(
    max_tokens=1000,
    model="gpt-4o-mini",
    track_savings=True,
    system="You are a helpful AI assistant that helps users find books.",
    context=(
        ["<div><h1>Installation Guide</h1>...</div>"],
        [StripHTML()]
    ),
    query="Search for books about Python programming."
)
messages = packer.pack()

# Get token savings
savings = packer.get_token_savings()
print(f"Saved {savings['saved_tokens']} tokens ({savings['savings_rate']:.1%})")

# 2. Compress tool schema (139 → 131 tokens, 5.8% saved)
class SearchBooksInput(BaseModel):
    query: str = Field(description="Search query to find books")

tool_schema = pydantic_function_tool(SearchBooksInput, name="search_books")
compressed_schema = SchemaCompressor().process(tool_schema)

# 3. Execute tool call with OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=[compressed_schema]
)
# search_books is your own function that implements the tool (not part of prompt_refiner)
tool_call = response.choices[0].message.tool_calls[0]
tool_response = search_books(**json.loads(tool_call.function.arguments))

# 4. Compress tool response (19251 → 12813 tokens, 33.4% saved)
compressed_response = ResponseCompressor().process(tool_response)

💡 Run python examples/quickstart.py to see the complete workflow with real OpenAI API verification.
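To give a feel for what tool response compression can involve, here is a minimal sketch that drops empty fields, caps list lengths, and truncates long strings in a JSON-like value. `compress_response`, `max_items`, and `max_chars` are hypothetical names for this illustration; this is not ResponseCompressor's actual algorithm, which this README does not describe.

```python
import json
from typing import Any


def compress_response(value: Any, max_items: int = 3, max_chars: int = 80) -> Any:
    """Recursively shrink a JSON-like value (hypothetical helper, not the library's code)."""
    if isinstance(value, dict):
        # Drop empty values, then recurse into what is left.
        return {
            key: compress_response(item, max_items, max_chars)
            for key, item in value.items()
            if item not in (None, "", [], {})
        }
    if isinstance(value, list):
        # Keep only the first few items of long lists.
        return [compress_response(item, max_items, max_chars) for item in value[:max_items]]
    if isinstance(value, str) and len(value) > max_chars:
        # Truncate long strings and mark the cut.
        return value[:max_chars] + "..."
    return value


raw = {
    "results": [{"title": f"Book {i}", "summary": "x" * 200, "debug": None} for i in range(10)],
    "trace": "",
}
small = compress_response(raw)
print(len(json.dumps(raw)), "->", len(json.dumps(small)), "characters")
```

How much a real tool response shrinks depends entirely on its shape, so treat the savings ranges quoted in this README as the library's own measurements rather than something this sketch reproduces.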

Key benefits:

  • Tool schema compression - Save 10-15% tokens on AI agent function definitions
  • Tool response compression - Save 30-70% tokens on agent tool outputs
  • Compose operations with | - Chain multiple cleaners into a pipeline
  • Save 10-20% tokens - Remove HTML, whitespace, duplicates, and redact PII automatically
  • Stay within budget - MessagesPacker fits content into a token budget (1000 tokens in the example above) using priority-based selection
  • JIT cleaning - Clean content on-the-fly with refine_with parameter
  • Production ready - Output goes directly to OpenAI without extra steps
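Chaining with | is usually built on Python's `__or__` hook. The sketch below shows the general pattern with a stand-in `Operation` class; the class, its `funcs` attribute, and the two regex cleaners are assumptions made for illustration and are not taken from prompt_refiner's source.

```python
import re
from typing import Callable, List


class Operation:
    """A text transform that can be chained with `|` (illustrative, not the library's class)."""

    def __init__(self, func: Callable[[str], str]) -> None:
        self.funcs: List[Callable[[str], str]] = [func]

    def __or__(self, other: "Operation") -> "Operation":
        # Build a new Operation that runs self's steps, then other's.
        combined = Operation(lambda text: text)
        combined.funcs = self.funcs + other.funcs
        return combined

    def process(self, text: str) -> str:
        # Apply every step in order, feeding each output into the next step.
        for func in self.funcs:
            text = func(text)
        return text


strip_tags = Operation(lambda text: re.sub(r"<[^>]+>", " ", text))
squeeze_spaces = Operation(lambda text: re.sub(r"\s+", " ", text).strip())

pipeline = strip_tags | squeeze_spaces
print(pipeline.process("<div><h1>Installation   Guide</h1></div>"))
```

Returning a new object from `__or__` keeps the original operations reusable in other pipelines.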

✨ Key Features

| Module | Description | Components |
| --- | --- | --- |
| Cleaner | Remove noise and save tokens | StripHTML(), NormalizeWhitespace(), FixUnicode(), JsonCleaner() |
| Compressor | Reduce size aggressively | TruncateTokens(), Deduplicate() |
| Scrubber | Protect sensitive data | RedactPII() |
| Tools | Optimize AI agent function calling (tool schemas & responses) | SchemaCompressor(), ResponseCompressor() |
| Packer | Fit content within token budgets | MessagesPacker (chat APIs), TextPacker (completion APIs) |
| Strategy | Benchmark-tested presets for quick setup | MinimalStrategy, StandardStrategy, AggressiveStrategy |
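Priority-based selection under a token budget can be pictured as a greedy loop: consider the most important items first and keep each one that still fits. The sketch below is one simple way to do that; `Item`, `estimate_tokens`, `pack`, the "lower number wins" priority convention, and the 4-characters-per-token estimate are all assumptions of this example, not the Packer module's real behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    text: str
    priority: int  # lower number = more important (an assumption of this sketch)


def estimate_tokens(text: str) -> int:
    # Crude estimate: about 4 characters per token.
    return max(1, len(text) // 4)


def pack(items: List[Item], max_tokens: int) -> List[Item]:
    """Greedy selection: take items by priority while they fit, keep original order."""
    chosen = set()
    remaining = max_tokens
    for index in sorted(range(len(items)), key=lambda i: items[i].priority):
        cost = estimate_tokens(items[index].text)
        if cost <= remaining:
            chosen.add(index)
            remaining -= cost
    return [item for i, item in enumerate(items) if i in chosen]


items = [
    Item("You are a helpful assistant.", priority=0),
    Item("Long retrieved document. " * 40, priority=2),
    Item("Search for books about Python programming.", priority=1),
]
packed = pack(items, max_tokens=50)
print([item.text[:20] for item in packed])
```

With a 50-token budget the long document is dropped while the system prompt and query survive; raising the budget lets all three through.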

Installation

# Basic installation (lightweight, zero dependencies)
pip install llm-prompt-refiner

# With precise token counting (optional, installs tiktoken)
pip install llm-prompt-refiner[token]
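The difference between the two installs comes down to whether tiktoken is available for exact counts. A common pattern for an optional dependency like this is sketched below; `count_tokens` and the 4-characters-per-token fallback are assumptions of this example, and this README does not say how the library itself estimates tokens without tiktoken.

```python
def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Count tokens with tiktoken if available, else estimate from length."""
    try:
        import tiktoken  # optional dependency

        try:
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model name: fall back to a general-purpose encoding.
            encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except Exception:
        # tiktoken missing or unusable (e.g. offline): about 4 characters per token.
        return (len(text) + 3) // 4


print(count_tokens("Search for books about Python programming."))
```

The estimate is only a rough guide, so budgets set close to a model's limit are safer with the exact count.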

Examples

Check out the examples/ folder for detailed examples:

  • strategy/ - Preset strategies (Minimal, Standard, Aggressive) with benchmark results
  • cleaner/ - HTML cleaning, JSON compression, whitespace normalization, Unicode fixing
  • compressor/ - Smart truncation, deduplication
  • scrubber/ - PII redaction (emails, phones, credit cards, etc.)
  • tools/ - Tool/API output cleaning for agent systems
  • packer/ - Context budget management with OpenAI integration
  • analyzer/ - Token counting and cost savings tracking
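As a taste of what the scrubber examples cover, PII redaction is typically pattern-based. The sketch below redacts emails and US-style phone numbers with two regular expressions; the patterns, the `[EMAIL]`/`[PHONE]` placeholders, and the `redact` function are illustrative assumptions and are deliberately simpler than anything a production scrubber such as RedactPII() would need.

```python
import re

# Simplified patterns for illustration; real-world PII matching needs more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace emails and phone numbers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(redact("Contact jane.doe@example.com or 555-123-4567."))
```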

📖 Full documentation: examples/README.md

📊 Proven Effectiveness

We benchmarked Prompt Refiner on 30 real-world test cases (SQuAD + RAG scenarios) to measure token reduction and response quality:

| Strategy | Token Reduction | Quality (Cosine) | Judge Approval | Overall Equivalent |
| --- | --- | --- | --- | --- |
| Minimal | 4.3% | 0.987 | 86.7% | 86.7% |
| Standard | 4.8% | 0.984 | 90.0% | 86.7% |
| Aggressive | 15.0% | 0.964 | 80.0% | 66.7% |

Key Insights:

  • Aggressive strategy saves about 3.5x more tokens than Minimal (15.0% vs 4.3%) while keeping cosine similarity at 0.964
  • Individual RAG tests showed 17-74% token savings with aggressive strategy
  • Deduplicate (Standard) shows minimal gains on typical RAG contexts
  • TruncateTokens (Aggressive) provides the largest cost reduction for long contexts
  • Trade-off: More aggressive = more savings but slightly lower judge approval

Example: RAG with duplicates

  • Minimal (HTML + Whitespace): 17% reduction
  • Standard (+ Deduplicate): 31% reduction
  • Aggressive (+ Truncate 150 tokens): 49% reduction 🎉
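The three levels stack: each one adds a step on top of the previous. The sketch below mimics that layering on a tiny document using plain functions; the function names, the paragraph-level duplicate check, and word-based truncation (standing in for token-based truncation) are assumptions of this example, not the library's strategies.

```python
import re


def strip_and_normalize(text: str) -> str:
    # "Minimal" level: remove tags, collapse whitespace inside each paragraph.
    text = re.sub(r"<[^>]+>", "", text)
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)


def deduplicate(text: str) -> str:
    # "Standard" level adds: drop paragraphs already seen (case-insensitive).
    seen = set()
    kept = []
    for paragraph in text.split("\n\n"):
        key = paragraph.lower()
        if key not in seen:
            seen.add(key)
            kept.append(paragraph)
    return "\n\n".join(kept)


def truncate(text: str, max_words: int) -> str:
    # "Aggressive" level adds: hard cap on length (words here, tokens in practice).
    return " ".join(text.split()[:max_words])


doc = "<p>Install with   pip.</p>\n\n<p>Install with pip.</p>\n\n<p>Then import the package.</p>"
minimal = strip_and_normalize(doc)
standard = deduplicate(minimal)
aggressive = truncate(standard, max_words=5)
for name, result in [("minimal", minimal), ("standard", standard), ("aggressive", aggressive)]:
    print(f"{name}: {len(result)} characters")
```

Truncation is the only step here that can discard unique information, which matches the benchmark's lower approval rate for the aggressive preset.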

[Figure: Token Reduction vs Quality]

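💰 Cost Savings: At 1M input tokens/month, a 15% reduction removes 150k tokens. Assuming a GPT-4 input price of $30 per 1M tokens, that is about $4.50/month (~$54/year); savings scale linearly with volume.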
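The savings figure is simple arithmetic: tokens per month, times the reduction rate, times the price per token. The sketch below makes the inputs explicit so you can plug in your own volume and pricing. The $30 per 1M input tokens used here is an assumed price for illustration, and `monthly_savings` is a hypothetical helper; at that price, 1M tokens/month with a 15% reduction works out to $4.50 per month, or $54 over twelve months.

```python
def monthly_savings(tokens_per_month: int, reduction: float, usd_per_million: float) -> float:
    """Dollars saved per month from sending fewer input tokens."""
    return tokens_per_month * reduction * usd_per_million / 1_000_000


# Assumed inputs: 1M tokens/month, 15% reduction, $30 per 1M input tokens.
per_month = monthly_savings(1_000_000, 0.15, 30.0)
print(f"${per_month:.2f} per month, ${per_month * 12:.2f} per year")
```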

📖 See full benchmark: benchmark/custom/README.md

⚡ Performance & Latency

"What's the latency overhead?" - Negligible. Prompt Refiner adds < 0.5ms per 1k tokens of overhead.

| Strategy | @ 1k tokens | @ 10k tokens | @ 50k tokens | Overhead per 1k tokens |
| --- | --- | --- | --- | --- |
| Minimal (HTML + Whitespace) | 0.05ms | 0.48ms | 2.39ms | 0.05ms |
| Standard (+ Deduplicate) | 0.26ms | 2.47ms | 12.27ms | 0.25ms |
| Aggressive (+ Truncate) | 0.26ms | 2.46ms | 12.38ms | 0.25ms |

Key Insights:

  • Minimal strategy: Only 0.05ms per 1k tokens (faster than a network packet)
  • 🎯 Standard strategy: 0.25ms per 1k tokens - adds ~2.5ms to a 10k token prompt
  • 📊 Context: Network + LLM TTFT is typically 600ms+, refining adds < 0.5% overhead
  • 🚀 Individual operations (HTML, whitespace) are < 0.5ms per 1k tokens

Real-world impact:

10k token RAG context refining: ~2.5ms overhead
Network latency: ~100ms
LLM Processing (TTFT): ~500ms+
Total overhead: < 0.5% of request time
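The percentage follows directly from the three numbers above. A quick check, using those figures as given (the variable names are just for this example):

```python
refine_ms = 2.5     # refining a 10k-token RAG context
network_ms = 100.0  # network latency
ttft_ms = 500.0     # LLM time to first token

# Share of the total request time spent on refining.
share = refine_ms / (refine_ms + network_ms + ttft_ms)
print(f"Refining is {share:.2%} of the request")
```

With these inputs the share comes out to roughly 0.41%, consistent with the "< 0.5%" claim; slower networks or models push it lower still.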

🔬 Run yourself: python benchmark/latency/benchmark.py (no API keys needed)

🎮 Interactive Demo

Try prompt-refiner in your browser - no installation required!

Play with different strategies, see real-time token savings, and find the perfect configuration for your use case. Features:

  • 🎯 6 preset examples (e-commerce, support tickets, docs, RAG, etc.)
  • ⚡ Quick strategy presets (Minimal, Standard, Aggressive)
  • 💰 Real-time cost savings calculator
  • 🔧 All 7 operations configurable
  • 📊 Visual metrics dashboard


License

MIT

Download files

  • Source Distribution: llm_prompt_refiner-0.1.8.tar.gz (146.1 kB)
  • Built Distribution: llm_prompt_refiner-0.1.8-py3-none-any.whl (41.2 kB, Python 3)
