A lightweight Python library for optimizing and cleaning LLM inputs
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
Prompt Refiner
🚀 Lightweight Python library for building production LLM applications with smart context management and automatic token optimization. Save 10-20% on API costs while fitting RAG docs, chat history, and prompts into your token budget.
🎯 Perfect for:
RAG Applications • Chatbots • Document Processing • Production LLM Apps • Cost Optimization
Why use Prompt Refiner?
Build production RAG applications with automatic token optimization and smart context management. Here's a complete example showing a chatbot that saves tokens and fits within budget:
```python
from prompt_refiner import (
    MessagesPacker,
    SchemaCompressor,
    ROLE_SYSTEM,
    ROLE_QUERY,
    ROLE_CONTEXT,
    ROLE_USER,
    ROLE_ASSISTANT,
    StripHTML,
    NormalizeWhitespace,
    Deduplicate,
    RedactPII,
)
from openai import OpenAI

# Set up MessagesPacker with token budget
packer = MessagesPacker(max_tokens=1000)
packer.add("You are a helpful AI assistant.", role=ROLE_SYSTEM)

# Add user query with composed cleaning pipeline (pipe operator |)
packer.add(
    "How do I install your library? How do I install your library? My email is john@example.com",
    role=ROLE_QUERY,
    refine_with=StripHTML() | NormalizeWhitespace() | Deduplicate(0.8) | RedactPII({"email"}),
)

# Add RAG documents with JIT cleaning
packer.add(
    "<div><h2>Docs</h2><p>Our AI helps developers...</p></div>",
    role=ROLE_CONTEXT,
    refine_with=[StripHTML(), NormalizeWhitespace()],
)

# Add conversation history (dropped first if over budget)
packer.add("What can you do?", role=ROLE_USER)
packer.add("I help with documentation.", role=ROLE_ASSISTANT)

# Pack and send to OpenAI
messages = packer.pack()  # Saved ~45 tokens (18%)!

# Compress tool schemas (save 40-50% tokens on function definitions)
tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "title": "Documentation Search",
        "description": "Search our documentation with examples...",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query with `keywords`"}  # Markdown removed
            },
        },
    },
}
compressor = SchemaCompressor()
compressed_tool = compressor.process(tool)  # Saves around 10-15% tokens

response = OpenAI().chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=[compressed_tool],
)
print(response.choices[0].message.content)
```
This example demonstrates:
- Compress tool schemas - Save 40-50% tokens on function calling definitions
- Compose operations with `|` - Chain multiple cleaners into a pipeline
- Save 10-20% tokens - Remove HTML, whitespace, duplicates, and redact PII automatically
- Stay within budget - MessagesPacker fits everything into 1000 tokens using priority-based selection
- JIT cleaning - Clean content on-the-fly with the `refine_with` parameter
- Production ready - Output goes directly to OpenAI without extra steps
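To make the `|` composition idea concrete, here is a minimal sketch of how chaining could be implemented with Python's `__or__` hook. The classes `Operation`, `Pipeline`, `StripTags`, and `CollapseWhitespace` are hypothetical stand-ins written for this illustration; they are not Prompt Refiner's actual classes, and the regex-based tag stripping is deliberately naive.

```python
import html
import re


class Operation:
    """Hypothetical base class: a text-to-text step that can be chained with |."""

    def process(self, text: str) -> str:
        raise NotImplementedError

    def __or__(self, other: "Operation") -> "Pipeline":
        # `a | b` builds a pipeline that runs a first, then b
        return Pipeline([self, other])


class Pipeline(Operation):
    """Runs its steps in order, feeding each output into the next step."""

    def __init__(self, steps):
        self.steps = list(steps)

    def process(self, text: str) -> str:
        for step in self.steps:
            text = step.process(text)
        return text

    def __or__(self, other: Operation) -> "Pipeline":
        # Keep the pipeline flat when chaining more than two steps
        return Pipeline(self.steps + [other])


class StripTags(Operation):
    """Naive tag removal: replaces anything that looks like <...> with a space."""

    def process(self, text: str) -> str:
        return html.unescape(re.sub(r"<[^>]+>", " ", text))


class CollapseWhitespace(Operation):
    """Collapses runs of whitespace into single spaces and trims the ends."""

    def process(self, text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()


pipeline = StripTags() | CollapseWhitespace()
cleaned = pipeline.process("<div><h2>Docs</h2><p>Our AI helps developers...</p></div>")
print(cleaned)  # Docs Our AI helps developers...
```

Because each step only needs a `process` method, a custom operation can join a chain like this without the pipeline knowing anything about it.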
✨ Key Features
| Module | Description | Components |
|---|---|---|
| Cleaner | Remove noise and save tokens | StripHTML(), NormalizeWhitespace(), FixUnicode(), JsonCleaner() |
| Compressor | Reduce size aggressively | TruncateTokens(), Deduplicate() |
| Scrubber | Protect sensitive data | RedactPII() |
| Tools | Optimize LLM tool/API outputs and schemas | ToolOutputCleaner(), SchemaCompressor() |
| Packer | Fit content within token budgets | MessagesPacker (chat APIs), TextPacker (completion APIs) |
| Strategy | Benchmark-tested presets for quick setup | MinimalStrategy, StandardStrategy, AggressiveStrategy |
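As an illustration of what "fit content within token budgets" can mean in practice, the sketch below packs messages greedily by priority and then restores their original order. It is an assumption-laden toy, not the Packer module's algorithm: the `Item` type, the priority numbers, and the 4-characters-per-token estimate are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    role: str
    content: str
    priority: int  # lower number = more important (assumed convention)


def estimate_tokens(text: str) -> int:
    # Rough stand-in for a tokenizer: about 4 characters per token
    return max(1, len(text) // 4)


def pack(items, max_tokens):
    """Keep the most important items that fit, preserving original order."""
    # Visit items from most to least important; sorted() is stable, so
    # equal-priority items are considered in their original order.
    by_priority = sorted(range(len(items)), key=lambda i: items[i].priority)
    kept, used = set(), 0
    for i in by_priority:
        cost = estimate_tokens(items[i].content)
        if used + cost <= max_tokens:
            kept.add(i)
            used += cost
    return [items[i] for i in range(len(items)) if i in kept]


items = [
    Item("system", "You are a helpful AI assistant.", priority=0),
    Item("user", "What can you do?", priority=3),             # history
    Item("assistant", "I help with documentation.", priority=3),  # history
    Item("user", "How do I install your library?", priority=1),   # current query
]

packed = pack(items, max_tokens=15)
print([m.role for m in packed])  # ['system', 'user']
```

With a budget of 15 estimated tokens, the system prompt and the current query fit (7 + 7), and both history turns are dropped, which mirrors the "history is dropped first" behavior described in the example above.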
Installation
```bash
# Basic installation (lightweight, zero dependencies)
pip install llm-prompt-refiner

# With precise token counting (optional, installs tiktoken)
# Quoted so shells such as zsh do not expand the square brackets
pip install "llm-prompt-refiner[token]"
```
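The difference between the two install options can be pictured as an optional-dependency pattern: use `tiktoken` when it is available, otherwise fall back to an estimate. The sketch below is one way to write that pattern; the function names and the 4-characters-per-token heuristic are assumptions for illustration, and the library's own fallback behavior may differ.

```python
def estimate_tokens(text: str) -> int:
    # Heuristic fallback (assumption): roughly 4 characters per token
    return (len(text) + 3) // 4


def count_tokens(text: str) -> int:
    """Count tokens with tiktoken if possible, otherwise estimate."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.get_encoding("cl100k_base")
    except Exception:
        # tiktoken is not installed, or its encoding data is unavailable
        return estimate_tokens(text)
    return len(encoding.encode(text))


print(count_tokens("How do I install your library?"))
```

The printed number depends on whether `tiktoken` is installed, so treat the estimate as a budgeting aid rather than a billing figure.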
Examples
Check out the examples/ folder for detailed examples:
- strategy/ - Preset strategies (Minimal, Standard, Aggressive) with benchmark results
- cleaner/ - HTML cleaning, JSON compression, whitespace normalization, Unicode fixing
- compressor/ - Smart truncation, deduplication
- scrubber/ - PII redaction (emails, phones, credit cards, etc.)
- tools/ - Tool/API output cleaning for agent systems
- packer/ - Context budget management with OpenAI integration
- analyzer/ - Token counting and cost savings tracking
- custom_operation.py - Build your own custom operations
📖 Full documentation: examples/README.md
📊 Proven Effectiveness
We benchmarked Prompt Refiner on 30 real-world test cases (SQuAD + RAG scenarios) to measure token reduction and response quality:
| Strategy | Token Reduction | Quality (Cosine) | Judge Approval | Overall Equivalent |
|---|---|---|---|---|
| Minimal | 4.3% | 0.987 | 86.7% | 86.7% |
| Standard | 4.8% | 0.984 | 90.0% | 86.7% |
| Aggressive | 15.0% | 0.964 | 80.0% | 66.7% |
Key Insights:
- Aggressive strategy achieves about 3.5x the savings of Minimal (15.0% vs 4.3%) while keeping cosine similarity at 0.964
- Individual RAG tests showed 17-74% token savings with aggressive strategy
- Deduplicate (Standard) shows minimal gains on typical RAG contexts
- TruncateTokens (Aggressive) provides the largest cost reduction for long contexts
- Trade-off: Aggressive saves the most tokens but lowers quality: judge approval falls from 86.7-90.0% to 80.0%, and overall equivalence from 86.7% to 66.7%
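Deduplication only helps when the context actually repeats itself, which is one plausible reading of the small gap between Minimal and Standard. The sketch below shows similarity-threshold deduplication at the sentence level using the standard library's `difflib`. It is an illustrative stand-in, not the implementation behind `Deduplicate`; the sentence splitting and the meaning of the `0.8` threshold here are assumptions.

```python
import re
from difflib import SequenceMatcher


def dedupe_sentences(text: str, threshold: float = 0.8) -> str:
    """Drop sentences that are near-duplicates of an earlier kept sentence."""
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    kept = []
    for sentence in sentences:
        is_duplicate = any(
            SequenceMatcher(None, sentence.lower(), seen.lower()).ratio() >= threshold
            for seen in kept
        )
        if not is_duplicate:
            kept.append(sentence)
    return " ".join(kept)


text = (
    "How do I install your library? How do I install your library? "
    "My email is john@example.com"
)
result = dedupe_sentences(text)
print(result)  # How do I install your library? My email is john@example.com
```

On a context with no repeated sentences this function returns the text essentially unchanged, so it saves nothing while still paying the pairwise comparison cost.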
Example: RAG with duplicates
- Minimal (HTML + Whitespace): 17% reduction
- Standard (+ Deduplicate): 31% reduction
- Aggressive (+ Truncate 150 tokens): 49% reduction 🎉
💰 Cost Savings: At 1M input tokens/month, a 15% reduction saves ~$4.50/month (~$54/year) on GPT-4 input tokens, assuming $30 per 1M input tokens.
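The arithmetic behind a savings estimate like this is easy to check and to rerun for your own volume. The sketch below assumes a price of $30 per 1M input tokens for GPT-4; that price is an assumption, so substitute the current rate for your model.

```python
def monthly_savings(tokens_per_month: float, reduction: float, usd_per_million: float) -> float:
    """Dollars saved per month when `reduction` (0-1) of input tokens are removed."""
    saved_tokens = tokens_per_month * reduction
    return saved_tokens / 1_000_000 * usd_per_million


# Assumed inputs: 1M input tokens/month, 15% reduction, $30 per 1M tokens
savings = monthly_savings(1_000_000, 0.15, 30.0)
print(f"${savings:.2f}/month, ${savings * 12:.2f}/year")  # $4.50/month, $54.00/year
```

Savings scale linearly with volume, so the same 15% reduction at 100M input tokens/month would come to about $450/month under the same price assumption.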
📖 See full benchmark: benchmark/custom/README.md
⚡ Performance & Latency
"What's the latency overhead?" - Negligible. Prompt Refiner adds < 0.5ms per 1k tokens of overhead.
| Strategy | @ 1k tokens | @ 10k tokens | @ 50k tokens | Overhead per 1k tokens |
|---|---|---|---|---|
| Minimal (HTML + Whitespace) | 0.05ms | 0.48ms | 2.39ms | 0.05ms |
| Standard (+ Deduplicate) | 0.26ms | 2.47ms | 12.27ms | 0.25ms |
| Aggressive (+ Truncate) | 0.26ms | 2.46ms | 12.38ms | 0.25ms |
Key Insights:
- ⚡ Minimal strategy: Only 0.05ms per 1k tokens (faster than a network packet)
- 🎯 Standard strategy: 0.25ms per 1k tokens - adds ~2.5ms to a 10k token prompt
- 📊 Context: Network + LLM TTFT is typically 600ms+, refining adds < 0.5% overhead
- 🚀 Individual operations (HTML, whitespace) are < 0.5ms per 1k tokens
Real-world impact:
10k token RAG context refining: ~2.5ms overhead
Network latency: ~100ms
LLM Processing (TTFT): ~500ms+
Total overhead: < 0.5% of request time
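The "< 0.5%" figure follows directly from the numbers listed above. A short calculation, using the approximate latencies from this section as inputs:

```python
def overhead_share(refine_ms: float, network_ms: float, ttft_ms: float) -> float:
    """Fraction of total request time spent on refining."""
    total_ms = refine_ms + network_ms + ttft_ms
    return refine_ms / total_ms


# Approximate figures from this section: 2.5 ms refining, 100 ms network, 500 ms TTFT
share = overhead_share(2.5, 100.0, 500.0)
print(f"{share:.2%}")  # 0.41%
```

Since TTFT is given as "500ms+", the real share is likely to be at or below this value for slower responses.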
🔬 Run yourself: `python benchmark/latency/benchmark.py` (no API keys needed)
🎮 Interactive Demo
Try prompt-refiner in your browser - no installation required!
Play with different strategies, see real-time token savings, and find the perfect configuration for your use case. Features:
- 🎯 6 preset examples (e-commerce, support tickets, docs, RAG, etc.)
- ⚡ Quick strategy presets (Minimal, Standard, Aggressive)
- 💰 Real-time cost savings calculator
- 🔧 All 7 operations configurable
- 📊 Visual metrics dashboard
License
MIT