
A lightweight Python library for optimizing and cleaning LLM inputs

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


Prompt Refiner


🧹 A lightweight Python library for optimizing and cleaning LLM inputs. Save 10-20% on API costs by removing invisible tokens, stripping HTML, and redacting PII.

If you find this useful, please star us on GitHub!


🎯 Perfect for:

RAG Applications, Chatbots, Document Processing, Production LLM Apps, Cost Optimization


Why use Prompt Refiner?

Two core problems solved:

1. 🧹 Save 10-20% on API Costs - Clean & Optimize Prompts

Stop paying for invisible tokens and dirty data.

from prompt_refiner import StripHTML, NormalizeWhitespace

dirty_input = "<div>  User    input\n\n\n  here  </div>"

# Tags and redundant whitespace are removed: "User input here"
# Illustrative savings on a larger input: 150 tokens -> 85 tokens (43% fewer)
cleaned = (StripHTML() | NormalizeWhitespace()).run(dirty_input)

2. 🤖 Build Smart Chatbots - Manage Context Windows

Pack system prompts, RAG docs, and chat history into your token budget. Auto-clean HTML on-the-fly.

For Chat APIs (OpenAI, Anthropic):

from prompt_refiner import MessagesPacker, ROLE_SYSTEM, ROLE_CONTEXT, ROLE_QUERY, ROLE_USER, StripHTML

packer = MessagesPacker(max_tokens=1000)
packer.add("You are helpful.", role=ROLE_SYSTEM)
packer.add("<p>RAG doc with HTML...</p>", role=ROLE_CONTEXT, refine_with=StripHTML())
packer.add("Old chat msg...", role=ROLE_USER)
packer.add("User question?", role=ROLE_QUERY)

messages = packer.pack()
# Output: List[Dict] ready for openai.chat.completions.create()
# [
#   {"role": "system", "content": "You are helpful."},
#   {"role": "user", "content": "User question?"},
#   {"role": "context", "content": "RAG doc with HTML..."}  # HTML cleaned!
# ]
# Note: "Old chat msg..." dropped to fit budget

For Completion APIs (Llama, GPT-3):

from prompt_refiner import TextPacker, TextFormat, ROLE_SYSTEM, ROLE_CONTEXT, ROLE_QUERY, ROLE_USER, ROLE_ASSISTANT, StripHTML

packer = TextPacker(max_tokens=500, text_format=TextFormat.MARKDOWN)
packer.add("You are a QA assistant.", role=ROLE_SYSTEM)
packer.add("<div>RAG doc...</div>", role=ROLE_CONTEXT, refine_with=StripHTML())
packer.add("What is X?", role=ROLE_USER)
packer.add("X is a library.", role=ROLE_ASSISTANT)
packer.add("How to install?", role=ROLE_USER)
packer.add("Use pip install.", role=ROLE_ASSISTANT)
packer.add("What is this?", role=ROLE_QUERY)

prompt = packer.pack()
# Output: str ready for completion APIs
# ### INSTRUCTIONS:
# You are a QA assistant.
#
# ### CONTEXT:
# - RAG doc...
#
# ### CONVERSATION:
# - What is X?
# - X is a library.
#
# ### INPUT:
# What is this?
#
# Note: Last 2 history messages dropped to fit budget (auto-prioritized)

✨ Key Features

Token Optimization:

  • 🧹 Clean Dirty Data - Strip HTML, normalize whitespace, fix Unicode, redact PII
  • 📉 Reduce Costs - Save 10-20% on API costs by removing unnecessary tokens
  • 📦 Pipe Syntax - Compose operations like LEGO blocks: StripHTML() | NormalizeWhitespace()
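
The pipe syntax can be pictured with a small standard-library sketch. This is not the library's implementation: the `Operation` class and the `strip_tags` and `collapse_whitespace` helpers are hypothetical names used only to show how overloading `|` can chain text transformations.

```python
import re
from typing import Callable, List


class Operation:
    """A text transformation that can be chained with the | operator (hypothetical)."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self.steps: List[Callable[[str], str]] = [fn]

    def __or__(self, other: "Operation") -> "Operation":
        # Build a new pipeline holding the steps of both operands, in order.
        combined = Operation(lambda text: text)
        combined.steps = self.steps + other.steps
        return combined

    def run(self, text: str) -> str:
        # Apply every step from left to right.
        for step in self.steps:
            text = step(text)
        return text


def strip_tags() -> Operation:
    # Naive tag removal; a real HTML cleaner would use a proper parser.
    return Operation(lambda text: re.sub(r"<[^>]+>", "", text))


def collapse_whitespace() -> Operation:
    # Replace any run of whitespace with a single space and trim the ends.
    return Operation(lambda text: re.sub(r"\s+", " ", text).strip())


pipeline = strip_tags() | collapse_whitespace()
cleaned = pipeline.run("<div>  User    input\n\n\n  here  </div>")
print(cleaned)  # User input here
```

Because `|` returns a new object instead of mutating its operands, a partial pipeline can be reused as the prefix of several longer ones.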

Context Management:

  • 🤖 Smart Packers - MessagesPacker for chat APIs, TextPacker for completion APIs
  • 🎯 Priority-Based - Auto-prioritizes: system > query > context > history
  • ✂️ Budget Control - Fits content within token limits, drops low-priority items
  • 🔄 JIT Cleaning - Clean RAG docs on-the-fly with refine_with=StripHTML()
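
Priority-based budget control can be sketched as a greedy selection, assuming a character-based token estimate. The `Item` class, the `pack` function, and the numeric priority for history are assumptions made for illustration; the real packers may select and order items differently.

```python
from dataclasses import dataclass
from typing import List

# Lower number = higher priority, mirroring system > query > context > history.
# The value for "history" is an assumption made for this sketch.
PRIORITY = {"system": 0, "query": 10, "context": 20, "history": 30}


@dataclass
class Item:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Character-based estimate (about 4 characters per token), rounded up.
    return max(1, (len(text) + 3) // 4)


def pack(items: List[Item], max_tokens: int) -> List[Item]:
    # Visit items from most to least important, keeping each one that still fits.
    order = sorted(range(len(items)), key=lambda i: (PRIORITY[items[i].role], i))
    kept, used = set(), 0
    for i in order:
        cost = estimate_tokens(items[i].content)
        if used + cost <= max_tokens:
            kept.add(i)
            used += cost
    # Return the survivors in their original insertion order.
    return [item for i, item in enumerate(items) if i in kept]


items = [
    Item("system", "You are helpful."),
    Item("context", "RAG doc " * 10),
    Item("history", "Old chat msg " * 20),
    Item("query", "User question?"),
]
packed = pack(items, max_tokens=30)
print([item.role for item in packed])  # ['system', 'context', 'query']
```

With a 30-token budget the system prompt, query, and context fit (4 + 4 + 20 estimated tokens), while the 65-token history item is dropped.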

Developer Experience:

  • 🪶 Zero Dependencies - Lightweight core, optional tiktoken for precise counting
  • ⚡ Blazing Fast - < 0.5ms per 1k tokens overhead
  • 🎯 Type Safe - Full type hints for better IDE support
  • 🚀 Production Ready - Battle-tested with comprehensive test coverage

Installation

# Basic installation (lightweight, zero dependencies)
pip install llm-prompt-refiner

# With precise token counting (optional, installs tiktoken)
pip install llm-prompt-refiner[token]

Installation Modes:

  • Default (Lightweight): Zero dependencies, uses character-based token estimation
  • Precise Mode: Installs tiktoken for accurate token counting with no safety buffer. Pass a model parameter to CountTokens or MessagesPacker/TextPacker to enable.

🚀 Quick Start with Preset Strategies

Don't want to manually configure operations? Use our benchmark-tested preset strategies:

from prompt_refiner.strategy import MinimalStrategy, AggressiveStrategy

# Minimal: 4.3% reduction, 98.7% quality
refiner = MinimalStrategy().create_refiner()
cleaned = refiner.run("<div>Your HTML content</div>")

# Aggressive: 15% reduction, 96.4% quality
refiner = AggressiveStrategy(max_tokens=150).create_refiner()
cleaned = refiner.run(long_context)  # long_context: any long input string you supply

Choose your strategy:

| Strategy | Token Reduction | Quality | Use Case |
|---|---|---|---|
| Minimal | 4.3% | 98.7% | Maximum quality, minimal risk |
| Standard | 4.8% | 98.4% | RAG contexts with duplicates |
| Aggressive | 15% | 96.4% | Cost optimization, long contexts |

Usage patterns:

# Basic usage
from prompt_refiner.strategy import MinimalStrategy
refiner = MinimalStrategy().create_refiner()

# With custom parameters
from prompt_refiner.strategy import AggressiveStrategy
refiner = AggressiveStrategy(max_tokens=200).create_refiner()

# Extend with additional operations
from prompt_refiner import RedactPII
refiner = MinimalStrategy().create_refiner()
refiner.pipe(RedactPII(redact_types={"email"}))

📖 See full examples: examples/strategy/

📊 Proven Effectiveness

We benchmarked Prompt Refiner on 30 real-world test cases (SQuAD + RAG scenarios) to measure token reduction and response quality:

| Strategy | Token Reduction | Quality (Cosine) | Judge Approval | Overall Equivalent |
|---|---|---|---|---|
| Minimal | 4.3% | 0.987 | 86.7% | 86.7% |
| Standard | 4.8% | 0.984 | 90.0% | 86.7% |
| Aggressive | 15.0% | 0.964 | 80.0% | 66.7% |

Key Insights:

  • Aggressive strategy achieves roughly 3.5x the savings of Minimal (15% vs 4.3%) while keeping a quality score of 0.964 (cosine)
  • Individual RAG tests showed 17-74% token savings with aggressive strategy
  • Deduplicate (Standard) shows minimal gains on typical RAG contexts
  • TruncateTokens (Aggressive) provides the largest cost reduction for long contexts
  • Trade-off: More aggressive = more savings but slightly lower judge approval

Example: RAG with duplicates

  • Minimal (HTML + Whitespace): 17% reduction
  • Standard (+ Deduplicate): 31% reduction
  • Aggressive (+ Truncate 150 tokens): 49% reduction 🎉
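
To illustrate why duplicated RAG chunks are cheap to remove, here is a minimal near-duplicate filter built on `difflib` from the standard library. It is a sketch under stated assumptions: the `deduplicate` function and its paragraph-level comparison are hypothetical, and the library's `Deduplicate` operation may use a different similarity measure.

```python
from difflib import SequenceMatcher
from typing import List


def deduplicate(paragraphs: List[str], similarity_threshold: float = 0.85) -> List[str]:
    """Keep the first occurrence of each paragraph, dropping near-duplicates."""
    kept: List[str] = []
    for para in paragraphs:
        # Compare against everything already kept (case-insensitive).
        is_duplicate = any(
            SequenceMatcher(None, para.lower(), seen.lower()).ratio() >= similarity_threshold
            for seen in kept
        )
        if not is_duplicate:
            kept.append(para)
    return kept


docs = [
    "Install with pip install llm-prompt-refiner.",
    "Install with pip install llm-prompt-refiner!",
    "The packer drops low-priority items.",
]
unique_docs = deduplicate(docs)
print(len(unique_docs))  # 2
```

The pairwise comparison is quadratic in the number of paragraphs, which is acceptable for a handful of retrieved chunks but would need a cheaper fingerprint for large inputs.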

[Figure: Token Reduction vs Quality]

💰 Cost Savings: At 1M input tokens/month, a 15% reduction saves ~$4.50/month (~$54/year) on GPT-4 input tokens, assuming $0.03 per 1k input tokens.
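
The savings arithmetic can be reproduced in a few lines. The price of $0.03 per 1k GPT-4 input tokens is an assumption made for this sketch, and `monthly_savings` is a hypothetical helper, not part of the library. At that price, removing 15% of 1M tokens is worth $4.50 per month, which adds up to $54 over a year.

```python
def monthly_savings(tokens_per_month: int, reduction: float, price_per_1k: float) -> float:
    """Dollars saved per month when `reduction` (0 to 1) of input tokens is removed."""
    tokens_saved = tokens_per_month * reduction
    return tokens_saved / 1000 * price_per_1k


# Assumed price: $0.03 per 1k GPT-4 input tokens.
per_month = monthly_savings(1_000_000, 0.15, 0.03)
per_year = per_month * 12
print(f"${per_month:.2f}/month, ${per_year:.2f}/year")  # $4.50/month, $54.00/year
```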

📖 See full benchmark: benchmark/custom/README.md

⚡ Performance & Latency

"What's the latency overhead?" - Negligible. Prompt Refiner adds < 0.5ms per 1k tokens of overhead.

| Strategy | @ 1k tokens | @ 10k tokens | @ 50k tokens | Overhead per 1k tokens |
|---|---|---|---|---|
| Minimal (HTML + Whitespace) | 0.05ms | 0.48ms | 2.39ms | 0.05ms |
| Standard (+ Deduplicate) | 0.26ms | 2.47ms | 12.27ms | 0.25ms |
| Aggressive (+ Truncate) | 0.26ms | 2.46ms | 12.38ms | 0.25ms |

Key Insights:

  • Minimal strategy: Only 0.05ms per 1k tokens (faster than a network packet)
  • 🎯 Standard strategy: 0.25ms per 1k tokens - adds ~2.5ms to a 10k token prompt
  • 📊 Context: Network + LLM TTFT is typically 600ms+, refining adds < 0.5% overhead
  • 🚀 Individual operations (HTML, whitespace) are < 0.5ms per 1k tokens

Real-world impact:

10k token RAG context refining: ~2.5ms overhead
Network latency: ~100ms
LLM Processing (TTFT): ~500ms+
Total overhead: < 0.5% of request time
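
The "< 0.5%" figure follows directly from the numbers above. A short check, using only the values quoted in this section:

```python
refine_ms = 2.5      # refining a 10k-token RAG context
network_ms = 100.0   # network latency
ttft_ms = 500.0      # LLM processing (time to first token)

total_ms = refine_ms + network_ms + ttft_ms
overhead_pct = refine_ms / total_ms * 100
print(f"{overhead_pct:.2f}% of request time")  # 0.41% of request time
```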

🔬 Run it yourself: python benchmark/latency/benchmark.py (no API keys needed)

🎮 Interactive Demo

Try prompt-refiner in your browser - no installation required!

Play with different strategies, see real-time token savings, and find the perfect configuration for your use case. Features:

  • 🎯 6 preset examples (e-commerce, support tickets, docs, RAG, etc.)
  • ⚡ Quick strategy presets (Minimal, Standard, Aggressive)
  • 💰 Real-time cost savings calculator
  • 🔧 All 7 operations configurable
  • 📊 Visual metrics dashboard

4 Core Modules

Prompt Refiner is organized into 4 specialized transformation modules:

1. Cleaner - Clean Dirty Data

  • StripHTML() - Remove HTML tags, optionally converting to Markdown (to_markdown=True)
  • NormalizeWhitespace() - Collapse excessive whitespace
  • FixUnicode() - Remove zero-width spaces and problematic Unicode
  • JsonCleaner() - Strip nulls/empties from JSON, minify (great for RAG APIs)
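
As an illustration of the JSON cleaning idea, the sketch below drops nulls and empty values and then minifies with the standard `json` module. `clean_json` is a hypothetical helper and may not match the exact rules `JsonCleaner()` applies.

```python
import json
from typing import Any

EMPTY = (None, "", [], {})


def clean_json(value: Any) -> Any:
    """Recursively drop None, empty strings, and empty containers."""
    if isinstance(value, dict):
        cleaned = {k: clean_json(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY}
    if isinstance(value, list):
        cleaned_items = [clean_json(v) for v in value]
        return [v for v in cleaned_items if v not in EMPTY]
    return value


raw = '{"title": "Doc", "author": null, "tags": [], "meta": {"notes": "", "score": 0}}'
# separators=(",", ":") removes the spaces json.dumps adds by default.
minified = json.dumps(clean_json(json.loads(raw)), separators=(",", ":"))
print(minified)  # {"title":"Doc","meta":{"score":0}}
```

Note that falsy but meaningful values such as `0` and `false` are kept; only nulls, empty strings, and empty containers are removed.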

2. Compressor - Reduce Size

  • TruncateTokens() - Smart truncation with sentence boundaries
    • Strategies: "head", "tail", "middle_out"
  • Deduplicate() - Remove similar content (great for RAG)
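
The three truncation strategies can be sketched on a pre-tokenized list. This is a simplified illustration: it ignores sentence boundaries, the `truncate` function is hypothetical, and reading "middle_out" as "keep the start and the end, drop the middle" is an assumption about what the name means.

```python
from typing import List


def truncate(tokens: List[str], max_tokens: int, strategy: str = "head") -> List[str]:
    """Keep at most max_tokens tokens according to the chosen strategy."""
    if max_tokens <= 0:
        return []
    if len(tokens) <= max_tokens:
        return list(tokens)
    if strategy == "head":
        # Keep the beginning of the text.
        return tokens[:max_tokens]
    if strategy == "tail":
        # Keep the end of the text.
        return tokens[-max_tokens:]
    if strategy == "middle_out":
        # Assumed meaning: keep both ends and remove the middle.
        front = (max_tokens + 1) // 2
        back = max_tokens - front
        return tokens[:front] + (tokens[-back:] if back else [])
    raise ValueError(f"unknown strategy: {strategy}")


words = "a b c d e f g h".split()
print(truncate(words, 3, "head"))        # ['a', 'b', 'c']
print(truncate(words, 3, "tail"))        # ['f', 'g', 'h']
print(truncate(words, 4, "middle_out"))  # ['a', 'b', 'g', 'h']
```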

3. Scrubber - Security & Privacy

  • RedactPII() - Automatically redact emails, phones, IPs, credit cards, URLs, SSNs
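
A regex-based redactor shows the general idea. The patterns and placeholder format below are simplified assumptions for illustration, not the patterns `RedactPII()` ships with, and they will miss many real-world formats.

```python
import re
from typing import Dict, Optional, Set

# Deliberately simple patterns; production PII detection needs far more care.
PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, redact_types: Optional[Set[str]] = None) -> str:
    """Replace each match of the selected PII types with a [TYPE] placeholder."""
    for name in sorted(redact_types or PATTERNS.keys()):
        text = PATTERNS[name].sub(f"[{name.upper()}]", text)
    return text


sample = "Mail bob@example.com from 10.0.0.1, SSN 123-45-6789"
print(redact(sample, {"email"}))  # Mail [EMAIL] from 10.0.0.1, SSN 123-45-6789
print(redact(sample))             # Mail [EMAIL] from [IP], SSN [SSN]
```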

4. Packer - Context Budget Management

  • MessagesPacker - For chat APIs (OpenAI, Anthropic). Returns List[Dict]
  • TextPacker - For completion APIs (Llama Base, GPT-3). Returns str
  • Semantic roles - Use ROLE_SYSTEM, ROLE_QUERY, ROLE_CONTEXT (auto-prioritized)
  • JIT refinement - Clean documents on-the-fly with refine_with=StripHTML()
  • Priority-based selection - Automatically drops low-priority items when over budget

Measurement & Analysis

Track and measure your optimization impact:

  • CountTokens() - Calculate token savings and ROI for refiner pipelines
    • Estimation mode (default): Character-based approximation (1 token ≈ 4 chars)
    • Precise mode (with tiktoken): Exact token counts using OpenAI's tokenizer
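
The estimation mode's rule of thumb (1 token ≈ 4 characters) is easy to reproduce. The helpers below are hypothetical and only sketch how a before/after comparison could be computed; they are not the `CountTokens` API.

```python
import math
from typing import Dict


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    # Character-based approximation, rounded up to a whole token.
    return math.ceil(len(text) / chars_per_token)


def savings(original: str, refined: str) -> Dict[str, float]:
    """Compare estimated token counts before and after refinement."""
    before = estimate_tokens(original)
    after = estimate_tokens(refined)
    saved = before - after
    percent = saved / before * 100 if before else 0.0
    return {"original": before, "refined": after, "saved": saved, "percent": round(percent, 1)}


original = "<div>" + "word " * 20 + "</div>"
refined = ("word " * 20).strip()
stats = savings(original, refined)
print(stats)  # {'original': 28, 'refined': 25, 'saved': 3, 'percent': 10.7}
```

Character-based estimates drift from real tokenizer counts, especially for code or non-English text, which is why a safety buffer matters when no tokenizer is installed.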

Complete Example

from prompt_refiner import (
    # Core Modules
    StripHTML, NormalizeWhitespace, FixUnicode, JsonCleaner,  # Cleaner
    Deduplicate, TruncateTokens,  # Compressor
    RedactPII,  # Scrubber
    MessagesPacker, ROLE_SYSTEM, ROLE_QUERY, ROLE_CONTEXT,  # Packer
    # Measurement
    CountTokens,
)

# Example 1: Clean and optimize text
original_text = """<div>Your messy input here...</div>"""

counter = CountTokens(original_text=original_text)

# Build a pipeline from Cleaner, Compressor, and Scrubber operations
pipeline = (
    StripHTML(to_markdown=True)
    | NormalizeWhitespace()
    | FixUnicode()
    | Deduplicate(similarity_threshold=0.85)
    | TruncateTokens(max_tokens=500, strategy="head")
    | RedactPII(redact_types={"email", "phone"})
)

result = pipeline.run(original_text)
counter.process(result)
print(counter.format_stats())

# Example 2: Pack messages for chat API with JIT refinement
packer = MessagesPacker(max_tokens=1000)
packer.add("You are helpful.", role=ROLE_SYSTEM)  # Auto: PRIORITY_SYSTEM (0)

# Clean RAG documents on-the-fly (single operation)
packer.add(
    "<div>RAG doc with HTML...</div>",
    role=ROLE_CONTEXT,  # Auto: PRIORITY_HIGH (20)
    refine_with=StripHTML()
)

# Chain multiple cleaning operations for dirty documents
packer.add(
    "<p>  Another   doc with  HTML   and   whitespace  </p>",
    role=ROLE_CONTEXT,  # Auto: PRIORITY_HIGH (20)
    refine_with=[StripHTML(), NormalizeWhitespace()]
)

packer.add("User question", role=ROLE_QUERY)  # Auto: PRIORITY_QUERY (10)

messages = packer.pack()  # Returns List[Dict]
# Use with: openai.chat.completions.create(messages=messages)

Examples

Check out the examples/ folder for detailed examples:

Quick Start:

  • strategy/ - Preset strategies (Minimal, Standard, Aggressive) for easy token optimization

Core Modules:

  • cleaner/ - HTML cleaning, JSON compression, whitespace normalization, Unicode fixing
  • compressor/ - Smart truncation, deduplication
  • scrubber/ - PII redaction
  • packer/ - Context budget management with priorities + real API integration (OpenAI)

Measurement:

  • analyzer/ - Token counting and cost savings analysis

Development

This project uses uv for dependency management and make for common tasks.

# Install dependencies
make install

# Run tests
make test

# Format code
make format

License

MIT
