
A lightweight Python library for optimizing and cleaning LLM inputs

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


Prompt Refiner


🧹 A lightweight Python library for optimizing and cleaning LLM inputs. Save 10-20% on API costs by removing invisible tokens, stripping HTML, and redacting PII.

If you find this useful, please star us on GitHub!


🎯 Perfect for:

RAG Applications, Chatbots, Document Processing, Production LLM Apps, Cost Optimization


Why use Prompt Refiner?

Stop paying for invisible tokens and dirty data.

| Feature | Before (Dirty Input) | After (Refined) |
| --- | --- | --- |
| HTML Cleaning | <div><b>Hello</b> world</div> | Hello world |
| Whitespace | User input\n\n\n here | User input here |
| PII Redaction | Call me at 555-0199 | Call me at [PHONE] |
| Deduplication | Same text.\n\nSame text.\n\nDifferent. | Same text.\n\nDifferent. |
| Token Cost | 150 Tokens | 85 Tokens (Saved 43%) |

📦 It's this easy:

from prompt_refiner import StripHTML, NormalizeWhitespace

cleaned = (StripHTML() | NormalizeWhitespace()).run(dirty_input)

✨ Key Features

  • 🪶 Zero Dependencies - Lightweight core with no external dependencies
  • ⚡ Blazing Fast - < 0.5ms per 1k tokens overhead, negligible impact on API latency
  • 🔧 Modular Design - 5 focused modules: Cleaner, Compressor, Scrubber, Analyzer, Packer
  • 🚀 Production Ready - Battle-tested operations with comprehensive test coverage
  • 🎯 Type Safe - Full type hints for better IDE support and fewer bugs
  • 📦 Easy to Use - Modern pipe operator syntax (|), compose operations like LEGO blocks

Overview

Prompt Refiner helps you clean and optimize prompts before sending them to LLM APIs. By removing unnecessary whitespace, duplicate characters, and other inefficiencies, you can:

  • Reduce token usage and API costs
  • Improve prompt quality and consistency
  • Process inputs more efficiently

Status

This project is in early development. Features are being added iteratively.

Installation

# Basic installation (lightweight, zero dependencies)
pip install llm-prompt-refiner

# With precise token counting (optional, installs tiktoken)
pip install llm-prompt-refiner[token]

Installation Modes

  • Default (Lightweight): Zero dependencies, uses character-based token estimation
  • Precise Mode: Installs tiktoken for accurate token counting with no safety buffer

To use precise mode, pass a model parameter:

from prompt_refiner import CountTokens, ContextPacker

# Default: estimation mode (no model parameter)
counter = CountTokens()
packer = ContextPacker(max_tokens=1000)

# Opt-in: precise mode with tiktoken
counter = CountTokens(model="gpt-4")
packer = ContextPacker(max_tokens=1000, model="gpt-4")

Quick Start

from prompt_refiner import StripHTML, NormalizeWhitespace, TruncateTokens

# ✨ The Pythonic "Pipe" Syntax (Recommended)
pipeline = (
    StripHTML()
    | NormalizeWhitespace()
    | TruncateTokens(max_tokens=1000)
)

raw_input = "<div>  User input with <b>lots</b> of   spaces... </div>"
clean_prompt = pipeline.run(raw_input)
# Output: "User input with lots of spaces..."

Alternative: Fluent API

Prefer method chaining? Use the traditional fluent API:

from prompt_refiner import Refiner, StripHTML, NormalizeWhitespace, TruncateTokens

pipeline = (
    Refiner()
    .pipe(StripHTML())
    .pipe(NormalizeWhitespace())
    .pipe(TruncateTokens(max_tokens=1000))
)

clean_prompt = pipeline.run(raw_input)

💡 Why pipe operator? More concise, Pythonic, and familiar to LangChain/LangGraph users.
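
For readers curious how `|` composition can work in Python, here is a minimal, self-contained sketch. It is not Prompt Refiner's actual implementation: the `Operation`, `Pipeline`, `Strip`, and `Lower` classes are hypothetical stand-ins that only show the `__or__` mechanism.

```python
from __future__ import annotations


class Operation:
    """Hypothetical base class: one text -> text transformation."""

    def process(self, text: str) -> str:
        raise NotImplementedError

    def __or__(self, other: Operation) -> Pipeline:
        # `a | b` builds a pipeline that runs a, then b.
        return Pipeline([self, other])

    def run(self, text: str) -> str:
        return self.process(text)


class Pipeline(Operation):
    """Hypothetical container that applies its steps in order."""

    def __init__(self, steps: list[Operation]) -> None:
        self.steps = steps

    def __or__(self, other: Operation) -> Pipeline:
        # Extending a pipeline returns a new one; the original is unchanged.
        return Pipeline([*self.steps, other])

    def process(self, text: str) -> str:
        for step in self.steps:
            text = step.process(text)
        return text


class Strip(Operation):
    def process(self, text: str) -> str:
        return text.strip()


class Lower(Operation):
    def process(self, text: str) -> str:
        return text.lower()


pipeline = Strip() | Lower()
print(pipeline.run("  Hello World  "))  # hello world
```

Because `Pipeline` is itself an `Operation` in this sketch, pipelines can be nested or extended with further `|` steps without special cases.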

📊 Proven Effectiveness

We benchmarked Prompt Refiner on 30 real-world test cases (SQuAD + RAG scenarios) to measure token reduction and response quality:

| Strategy | Token Reduction | Quality (Cosine) | Judge Approval | Overall Equivalent |
| --- | --- | --- | --- | --- |
| Minimal | 4.3% | 0.987 | 86.7% | 86.7% |
| Standard | 4.8% | 0.984 | 90.0% | 86.7% |
| Aggressive | 15.0% | 0.964 | 80.0% | 66.7% |

Key Insights:

  • Aggressive strategy achieves roughly 3.5x the token savings of Minimal (15.0% vs 4.3%) while keeping cosine similarity at 0.964
  • Individual RAG tests showed 17-74% token savings with aggressive strategy
  • Deduplicate (Standard) shows minimal gains on typical RAG contexts
  • TruncateTokens (Aggressive) provides the largest cost reduction for long contexts
  • Trade-off: More aggressive = more savings but slightly lower judge approval

Example: RAG with duplicates

  • Minimal (HTML + Whitespace): 17% reduction
  • Standard (+ Deduplicate): 31% reduction
  • Aggressive (+ Truncate 150 tokens): 49% reduction 🎉

[Figure: Token Reduction vs Quality]

💰 Cost Savings: At 1M input tokens/month, a 15% reduction removes 150k tokens. Assuming GPT-4 input pricing of $0.03 per 1k tokens, that is ~$4.50/month (~$54/year).
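
The savings figure depends on the price you assume, so it is worth redoing the arithmetic for your own model. The sketch below does that; the $0.03 per 1k input tokens rate is an assumption rather than a number taken from the benchmark, and `monthly_savings` is a hypothetical helper.

```python
def monthly_savings(tokens_per_month: int, reduction: float, usd_per_1k_tokens: float) -> float:
    """Dollars saved per month by sending `reduction` fewer input tokens."""
    tokens_saved = tokens_per_month * reduction
    return tokens_saved / 1000 * usd_per_1k_tokens


# Assumed price: $0.03 per 1k input tokens (substitute your model's rate).
per_month = monthly_savings(1_000_000, 0.15, 0.03)
print(f"${per_month:.2f}/month, ${per_month * 12:.2f}/year")
# $4.50/month, $54.00/year
```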

📖 See full benchmark: benchmark/custom/README.md

⚡ Performance & Latency

"What's the latency overhead?" - Negligible. Prompt Refiner adds < 0.5ms per 1k tokens of overhead.

| Strategy | @ 1k tokens | @ 10k tokens | @ 50k tokens | Overhead per 1k tokens |
| --- | --- | --- | --- | --- |
| Minimal (HTML + Whitespace) | 0.05ms | 0.48ms | 2.39ms | 0.05ms |
| Standard (+ Deduplicate) | 0.26ms | 2.47ms | 12.27ms | 0.25ms |
| Aggressive (+ Truncate) | 0.26ms | 2.46ms | 12.38ms | 0.25ms |

Key Insights:

  • Minimal strategy: Only 0.05ms per 1k tokens (faster than a network packet)
  • 🎯 Standard strategy: 0.25ms per 1k tokens - adds ~2.5ms to a 10k token prompt
  • 📊 Context: Network + LLM TTFT is typically 600ms+, refining adds < 0.5% overhead
  • 🚀 Individual operations (HTML, whitespace) are < 0.5ms per 1k tokens

Real-world impact:

10k token RAG context refining: ~2.5ms overhead
Network latency: ~100ms
LLM Processing (TTFT): ~500ms+
Total overhead: < 0.5% of request time

🔬 Run yourself: python benchmark/latency/benchmark.py (no API keys needed)
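
If you want a rough number for your own pipeline without the repository's benchmark script, a harness along these lines is one way to get it. This is a sketch and not the project's `benchmark.py`: `measure_overhead_ms` and `collapse_whitespace` are hypothetical, the stand-in operation uses only the standard library, and token counts are estimated at 4 characters per token.

```python
import re
import time
from typing import Callable


def collapse_whitespace(text: str) -> str:
    """Stand-in operation; swap in your own pipeline's run method."""
    return re.sub(r"\s+", " ", text).strip()


def measure_overhead_ms(func: Callable[[str], str], text: str, repeats: int = 50) -> float:
    """Return the best-of-N wall-clock time per estimated 1k tokens, in ms."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(text)
        best = min(best, time.perf_counter() - start)
    estimated_tokens = max(len(text) / 4, 1)  # assumption: 1 token ~ 4 chars
    return best * 1000 / (estimated_tokens / 1000)


sample = "User   input \n\n with   extra   whitespace. " * 1000
print(f"{measure_overhead_ms(collapse_whitespace, sample):.3f} ms per 1k tokens")
```

Taking the minimum over several runs reduces noise from other processes; results will still vary by machine and input.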

🎮 Interactive Demo

Try prompt-refiner in your browser - no installation required!

Play with different strategies, see real-time token savings, and find the perfect configuration for your use case. Features:

  • 🎯 6 preset examples (e-commerce, support tickets, docs, RAG, etc.)
  • ⚡ Quick strategy presets (Minimal, Standard, Aggressive)
  • 💰 Real-time cost savings calculator
  • 🔧 All 7 operations configurable
  • 📊 Visual metrics dashboard

5 Core Modules

Prompt Refiner is organized into 5 specialized modules:

1. Cleaner - Clean Dirty Data

  • StripHTML() - Remove HTML tags, optionally converting to Markdown (to_markdown=True)
  • NormalizeWhitespace() - Collapse excessive whitespace
  • FixUnicode() - Remove zero-width spaces and problematic Unicode
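
To make the three operations concrete, the sketch below approximates each one with the standard library. It is illustrative only and not the library's implementation: `strip_html`, `normalize_whitespace`, and `fix_unicode` are hypothetical helpers, and the Markdown conversion is not reproduced.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def normalize_whitespace(text: str) -> str:
    # Collapse any run of whitespace to a single space.
    return re.sub(r"\s+", " ", text).strip()


# Zero-width space, zero-width non-joiner, zero-width joiner, BOM.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def fix_unicode(text: str) -> str:
    # Drop zero-width characters, then apply NFC normalization.
    return unicodedata.normalize("NFC", text.translate(_ZERO_WIDTH))


dirty = "<div><b>Hello</b>\u200b   world</div>"
print(normalize_whitespace(fix_unicode(strip_html(dirty))))  # Hello world
```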

2. Compressor - Reduce Size

  • TruncateTokens() - Smart truncation with sentence boundaries
    • Strategies: "head", "tail", "middle_out"
  • Deduplicate() - Remove similar content (great for RAG)
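
The two ideas can be sketched with the standard library as follows. This is not the library's code: `truncate` and `deduplicate` are hypothetical helpers, whitespace-separated words stand in for tokens, sentence-boundary handling is omitted, and reading "middle_out" as "keep both ends, drop the middle" is an assumption.

```python
from difflib import SequenceMatcher


def truncate(words: list[str], limit: int, strategy: str = "head") -> list[str]:
    """Keep at most `limit` items; words stand in for tokens here."""
    if len(words) <= limit:
        return words
    if strategy == "head":  # keep the beginning
        return words[:limit]
    if strategy == "tail":  # keep the end
        return words[len(words) - limit:]
    if strategy == "middle_out":  # assumed meaning: keep both ends
        front = (limit + 1) // 2
        back = limit - front
        return words[:front] + words[len(words) - back:]
    raise ValueError(f"unknown strategy: {strategy}")


def deduplicate(paragraphs: list[str], threshold: float = 0.85) -> list[str]:
    """Drop paragraphs whose similarity to an earlier kept one is >= threshold."""
    kept: list[str] = []
    for para in paragraphs:
        if all(SequenceMatcher(None, para, seen).ratio() < threshold for seen in kept):
            kept.append(para)
    return kept


words = "one two three four five six".split()
print(truncate(words, 4, "head"))        # ['one', 'two', 'three', 'four']
print(truncate(words, 4, "tail"))        # ['three', 'four', 'five', 'six']
print(truncate(words, 4, "middle_out"))  # ['one', 'two', 'five', 'six']
print(deduplicate(["Same text.", "Same text.", "Different."]))
# ['Same text.', 'Different.']
```

Comparing every paragraph against every kept paragraph is quadratic, which is fine for a handful of RAG chunks but worth keeping in mind for very large inputs.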

3. Scrubber - Security & Privacy

  • RedactPII() - Automatically redact emails, phones, IPs, credit cards, URLs, SSNs
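
As a rough illustration of pattern-based redaction, the sketch below handles two of the listed types with regular expressions. The patterns and the `redact` helper are hypothetical and deliberately simple; they are not the library's rules and will miss many real-world formats.

```python
import re

# Deliberately simple patterns for illustration; real-world PII detection
# needs broader coverage (international formats, validation, and so on).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b(?:\d{3}[-.\s])?\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str, types: set[str]) -> str:
    """Replace each match of the selected patterns with a [TYPE] placeholder."""
    for name in sorted(types):
        text = PATTERNS[name].sub(f"[{name.upper()}]", text)
    return text


print(redact("Call me at 555-0199 or mail a.b@example.com", {"email", "phone"}))
# Call me at [PHONE] or mail [EMAIL]
```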

4. Analyzer - Show Value

  • CountTokens() - Track token savings and optimization impact
    • Estimation mode (default): Character-based approximation (1 token ≈ 4 chars)
    • Precise mode (with tiktoken): Exact token counts using OpenAI's tokenizer
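
The estimation rule is simple enough to reproduce by hand. The sketch below applies the 4-characters-per-token approximation and derives a savings percentage; `estimate_tokens` and `savings` are hypothetical helpers, and rounding up is an assumption (the library may round differently).

```python
import math


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Character-based approximation: about one token per 4 characters."""
    return math.ceil(len(text) / chars_per_token)


def savings(original: str, cleaned: str) -> tuple[int, float]:
    """Return (tokens saved, percent saved) under the estimate above."""
    before = estimate_tokens(original)
    after = estimate_tokens(cleaned)
    saved = before - after
    percent = 100 * saved / before if before else 0.0
    return saved, percent


saved, percent = savings("<div><b>Hello</b>   world</div>", "Hello world")
print(f"Saved {saved} tokens ({percent:.1f}%)")  # Saved 5 tokens (62.5%)
```

Character-based estimates drift from real tokenizer counts, especially for code and non-English text, which is why a precise mode is useful when budgets are tight.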

5. Packer - Context Budget Management

  • ContextPacker() - Intelligently pack items into token budgets with priority-based selection
    • Perfect for RAG applications and context window management
    • Priority constants: PRIORITY_SYSTEM, PRIORITY_USER, PRIORITY_HIGH, PRIORITY_MEDIUM, PRIORITY_LOW
    • Estimation mode: Applies 10% safety buffer to prevent context overflow
    • Precise mode: Uses 100% of token budget with accurate counting
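
One way to picture priority-based packing with a safety buffer is a greedy pass over the items, most important first. The sketch below is an assumption about the general approach, not the library's algorithm: `Item`, `pack`, and the numeric priorities are hypothetical, and tokens are estimated at 4 characters each.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    priority: int  # lower number = more important (an assumption of this sketch)


def estimate_tokens(text: str) -> int:
    return -(-len(text) // 4)  # ceil(len / 4)


def pack(items: list[Item], max_tokens: int, safety_buffer: float = 0.10) -> list[Item]:
    """Greedy selection by priority within a buffered token budget."""
    budget = int(max_tokens * (1 - safety_buffer))
    chosen: list[int] = []
    used = 0
    # Visit items from most to least important; ties keep insertion order.
    for index in sorted(range(len(items)), key=lambda i: items[i].priority):
        cost = estimate_tokens(items[index].text)
        if used + cost <= budget:
            chosen.append(index)
            used += cost
    # Emit the survivors in their original order.
    return [items[i] for i in sorted(chosen)]


items = [
    Item("You are a helpful assistant.", priority=0),  # 28 chars -> 7 tokens
    Item("Background document " * 10, priority=3),     # 200 chars -> 50 tokens
    Item("What is the refund policy?", priority=1),    # 26 chars -> 7 tokens
]
packed = pack(items, max_tokens=40)
print([item.priority for item in packed])  # [0, 1]
```

With `max_tokens=40` and a 10% buffer the usable budget is 36 tokens, so the two short high-priority items fit and the 50-token background document is dropped. Passing `safety_buffer=0.0` mimics using the full budget when counts are exact.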

Complete Example

from prompt_refiner import (
    # Cleaner
    StripHTML, NormalizeWhitespace, FixUnicode,
    # Compressor
    Deduplicate, TruncateTokens,
    # Scrubber
    RedactPII,
    # Analyzer
    CountTokens,
    # Packer
    ContextPacker, PRIORITY_SYSTEM, PRIORITY_USER, PRIORITY_HIGH
)

original_text = """<div>Your messy input here...</div>"""

# Create token counter to track savings
counter = CountTokens(original_text=original_text)

# Build the full pipeline from Cleaner, Compressor, and Scrubber operations
pipeline = (
    StripHTML(to_markdown=True)
    | NormalizeWhitespace()
    | FixUnicode()
    | Deduplicate(similarity_threshold=0.85)
    | TruncateTokens(max_tokens=500, strategy="head")
    | RedactPII(redact_types={"email", "phone"})
)

# Run and analyze
result = pipeline.run(original_text)
counter.process(result)

print(counter.format_stats())
# Output:
# Original: 8 tokens
# Cleaned: 5 tokens
# Saved: 3 tokens (37.5%)

Examples

Check out the examples/ folder for detailed examples organized by module:

  • cleaner/ - HTML cleaning, whitespace normalization, Unicode fixing
  • compressor/ - Smart truncation, deduplication
  • scrubber/ - PII redaction
  • analyzer/ - Token counting and cost savings
  • packer/ - Context budget management with priorities for RAG
  • all_modules_demo.py - Complete demonstration

Development

This project uses uv for dependency management and make for common tasks.

# Install dependencies
make install

# Run tests
make test

# Format code
make format

License

MIT
