A lightweight Python library for optimizing and cleaning LLM inputs

Prompt Groomer

🧹 A lightweight Python library for optimizing and cleaning LLM inputs. Save 10-20% on API costs by removing invisible tokens, stripping HTML, and redacting PII.

⭐ If you find this useful, please star us on GitHub! ⭐

🎯 Perfect for:

RAG Applications • Chatbots • Document Processing • Production LLM Apps • Cost Optimization

Why use Prompt Groomer?

Stop paying for invisible tokens and dirty data.

Feature	Before (Dirty Input)	After (Groomed)
HTML Cleaning	`<div><b>Hello</b> world</div>`	`Hello world`
Whitespace	`User input\n\n\n here`	`User input here`
PII Redaction	`Call me at 555-0199`	`Call me at [PHONE]`
Deduplication	`Same text.\n\nSame text.\n\nDifferent.`	`Same text.\n\nDifferent.`
Token Cost	❌ 150 Tokens	✅ 85 Tokens (Saved 43%)
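To make the first two rows of the table concrete, here is a minimal standard-library sketch of HTML stripping and whitespace collapsing. It is not Prompt Groomer's implementation; `strip_html` and `normalize_whitespace` are hypothetical helper names used only for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Drop all tags and keep the text between them (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into a single space (hypothetical helper)."""
    return re.sub(r"\s+", " ", text).strip()


print(normalize_whitespace(strip_html("<div><b>Hello</b> world</div>")))  # Hello world
print(normalize_whitespace("User input\n\n\n here"))  # User input here
```

A real cleaner also has to deal with scripts, styles, and malformed markup, which this sketch ignores.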

📦 It's this easy:

from prompt_groomer import StripHTML, NormalizeWhitespace

dirty_input = "<div><b>Hello</b>   world</div>"  # any raw string you want to clean
cleaned = (StripHTML() | NormalizeWhitespace()).run(dirty_input)

✨ Key Features

🪶 Zero Dependencies - Lightweight core with no external dependencies
🔧 Modular Design - 4 focused modules: Cleaner, Compressor, Scrubber, Analyzer
⚡ Production Ready - Battle-tested operations with comprehensive test coverage
🎯 Type Safe - Full type hints for better IDE support and fewer bugs
📦 Easy to Use - Modern pipe operator syntax (|), compose operations like LEGO blocks

Overview

Prompt Groomer helps you clean and optimize prompts before sending them to LLM APIs. By removing unnecessary whitespace, duplicate characters, and other inefficiencies, you can:

Reduce token usage and API costs
Improve prompt quality and consistency
Process inputs more efficiently

Status

This project is in early development. Features are being added iteratively.

Installation

# Using uv (recommended)
uv pip install prompt-groomer

# Using pip
pip install prompt-groomer

Quick Start

from prompt_groomer import StripHTML, NormalizeWhitespace, TruncateTokens

# ✨ The Pythonic "Pipe" Syntax (Recommended)
pipeline = (
    StripHTML()
    | NormalizeWhitespace()
    | TruncateTokens(max_tokens=1000)
)

raw_input = "<div>  User input with <b>lots</b> of   spaces... </div>"
clean_prompt = pipeline.run(raw_input)
# Output: "User input with lots of spaces..."

Alternative: Fluent API

Prefer method chaining? Use the traditional fluent API:

from prompt_groomer import Groomer, StripHTML, NormalizeWhitespace, TruncateTokens

pipeline = (
    Groomer()
    .pipe(StripHTML())
    .pipe(NormalizeWhitespace())
    .pipe(TruncateTokens(max_tokens=1000))
)

raw_input = "<div>  User input with <b>lots</b> of   spaces... </div>"
clean_prompt = pipeline.run(raw_input)

💡 Why pipe operator? More concise, Pythonic, and familiar to LangChain/LangGraph users.
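For readers curious how a `|` syntax like this can work, the sketch below shows one common way to build it with Python's `__or__` method. It is an illustration under assumptions, not Prompt Groomer's source: the `Step` and `Chain` classes are hypothetical names, and only the `run` method mirrors the API shown above.

```python
from typing import Callable, List


class Step:
    """Hypothetical building block that wraps a str -> str function."""

    def __init__(self, fn: Callable[[str], str]):
        self.fn = fn

    def run(self, text: str) -> str:
        return self.fn(text)

    def __or__(self, other: "Step") -> "Chain":
        # `a | b` builds a chain that runs a, then b
        return Chain([self, other])


class Chain(Step):
    """Hypothetical ordered sequence of steps, itself usable as a step."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step.run(text)
        return text

    def __or__(self, other: Step) -> "Chain":
        # Return a new chain so existing pipelines are never mutated
        return Chain(self.steps + [other])


pipeline = Step(str.strip) | Step(str.lower) | Step(lambda s: s.replace("  ", " "))
print(pipeline.run("  Hello  World "))  # hello world
```

Returning a new `Chain` from each `|` keeps partial pipelines reusable, which is what makes the "LEGO block" style of composition safe.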

📊 Proven Effectiveness

We benchmarked Prompt Groomer on 30 real-world test cases (SQuAD + RAG scenarios) to measure token reduction and response quality:

Strategy	Token Reduction	Quality (Cosine)	Judge Approval	Overall Equivalent
Minimal	4.3%	0.987	86.7%	86.7%
Standard	4.8%	0.984	90.0%	86.7%
Aggressive	15.0%	0.964	80.0%	66.7%

Key Insights:

Aggressive strategy achieves about 3.5x the savings of Minimal (15.0% vs 4.3%) while keeping cosine similarity at 0.964
Individual RAG tests showed 17-74% token savings with aggressive strategy
Deduplicate (Standard) shows minimal gains on typical RAG contexts
TruncateTokens (Aggressive) provides the largest cost reduction for long contexts
Trade-off: More aggressive = more savings but slightly lower judge approval
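The Quality (Cosine) column is presumably a cosine similarity between vector representations of the original and groomed responses; the source does not say which embedding model produced the vectors. As a reminder of what the metric measures, here is a minimal sketch on plain Python lists (`cosine_similarity` is an illustrative helper, not part of the benchmark code).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

A score near 1.0 means the two vectors point in nearly the same direction, so 0.964 indicates responses that stay close to the baseline.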

Example: RAG with duplicates

Minimal (HTML + Whitespace): 17% reduction
Standard (+ Deduplicate): 31% reduction
Aggressive (+ Truncate 150 tokens): 49% reduction 🎉
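The deduplication step is where RAG contexts with repeated chunks gain the most. The sketch below shows the general idea using `difflib` from the standard library. It assumes paragraph-level comparison and a similarity ratio threshold; how Prompt Groomer actually splits and compares text is not documented here, so treat `deduplicate` as a hypothetical stand-in.

```python
from difflib import SequenceMatcher


def deduplicate(text: str, similarity_threshold: float = 0.85) -> str:
    """Keep the first copy of each paragraph, drop near-duplicates (illustrative only)."""
    kept = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        is_duplicate = any(
            SequenceMatcher(None, paragraph, seen).ratio() >= similarity_threshold
            for seen in kept
        )
        if not is_duplicate:
            kept.append(paragraph)
    return "\n\n".join(kept)


print(repr(deduplicate("Same text.\n\nSame text.\n\nDifferent.")))
# 'Same text.\n\nDifferent.'
```

Pairwise comparison is quadratic in the number of paragraphs, which is fine for a handful of retrieved chunks but worth replacing with hashing or shingling for large inputs.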

[Figure: Token Reduction vs Quality]

💰 Cost Savings: At scale (1M input tokens/month), a 15% reduction removes 150K tokens, which saves about $4.50/month (~$54/year) on GPT-4 input tokens at $30 per 1M tokens.
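The estimate can be reproduced with a few lines of arithmetic. The price below is an assumption (GPT-4's list price of $30 per 1M input tokens); the source does not state which price it used, so substitute your own model's rate.

```python
monthly_tokens = 1_000_000      # input tokens sent per month
reduction = 0.15                # 15% token reduction (Aggressive strategy)
price_per_million_usd = 30.00   # assumed GPT-4 input price per 1M tokens

tokens_saved = monthly_tokens * reduction
monthly_savings = tokens_saved / 1_000_000 * price_per_million_usd
annual_savings = monthly_savings * 12

print(f"{tokens_saved:,.0f} tokens saved per month")  # 150,000 tokens saved per month
print(f"${monthly_savings:.2f} per month")            # $4.50 per month
print(f"${annual_savings:.2f} per year")              # $54.00 per year
```

Savings scale linearly with volume, so the same reduction at 100M tokens/month would be 100 times larger.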

📖 See full benchmark: benchmark/simple/README.md

4 Core Modules

Prompt Groomer is organized into 4 specialized modules:

1. Cleaner - Clean Dirty Data

StripHTML() - Remove HTML tags, optionally convert to Markdown (to_markdown=True)
NormalizeWhitespace() - Collapse excessive whitespace
FixUnicode() - Remove zero-width spaces and problematic Unicode
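Zero-width characters are invisible on screen but still reach the tokenizer. The sketch below shows one way to strip them with the standard library. The character set and the NFC normalization step are assumptions made for illustration; the source does not list exactly which code points FixUnicode() handles.

```python
import unicodedata

# Assumed set of invisible code points: zero-width space, non-joiner,
# joiner, word joiner, and byte-order mark
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def fix_unicode(text: str) -> str:
    """Normalize to NFC and drop zero-width characters (illustrative only)."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


print(fix_unicode("Hel\u200blo\ufeff"))  # Hello
print(len("Hel\u200blo\ufeff"), len(fix_unicode("Hel\u200blo\ufeff")))  # 7 5
```

Note that removing the zero-width joiner (U+200D) also splits compound emoji, so a production cleaner may want to leave it in place.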

2. Compressor - Reduce Size

TruncateTokens() - Smart truncation with sentence boundaries
- Strategies: "head", "tail", "middle_out"
Deduplicate() - Remove similar content (great for RAG)
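The three strategy names suggest which part of the text survives truncation. The sketch below illustrates one plausible reading, with two loud assumptions: whitespace-separated words stand in for model tokens, and "middle_out" is taken to mean "keep the start and the end, drop the middle". It does not respect sentence boundaries, and `truncate_tokens` is a hypothetical helper rather than the library's code.

```python
def truncate_tokens(text: str, max_tokens: int, strategy: str = "head") -> str:
    """Truncate by whitespace tokens (illustrative approximation only)."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    if strategy == "head":
        kept = tokens[:max_tokens]                 # keep the beginning
    elif strategy == "tail":
        kept = tokens[-max_tokens:]                # keep the end
    elif strategy == "middle_out":
        head = (max_tokens + 1) // 2               # assumed: split budget in half
        tail = max_tokens - head
        kept = tokens[:head] + ["..."] + (tokens[-tail:] if tail else [])
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return " ".join(kept)


text = "one two three four five six seven"
print(truncate_tokens(text, 3, "head"))        # one two three
print(truncate_tokens(text, 3, "tail"))        # five six seven
print(truncate_tokens(text, 4, "middle_out"))  # one two ... six seven
```

"head" suits documents whose key facts come first, "tail" suits chat histories where recent turns matter most, and a middle-dropping strategy preserves both the framing and the latest context.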

3. Scrubber - Security & Privacy

RedactPII() - Automatically redact emails, phones, IPs, credit cards, URLs, SSNs
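Pattern-based redaction of this kind is usually built on regular expressions. The sketch below covers only emails and simple US-style phone numbers. The patterns and the `[EMAIL]` placeholder are assumptions for illustration (only `[PHONE]` appears in the table above), and `redact_pii` is a hypothetical helper, not the library's implementation.

```python
import re

# Deliberately simple, assumed patterns; real PII detection needs far more care
PATTERNS = {
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    "phone": (re.compile(r"\b(?:\d{3}[-.\s])?\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
}


def redact_pii(text: str, redact_types=("email", "phone")) -> str:
    """Replace each match of the selected patterns with its placeholder."""
    for name in redact_types:
        pattern, placeholder = PATTERNS[name]
        text = pattern.sub(placeholder, text)
    return text


print(redact_pii("Call me at 555-0199"))  # Call me at [PHONE]
print(redact_pii("Mail alice@example.com or call 415-555-0199."))
# Mail [EMAIL] or call [PHONE].
```

Regex redaction trades recall for simplicity: it will miss unusual formats and can flag non-PII digit runs, so it is best treated as a first line of defense.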

4. Analyzer - Show Value

CountTokens() - Track token savings and optimization impact
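The savings figure is a simple before/after ratio. The sketch below shows the calculation, assuming whitespace-separated words as a rough stand-in for real model tokens; `token_savings` is a hypothetical helper, and CountTokens() may count tokens differently.

```python
def token_savings(original: str, cleaned: str) -> dict:
    """Compare approximate token counts before and after cleaning."""
    before = len(original.split())
    after = len(cleaned.split())
    saved = before - after
    percent = saved / before * 100 if before else 0.0
    return {"original": before, "cleaned": after, "saved": saved, "percent": percent}


stats = token_savings("a b c d e f g h", "a b c d e")
print(stats)
# {'original': 8, 'cleaned': 5, 'saved': 3, 'percent': 37.5}
```

For billing-grade numbers, count with the tokenizer of the model you actually call, since word counts and token counts can differ substantially.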

Complete Example

from prompt_groomer import (
    # Cleaner
    StripHTML, NormalizeWhitespace, FixUnicode,
    # Compressor
    Deduplicate, TruncateTokens,
    # Scrubber
    RedactPII,
    # Analyzer
    CountTokens
)

original_text = """<div>Your messy input here...</div>"""

# Create token counter to track savings
counter = CountTokens(original_text=original_text)

# Build the complete pipeline with all 4 modules
pipeline = (
    StripHTML(to_markdown=True)
    | NormalizeWhitespace()
    | FixUnicode()
    | Deduplicate(similarity_threshold=0.85)
    | TruncateTokens(max_tokens=500, strategy="head")
    | RedactPII(redact_types={"email", "phone"})
)

# Run and analyze
result = pipeline.run(original_text)
counter.process(result)

print(counter.format_stats())
# Output:
# Original: 8 tokens
# Cleaned: 5 tokens
# Saved: 3 tokens (37.5%)

Examples

Check out the examples/ folder for detailed examples organized by module:

cleaner/ - HTML cleaning, whitespace normalization, Unicode fixing
compressor/ - Smart truncation, deduplication
scrubber/ - PII redaction
analyzer/ - Token counting and cost savings
all_modules_demo.py - Complete demonstration

Development

This project uses uv for dependency management and make for common tasks.

# Install dependencies
make install

# Run tests
make test

# Format code
make format

License

MIT

Release history

0.3.0 - Nov 28, 2025
0.2.2 - Nov 27, 2025
0.2.1 - Nov 27, 2025
0.2.0 - Nov 27, 2025 (this version)
0.1.0 - Nov 26, 2025


File details

Details for the file prompt_groomer-0.2.0.tar.gz.

File metadata

Download URL: prompt_groomer-0.2.0.tar.gz
Upload date: Nov 27, 2025
Size: 11.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.13 {"installer":{"name":"uv","version":"0.9.13"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for prompt_groomer-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c6b4128e3629c0be2e7272a562f9b9c25065d99029e6a7f0e7bb585a41d44a0b`
MD5	`bb9050c92e6fd7d6f555341611525720`
BLAKE2b-256	`7ec60a65da31cf6add7f9e7725ed3e75fa4f947b18ef361c3324c7fafa96b6b6`

File details

Details for the file prompt_groomer-0.2.0-py3-none-any.whl.

File metadata

Download URL: prompt_groomer-0.2.0-py3-none-any.whl
Upload date: Nov 27, 2025
Size: 4.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.13 {"installer":{"name":"uv","version":"0.9.13"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for prompt_groomer-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`270b332030d0a4c2e0cc79ca3ee6e4737b7d3588efa8c78d282ee511ed9b1551`
MD5	`eb054db48c9c630397e00bff1168dfd3`
BLAKE2b-256	`37996d2ccade743fa434e2238bbefdc51671a5bd813fcb94ff8998b0e712d270`
