This project has been archived by its maintainers. No new releases are expected.
Prompt Groomer
A lightweight Python library for optimizing and cleaning LLM inputs. Reduce token usage, improve prompt quality, and lower API costs.
Overview
Prompt Groomer helps you clean and optimize prompts before sending them to LLM APIs. By removing unnecessary whitespace, duplicate characters, and other inefficiencies, you can:
- Reduce token usage and API costs
- Improve prompt quality and consistency
- Process inputs more efficiently
Status
This project is in early development. Features are being added iteratively.
Installation
# Using uv (recommended)
uv pip install prompt-groomer
# Using pip
pip install prompt-groomer
Quick Start
Build custom cleaning pipelines with a fluent API:
from prompt_groomer import Groomer, StripHTML, NormalizeWhitespace, TruncateTokens
# Define a cleaning pipeline
groomer = (
    Groomer()
    .pipe(StripHTML())
    .pipe(NormalizeWhitespace())
    .pipe(TruncateTokens(max_tokens=1000, strategy="middle_out"))
)
raw_input = "<div> User input with <b>lots</b> of spaces... </div>"
clean_prompt = groomer.run(raw_input)
# Output: "User input with lots of spaces..."
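To make the fluent style above concrete, here is a minimal sketch of how a chainable pipeline can be built. `MiniPipeline` is a hypothetical stand-in written for illustration; it is not Prompt Groomer's actual `Groomer` class, and it assumes each step is simply a callable that takes a string and returns a string.

```python
from typing import Callable, List


class MiniPipeline:
    """Hypothetical stand-in for a fluent cleaning pipeline (not the library's Groomer)."""

    def __init__(self) -> None:
        self._steps: List[Callable[[str], str]] = []

    def pipe(self, step: Callable[[str], str]) -> "MiniPipeline":
        # Returning self is what allows .pipe(...).pipe(...) chaining.
        self._steps.append(step)
        return self

    def run(self, text: str) -> str:
        # Apply the steps in the order they were registered.
        for step in self._steps:
            text = step(text)
        return text


pipeline = (
    MiniPipeline()
    .pipe(str.strip)
    .pipe(str.lower)
)
result = pipeline.run("  Hello World  ")
print(result)
```

Because every step shares the same string-in, string-out shape, steps can be reordered or swapped without touching the pipeline itself.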
4 Core Modules
Prompt Groomer is organized into 4 specialized modules:
1. Cleaner - Clean Dirty Data
- StripHTML() - Remove HTML tags, convert to Markdown
- NormalizeWhitespace() - Collapse excessive whitespace
- FixUnicode() - Remove zero-width spaces and problematic Unicode
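The three cleaning operations can be approximated with the standard library alone. The sketch below is an assumption about what such steps might do, not the library's implementation: the function names are hypothetical, the HTML step only drops tags (it does not convert to Markdown), and the Unicode step only removes a small, hand-picked set of zero-width characters.

```python
import re
from html.parser import HTMLParser
from typing import List

# Zero-width characters mapped to None so str.translate deletes them.
# This set is an illustrative assumption, not an exhaustive list.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def fix_unicode(text: str) -> str:
    return text.translate(ZERO_WIDTH)


def normalize_whitespace(text: str) -> str:
    # Collapse any run of whitespace to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


raw = "<div>  User input   with <b>lots</b>\u200b of   spaces...  </div>"
cleaned = normalize_whitespace(fix_unicode(strip_html(raw)))
print(cleaned)
```

Zero-width characters are removed before whitespace is collapsed, since they are not matched by `\s` and would otherwise survive between words.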
2. Compressor - Reduce Size
- TruncateTokens() - Smart truncation with sentence boundaries
  - Strategies: "head", "tail", "middle_out"
- Deduplicate() - Remove similar content (great for RAG)
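As a rough illustration of the three truncation strategies, the sketch below treats whitespace-separated words as tokens and ignores sentence boundaries. Both are simplifying assumptions, as is the reading of "middle_out" as "keep the start and the end, drop the middle". `truncate_tokens` is a hypothetical helper, not the library's `TruncateTokens`.

```python
def truncate_tokens(text: str, max_tokens: int, strategy: str = "head") -> str:
    """Hypothetical word-level truncation; real tokenizers count differently."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if strategy not in ("head", "tail", "middle_out"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text  # Already within budget, nothing to cut.

    if strategy == "head":
        kept = tokens[:max_tokens]
    elif strategy == "tail":
        kept = tokens[len(tokens) - max_tokens:]
    else:  # "middle_out": assumed to keep both ends and remove the middle
        head_len = (max_tokens + 1) // 2
        tail_len = max_tokens - head_len
        kept = tokens[:head_len] + tokens[len(tokens) - tail_len:]
    return " ".join(kept)


sample = "one two three four five six"
print(truncate_tokens(sample, 3, "head"))
print(truncate_tokens(sample, 3, "tail"))
print(truncate_tokens(sample, 4, "middle_out"))
```

Keeping both ends can be useful when a prompt carries instructions at the start and the user's question at the end.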
3. Scrubber - Security & Privacy
- RedactPII() - Automatically redact emails, phones, IPs, credit cards, URLs, SSNs
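Pattern-based redaction of this kind is commonly done with regular expressions. The sketch below covers only three of the listed categories, and the patterns, the `redact` helper, and the `[EMAIL]`-style placeholders are all illustrative assumptions rather than the library's behavior. The simplified patterns will miss some valid formats and can match non-PII text.

```python
import re
from typing import Optional, Set

# Deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, redact_types: Optional[Set[str]] = None) -> str:
    """Replace matches of the selected patterns with a [TYPE] placeholder."""
    selected = set(PATTERNS) if redact_types is None else set(redact_types)
    unknown = selected - set(PATTERNS)
    if unknown:
        raise ValueError(f"unknown redact types: {sorted(unknown)}")
    # Iterate in PATTERNS order so the result does not depend on set ordering.
    for name, pattern in PATTERNS.items():
        if name in selected:
            text = pattern.sub(f"[{name.upper()}]", text)
    return text


message = "Mail bob@example.com from 10.0.0.1, SSN 123-45-6789"
print(redact(message))
print(redact(message, {"email"}))
```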
4. Analyzer - Show Value
- CountTokens() - Track token savings and optimization impact
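The arithmetic behind a savings report is simple enough to sketch. The example assumes whitespace-separated words as a stand-in for real tokens (actual LLM tokenizers produce different counts), and `token_stats` is a hypothetical helper, not the library's `CountTokens`.

```python
from typing import Dict, Union


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word is one token.
    return len(text.split())


def token_stats(original: str, cleaned: str) -> Dict[str, Union[int, float]]:
    before = count_tokens(original)
    after = count_tokens(cleaned)
    saved = before - after
    # Guard against division by zero for empty input.
    percent = round(saved * 100 / before, 1) if before else 0.0
    return {"original": before, "cleaned": after, "saved": saved, "percent_saved": percent}


stats = token_stats("a b c d e f g h i j", "a b c d e f")
print(stats)
```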
Complete Example
from prompt_groomer import (
    Groomer,
    # Cleaner
    StripHTML, NormalizeWhitespace, FixUnicode,
    # Compressor
    Deduplicate, TruncateTokens,
    # Scrubber
    RedactPII,
    # Analyzer
    CountTokens,
)
original_text = """Your messy input here..."""
counter = CountTokens(original_text=original_text)
groomer = (
    Groomer()
    # Clean
    .pipe(StripHTML(to_markdown=True))
    .pipe(NormalizeWhitespace())
    .pipe(FixUnicode())
    # Compress
    .pipe(Deduplicate(similarity_threshold=0.85))
    .pipe(TruncateTokens(max_tokens=500, strategy="head"))
    # Secure
    .pipe(RedactPII(redact_types={"email", "phone"}))
    # Analyze
    .pipe(counter)
)
result = groomer.run(original_text)
print(counter.format_stats()) # Shows token savings
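The `similarity_threshold=0.85` argument in the example suggests that chunks are compared pairwise and dropped when they are too close to one already kept. One way to sketch that idea uses `difflib.SequenceMatcher` from the standard library. This is an assumption about the general technique, not a description of how `Deduplicate` measures similarity, and `deduplicate` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List


def deduplicate(chunks: List[str], similarity_threshold: float = 0.85) -> List[str]:
    """Keep the first of any group of near-identical chunks, preserving order."""
    kept: List[str] = []
    for chunk in chunks:
        is_duplicate = any(
            SequenceMatcher(None, chunk.lower(), seen.lower()).ratio() >= similarity_threshold
            for seen in kept
        )
        if not is_duplicate:
            kept.append(chunk)
    return kept


chunks = [
    "The cat sat on the mat.",
    "The cat sat on the mat!",
    "Dogs bark loudly at night.",
]
unique = deduplicate(chunks)
print(unique)
```

The pairwise comparison is quadratic in the number of chunks, which is acceptable for a handful of retrieved passages but would need a different approach for large collections.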
Examples
Check out the examples/ folder for detailed examples organized by module:
- cleaner/ - HTML cleaning, whitespace normalization, Unicode fixing
- compressor/ - Smart truncation, deduplication
- scrubber/ - PII redaction
- analyzer/ - Token counting and cost savings
- all_modules_demo.py - Complete demonstration
Development
This project uses uv for dependency management and make for common tasks.
# Install dependencies
make install
# Run tests
make test
# Format code
make format
License
MIT
File details
Details for the file prompt_groomer-0.1.0.tar.gz.
File metadata
- Download URL: prompt_groomer-0.1.0.tar.gz
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aa7f131da0d03f851ad87b866520728a6a13572ff3921045667cd92ba9a63b76 |
| MD5 | db3925cd441399f270837d2dd0b631c4 |
| BLAKE2b-256 | 0c372875f2dcf56676833ff9c3afd3dc395f17370e357bc93a7ec8e65976bc4d |
File details
Details for the file prompt_groomer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: prompt_groomer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 328b025b5c4b30f3d4621e82c26418e40e216db330742b4c8baf39c273ba7ddd |
| MD5 | 3a9a1eb7814773c98e59f4f1d31a34ba |
| BLAKE2b-256 | 2ac8ad9cf0b101f2681b7a4fb064e67b64f7a18abe8f786afde8f3f729611227 |