A lightweight Python library for optimizing and cleaning LLM inputs

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Prompt Groomer

A lightweight Python library for optimizing and cleaning LLM inputs. Reduce token usage, improve prompt quality, and lower API costs.

Overview

Prompt Groomer helps you clean and optimize prompts before sending them to LLM APIs. By removing unnecessary whitespace, duplicate characters, and other inefficiencies, you can:

  • Reduce token usage and API costs
  • Improve prompt quality and consistency
  • Process inputs more efficiently

Status

This project is in early development. Features are being added iteratively.

Installation

# Using uv (recommended)
uv pip install prompt-groomer

# Using pip
pip install prompt-groomer

Quick Start

Build custom cleaning pipelines with a fluent API:

from prompt_groomer import Groomer, StripHTML, NormalizeWhitespace, TruncateTokens

# Define a cleaning pipeline
groomer = (
    Groomer()
    .pipe(StripHTML())
    .pipe(NormalizeWhitespace())
    .pipe(TruncateTokens(max_tokens=1000, strategy="middle_out"))
)

raw_input = "<div>  User input with <b>lots</b> of   spaces... </div>"
clean_prompt = groomer.run(raw_input)
# Output: "User input with lots of spaces..."
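The chaining in the Quick Start works because each .pipe() call hands back the pipeline object itself. As a rough illustration of that pattern only, here is a minimal sketch using a hypothetical MiniPipeline class built on the standard library. It is not Prompt Groomer's implementation, and the assumption that steps are plain callables from str to str is ours.

```python
from typing import Callable, List


class MiniPipeline:
    """Hypothetical stand-in that illustrates the pipe/run pattern."""

    def __init__(self) -> None:
        self._steps: List[Callable[[str], str]] = []

    def pipe(self, step: Callable[[str], str]) -> "MiniPipeline":
        # Register the step and return self so calls can be chained.
        self._steps.append(step)
        return self

    def run(self, text: str) -> str:
        # Apply every registered step in the order it was added.
        for step in self._steps:
            text = step(text)
        return text


pipeline = MiniPipeline().pipe(str.strip).pipe(str.lower)
result = pipeline.run("  Hello World  ")
print(result)  # hello world
```

Order matters in a pipeline like this: each step sees the output of the previous one, which is why the Quick Start strips HTML before normalizing whitespace and truncates last.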

4 Core Modules

Prompt Groomer is organized into 4 specialized modules:

1. Cleaner - Clean Dirty Data

  • StripHTML() - Remove HTML tags, optionally converting to Markdown (to_markdown=True)
  • NormalizeWhitespace() - Collapse excessive whitespace
  • FixUnicode() - Remove zero-width spaces and problematic Unicode
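To make the three cleaning steps concrete, the sketch below approximates each one with the Python standard library. The helper names (strip_html, normalize_whitespace, remove_zero_width) are hypothetical and chosen for illustration; the library's own operations may behave differently, for example around Markdown conversion or which Unicode characters count as problematic.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    # Drop tags and keep the text between them.
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def normalize_whitespace(text: str) -> str:
    # Collapse runs of whitespace to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


# Assumed set of invisible characters: zero-width space, non-joiner,
# joiner, word joiner, and byte order mark.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def remove_zero_width(text: str) -> str:
    # str.translate deletes every character mapped to None.
    return text.translate(_ZERO_WIDTH)


raw = "<div>  User input with <b>lots</b> of   spaces\u200b... </div>"
cleaned = remove_zero_width(normalize_whitespace(strip_html(raw)))
print(cleaned)  # User input with lots of spaces...
```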

2. Compressor - Reduce Size

  • TruncateTokens() - Truncate text to a token budget, cutting at sentence boundaries
    • Strategies: "head", "tail", "middle_out"
  • Deduplicate() - Remove near-duplicate content (useful for RAG contexts)
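The sketch below shows one plausible reading of these two ideas using only the standard library. It assumes that "middle_out" means keeping the start and end of the text and dropping the middle, treats whitespace-separated words as tokens, and ignores sentence boundaries. The function names and the "..." marker are hypothetical, and none of this is confirmed to match the library's behavior.

```python
from difflib import SequenceMatcher
from typing import List


def truncate_middle_out(text: str, max_tokens: int) -> str:
    # Assumption: tokens are whitespace-separated words, not model tokens.
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    head = (max_tokens + 1) // 2      # keep slightly more of the start
    tail = max_tokens - head
    kept_tail = tokens[-tail:] if tail else []
    # The "..." marker is added on top of the kept tokens.
    return " ".join(tokens[:head] + ["..."] + kept_tail)


def deduplicate(chunks: List[str], threshold: float = 0.85) -> List[str]:
    # Keep a chunk only if it is not too similar to any chunk kept so far.
    kept: List[str] = []
    for chunk in chunks:
        if all(SequenceMatcher(None, chunk, k).ratio() < threshold for k in kept):
            kept.append(chunk)
    return kept


short = truncate_middle_out("a b c d e f g h", max_tokens=4)
print(short)  # a b ... g h

unique = deduplicate([
    "The cat sat on the mat.",
    "The cat sat on the mat!",
    "Dogs bark loudly.",
])
print(unique)  # ['The cat sat on the mat.', 'Dogs bark loudly.']
```

Pairwise comparison like this is quadratic in the number of chunks, which is acceptable for a handful of retrieved passages but would need a different approach for large inputs.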

3. Scrubber - Security & Privacy

  • RedactPII() - Automatically redact emails, phone numbers, IP addresses, credit card numbers, URLs, and SSNs
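Pattern-based redaction of this kind can be sketched with regular expressions. The example below covers only emails and IPv4 addresses. The patterns are deliberately simple, and the redact helper and the [EMAIL] / [IP] placeholder format are assumptions for illustration, not the library's actual output.

```python
import re
from typing import Set

# Simplified patterns; real-world PII detection needs stricter rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str, redact_types: Set[str]) -> str:
    # Replace each requested PII type with a bracketed placeholder.
    for name in sorted(redact_types):
        text = PATTERNS[name].sub(f"[{name.upper()}]", text)
    return text


redacted = redact("Contact jane@example.com from 192.168.0.1.", {"email", "ip"})
print(redacted)  # Contact [EMAIL] from [IP].
```

Restricting the set of types, as the redact_types argument does here, leaves the other kinds of data untouched.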

4. Analyzer - Show Value

  • CountTokens() - Track token savings and optimization impact
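To show what a savings report involves, the sketch below compares estimated token counts before and after cleaning. It uses a rough heuristic of about four characters per token rather than a real tokenizer. The function names and the shape of the returned dictionary are hypothetical, and how the library itself counts tokens is not specified here.

```python
import math
from typing import Dict


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return math.ceil(len(text) / 4)


def token_savings(original: str, cleaned: str) -> Dict[str, float]:
    before = estimate_tokens(original)
    after = estimate_tokens(cleaned)
    saved = before - after
    percent = round(100.0 * saved / before, 1) if before else 0.0
    return {"before": before, "after": after, "saved": saved, "percent": percent}


original = "hello   " * 50   # 400 characters with padded whitespace
cleaned = "hello " * 50      # 300 characters after collapsing spaces
stats = token_savings(original, cleaned)
print(stats)  # {'before': 100, 'after': 75, 'saved': 25, 'percent': 25.0}
```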

Complete Example

from prompt_groomer import (
    Groomer,
    # Cleaner
    StripHTML, NormalizeWhitespace, FixUnicode,
    # Compressor
    Deduplicate, TruncateTokens,
    # Scrubber
    RedactPII,
    # Analyzer
    CountTokens
)

original_text = """Your messy input here..."""

counter = CountTokens(original_text=original_text)

groomer = (
    Groomer()
    # Clean
    .pipe(StripHTML(to_markdown=True))
    .pipe(NormalizeWhitespace())
    .pipe(FixUnicode())
    # Compress
    .pipe(Deduplicate(similarity_threshold=0.85))
    .pipe(TruncateTokens(max_tokens=500, strategy="head"))
    # Secure
    .pipe(RedactPII(redact_types={"email", "phone"}))
    # Analyze
    .pipe(counter)
)

result = groomer.run(original_text)
print(counter.format_stats())  # Shows token savings

Examples

Check out the examples/ folder for detailed examples organized by module:

  • cleaner/ - HTML cleaning, whitespace normalization, Unicode fixing
  • compressor/ - Smart truncation, deduplication
  • scrubber/ - PII redaction
  • analyzer/ - Token counting and cost savings
  • all_modules_demo.py - Complete demonstration

Development

This project uses uv for dependency management and make for common tasks.

# Install dependencies
make install

# Run tests
make test

# Format code
make format

License

MIT

Download files

Source Distribution

prompt_groomer-0.1.0.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

prompt_groomer-0.1.0-py3-none-any.whl (2.6 kB)

Uploaded Python 3

File details

Details for the file prompt_groomer-0.1.0.tar.gz.

File metadata

  • Download URL: prompt_groomer-0.1.0.tar.gz
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.22

File hashes

Hashes for prompt_groomer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 aa7f131da0d03f851ad87b866520728a6a13572ff3921045667cd92ba9a63b76
MD5 db3925cd441399f270837d2dd0b631c4
BLAKE2b-256 0c372875f2dcf56676833ff9c3afd3dc395f17370e357bc93a7ec8e65976bc4d

File details

Details for the file prompt_groomer-0.1.0-py3-none-any.whl.

File hashes

Hashes for prompt_groomer-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 328b025b5c4b30f3d4621e82c26418e40e216db330742b4c8baf39c273ba7ddd
MD5 3a9a1eb7814773c98e59f4f1d31a34ba
BLAKE2b-256 2ac8ad9cf0b101f2681b7a4fb064e67b64f7a18abe8f786afde8f3f729611227
