Compression Prompt - Python Implementation
Fast, intelligent prompt compression for LLMs - Save 50% tokens while maintaining 91% quality
This package is a Python port of the Rust implementation. It achieves 50% token reduction with 91% quality retention using pure statistical filtering.
Quick Start
Installation
cd python
pip install -e .
Or install with the development dependencies:
pip install -e ".[dev]"
Basic Usage
from compression_prompt import Compressor, CompressorConfig
# Use default configuration (50% compression)
compressor = Compressor()
text = """
Your long text here...
This will be compressed using statistical filtering
to save 50% tokens while maintaining quality.
"""
result = compressor.compress(text)
print(f"Original: {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Saved: {result.tokens_removed} tokens ({(1-result.compression_ratio)*100:.1f}%)")
print(f"\nCompressed text:\n{result.compressed}")
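The savings figure printed in the last lines follows directly from the two token counts. As a minimal sketch, assuming `compression_ratio` is the fraction of tokens kept (compressed divided by original), which is what the `(1 - ratio)` expression above implies, the arithmetic looks like this. `savings_summary` is a hypothetical helper, not part of the package:

```python
def savings_summary(original_tokens: int, compressed_tokens: int) -> dict:
    """Derive the ratio and savings from two token counts.

    Assumption: the ratio is compressed / original (fraction of tokens kept).
    """
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    ratio = compressed_tokens / original_tokens
    return {
        "compression_ratio": ratio,
        "tokens_removed": original_tokens - compressed_tokens,
        "percent_saved": (1 - ratio) * 100,
    }


summary = savings_summary(original_tokens=1000, compressed_tokens=500)
print(f"Saved: {summary['tokens_removed']} tokens ({summary['percent_saved']:.1f}%)")
# prints: Saved: 500 tokens (50.0%)
```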
Advanced Configuration
from compression_prompt import (
    Compressor, CompressorConfig,
    StatisticalFilterConfig
)
# Custom compression ratio
config = CompressorConfig(target_ratio=0.7) # Keep 70% of tokens
filter_config = StatisticalFilterConfig(
    compression_ratio=0.7,
    idf_weight=0.3,
    position_weight=0.2,
    pos_weight=0.2,
    entity_weight=0.2,
    entropy_weight=0.1,
)
compressor = Compressor(config, filter_config)
result = compressor.compress(text)
Quality Metrics
from compression_prompt import QualityMetrics
original = "Your original text..."
compressed = "Your compressed text..."
metrics = QualityMetrics.calculate(original, compressed)
print(metrics.format())
Output:
Quality Metrics:
- Keyword Retention: 92.0%
- Entity Retention: 89.5%
- Vocabulary Ratio: 78.3%
- Info Density: 0.845
- Overall Score: 89.2%
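This README does not define how each metric is computed. One plausible reading of "keyword retention" is the share of the original's distinct content words that survive compression. The sketch below assumes that definition, a simple regex tokenizer, and a tiny hypothetical stopword list; the package's own `QualityMetrics` may compute it differently.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "and", "in", "that", "on"}


def keywords(text: str) -> set[str]:
    """Lowercased alphanumeric words that are not in the stopword list."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_retention(original: str, compressed: str) -> float:
    """Fraction of the original's distinct keywords still present after compression."""
    original_keywords = keywords(original)
    if not original_keywords:
        return 1.0  # nothing to lose
    kept = original_keywords & keywords(compressed)
    return len(kept) / len(original_keywords)


original = "The quick brown fox jumps over the lazy dog in the park"
compressed = "quick brown fox jumps lazy dog"
print(f"Keyword Retention: {keyword_retention(original, compressed):.1%}")
# prints: Keyword Retention: 75.0%
```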
Command Line Usage
# Compress file to stdout
compress input.txt
# Conservative compression (keep 70% of tokens)
compress -r 0.7 input.txt
# Aggressive compression (keep 30% of tokens)
compress -r 0.3 input.txt
# Show statistics
compress -s input.txt
# Save to file
compress -o output.txt input.txt
# Read from stdin
cat input.txt | compress
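To call the CLI from a Python script, it can help to build the argument list in one place. The sketch below uses only the flags shown above (`-r`, `-s`, `-o`) and assumes they can be combined in a single invocation; `build_compress_command` is a hypothetical helper, not something the package ships.

```python
import shlex
from typing import Optional


def build_compress_command(
    input_path: str,
    ratio: Optional[float] = None,
    show_stats: bool = False,
    output_path: Optional[str] = None,
) -> list[str]:
    """Assemble an argument list for the `compress` CLI from the documented flags."""
    if ratio is not None and not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    command = ["compress"]
    if ratio is not None:
        command += ["-r", str(ratio)]
    if show_stats:
        command.append("-s")
    if output_path is not None:
        command += ["-o", output_path]
    command.append(input_path)
    return command


command = build_compress_command("input.txt", ratio=0.3, output_path="output.txt")
print(shlex.join(command))
# prints: compress -r 0.3 -o output.txt input.txt
```

Passing the resulting list to `subprocess.run(command, check=True)` avoids shell quoting problems with unusual file names.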
Features
- ✅ Zero Dependencies: Pure Python implementation, no external libraries required
- ✅ Fast: Optimized statistical filtering
- ✅ Multilingual: Supports 11 languages (EN, ES, PT, FR, DE, IT, RU, ZH, JA, AR, HI)
- ✅ Smart Filtering: Preserves code blocks, JSON, paths, identifiers
- ✅ Contextual: Intelligent stopword handling based on context
- ✅ Customizable: Fine-tune weights and parameters for your use case
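The "Smart Filtering" item can be pictured as a masking pass that marks spans the filter must not touch. The sketch below is only an illustration of that idea: the regular expressions and the `protected_spans` helper are assumptions made for this example, not the patterns the package actually uses.

```python
import re

# Illustrative patterns only; the package's real masks are not documented here.
PROTECTED_PATTERNS = [
    re.compile(r"```.*?```", re.DOTALL),  # fenced code blocks
    re.compile(r"\{[^{}]*\}"),            # flat JSON-like objects
    re.compile(r"(?:/[\w.-]+){2,}"),      # Unix-style paths
    re.compile(r"\b\w+(?:_\w+)+\b"),      # snake_case identifiers
]


def protected_spans(text: str) -> list[tuple[int, int]]:
    """Return merged (start, end) character spans that should be left untouched."""
    spans = sorted(m.span() for p in PROTECTED_PATTERNS for m in p.finditer(text))
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching spans collapse into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


sample = 'Load /etc/app/config.json and set {"max_tokens": 512} before calling run_job.'
for start, end in protected_spans(sample):
    print(repr(sample[start:end]))
```

A filter built on this would score and drop tokens only outside the returned spans, then stitch the protected text back in unchanged.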
Configuration Options
CompressorConfig
CompressorConfig(
    target_ratio=0.5, # Fraction of tokens to keep (0.0-1.0)
    min_input_tokens=100, # Minimum tokens to attempt compression
    min_input_bytes=1024 # Minimum bytes to attempt compression
)
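The two minimum-size options suggest a guard that leaves short inputs alone. A minimal sketch of such a guard follows, under two assumptions that the README does not confirm: tokens are approximated by whitespace splitting, and an input below either threshold is skipped. `should_compress` is a hypothetical name.

```python
def should_compress(text: str, min_input_tokens: int = 100, min_input_bytes: int = 1024) -> bool:
    """Return True only when the input clears both minimum-size thresholds.

    Assumptions: whitespace splitting stands in for the real tokenizer, and
    falling below either threshold means the text is returned unchanged.
    """
    token_count = len(text.split())
    byte_count = len(text.encode("utf-8"))
    return token_count >= min_input_tokens and byte_count >= min_input_bytes


short_prompt = "Summarize this paragraph."
long_prompt = "retrieved context sentence " * 200  # 600 tokens, 5400 bytes

print(should_compress(short_prompt))  # False
print(should_compress(long_prompt))   # True
```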
StatisticalFilterConfig
StatisticalFilterConfig(
    compression_ratio=0.5, # Keep 50% of tokens
    # Feature weights (sum should be ~1.0)
    idf_weight=0.3, # Inverse document frequency
    position_weight=0.2, # Position in text (start/end important)
    pos_weight=0.2, # Part-of-speech heuristics
    entity_weight=0.2, # Named entity detection
    entropy_weight=0.1, # Local vocabulary diversity
    # Protection features
    enable_protection_masks=True, # Protect code/JSON/paths
    enable_contextual_stopwords=True, # Smart stopword filtering
    preserve_negations=True, # Keep "not", "never", etc.
    preserve_comparators=True, # Keep ">=", "!=", etc.
    # Domain-specific
    domain_terms=["YourTerm"], # Always preserve these terms
    min_gap_between_critical=3 # Fill gaps between important tokens
)
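To make the role of the weights concrete, here is a rough sketch of how a weighted score and a keep-ratio could combine: each token gets a weighted sum of its feature values, the top fraction is kept in original order, and negations are always retained. The feature values are hand-assigned for illustration, and `TokenFeatures`, `score`, and `select_tokens` are hypothetical names. This is not the package's actual algorithm.

```python
from dataclasses import dataclass

# Default weights from the configuration above.
WEIGHTS = {"idf": 0.3, "position": 0.2, "pos": 0.2, "entity": 0.2, "entropy": 0.1}
NEGATIONS = {"not", "never", "no"}  # assumed list; the README only says '"not", "never", etc.'


@dataclass
class TokenFeatures:
    token: str
    idf: float       # rarity of the token
    position: float  # closeness to the start or end of the text
    pos: float       # part-of-speech heuristic
    entity: float    # named-entity likelihood
    entropy: float   # local vocabulary diversity


def score(features: TokenFeatures) -> float:
    """Weighted sum of the five feature values."""
    return sum(weight * getattr(features, name) for name, weight in WEIGHTS.items())


def select_tokens(tokens: list[TokenFeatures], compression_ratio: float = 0.5) -> list[str]:
    """Keep the top-scoring fraction of tokens, plus any negations, in original order."""
    keep_count = max(1, round(len(tokens) * compression_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: score(tokens[i]), reverse=True)
    keep = set(ranked[:keep_count])
    keep |= {i for i, t in enumerate(tokens) if t.token.lower() in NEGATIONS}
    return [tokens[i].token for i in sorted(keep)]


# Made-up feature values, for illustration only.
tokens = [
    TokenFeatures("do", 0.1, 0.9, 0.2, 0.0, 0.3),
    TokenFeatures("not", 0.2, 0.5, 0.3, 0.0, 0.3),
    TokenFeatures("delete", 0.8, 0.4, 0.9, 0.1, 0.5),
    TokenFeatures("the", 0.05, 0.3, 0.05, 0.0, 0.2),
    TokenFeatures("production", 0.7, 0.4, 0.7, 0.6, 0.6),
    TokenFeatures("database", 0.7, 0.9, 0.8, 0.7, 0.6),
]
print(" ".join(select_tokens(tokens)))
# prints: not delete production database
```

In this sketch the negation override keeps four of six tokens even though the ratio asks for three, which shows why protection features can push the final count above the target.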
Examples
Example 1: RAG System Context Compression
from compression_prompt import Compressor
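# Compress retrieved context before sending to LLM.
# get_documents, query, user_question, and llm are placeholders for your own retrieval and model code.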
retrieved_docs = get_documents(query)
context = "\n\n".join(doc.text for doc in retrieved_docs)
compressor = Compressor()
result = compressor.compress(context)
# Save 50% tokens while maintaining quality
prompt = f"Context: {result.compressed}\n\nQuestion: {user_question}"
response = llm.generate(prompt)
Example 2: Custom Domain Terms
from compression_prompt import StatisticalFilterConfig, Compressor
# Preserve domain-specific terms
filter_config = StatisticalFilterConfig(
domain_terms=["TensorFlow", "PyTorch", "CUDA", "GPU"]
)
compressor = Compressor(filter_config=filter_config)
result = compressor.compress(technical_text)
Example 3: Aggressive Compression
from compression_prompt import CompressorConfig, StatisticalFilterConfig, Compressor
# 70% compression (keep only 30% of tokens)
config = CompressorConfig(target_ratio=0.3, min_input_tokens=50)
filter_config = StatisticalFilterConfig(compression_ratio=0.3)
compressor = Compressor(config, filter_config)
result = compressor.compress(text)
print(f"Compressed to {result.compressed_tokens} tokens (from {result.original_tokens})")
Performance Characteristics
| Tokens Kept | Token Savings | Keyword Retention | Entity Retention | Use Case |
|---|---|---|---|---|
| 50% (default) ⭐ | 50% | 92.0% | 89.5% | Best balance |
| 70% (conservative) | 30% | 99.2% | 98.4% | High precision |
| 30% (aggressive) | 70% | 72.4% | 71.5% | Maximum savings |
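The table can double as a lookup when choosing a ratio for a quality floor. The sketch below copies the three rows above and returns the most aggressive ratio whose reported retention still meets the requested minimums. `pick_ratio` is a hypothetical helper, and the figures are the ones reported here rather than guarantees for any particular text.

```python
# Rows copied from the table above: (ratio kept, keyword retention %, entity retention %).
# Ordered from most to least aggressive.
PRESETS = [
    (0.3, 72.4, 71.5),
    (0.5, 92.0, 89.5),
    (0.7, 99.2, 98.4),
]


def pick_ratio(min_keyword_retention: float, min_entity_retention: float = 0.0) -> float:
    """Return the smallest listed ratio whose reported retention meets both floors."""
    for ratio, keyword_pct, entity_pct in PRESETS:
        if keyword_pct >= min_keyword_retention and entity_pct >= min_entity_retention:
            return ratio
    raise ValueError("no listed preset meets the requested retention")


print(pick_ratio(90.0))        # 0.5
print(pick_ratio(90.0, 95.0))  # 0.7
```

The returned value can be passed as `target_ratio` and `compression_ratio`, or to the CLI's `-r` flag.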
Development
Running Tests
pytest tests/
Code Formatting
black compression_prompt/
Type Checking
mypy compression_prompt/
Differences from Rust Version
The Python implementation aims for feature parity with the Rust version (⏳ marks an item that is not yet complete):
- ✅ Same statistical filtering algorithm
- ✅ Same configuration options
- ✅ Same quality metrics
- ✅ CLI tool with identical interface
- ⏳ Image output (optional, requires Pillow)
Performance differences:
- Rust: ~0.16ms average, 10.58 MB/s throughput
- Python: ~1-5ms average (roughly 6-30x the Rust latency, still fast enough for most use cases)
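Latency depends heavily on input size and hardware, so it is worth measuring on your own prompts. A minimal timing harness might look like the following. `stand_in_compress` is a placeholder so the snippet runs without the package installed; swap in `Compressor().compress` to measure the real thing.

```python
import statistics
import time
from typing import Callable


def average_latency_ms(func: Callable[[str], object], text: str, runs: int = 50) -> float:
    """Mean wall-clock time per call in milliseconds, after one warm-up call."""
    func(text)  # warm-up, so one-time setup cost is not counted
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(text)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)


def stand_in_compress(text: str) -> str:
    """Placeholder workload: keeps every other whitespace-separated token."""
    return " ".join(text.split()[::2])


sample_text = "lorem ipsum dolor sit amet " * 400
print(f"{average_latency_ms(stand_in_compress, sample_text):.3f} ms per call")
```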
License
MIT
See Also
- Rust Implementation - Original high-performance implementation
- Main README - Project overview and benchmarks
- Architecture - Technical details