Token Reducer
Intelligent token reduction library for LLM applications with context-aware compression
Overview
Token Reducer is a Python library that reduces token counts in text and code inputs for Large Language Model (LLM) applications while aiming to preserve semantic meaning, logical structure, and task-relevant information. The aggressive level targets 50-70% token reduction, with the stated goal of not distorting facts or breaking code logic; the lighter levels trade smaller savings for higher fidelity.
Key Features
- 🎯 Context-Aware Compression: Task-specific strategies for summarization, RAG, extraction, reasoning, and more
- 📊 Multi-Level Compression: Choose between light (5-15%), moderate (20-40%), or aggressive (50-70%) reduction
- 🔄 Multi-Pass Pipeline: Specialized passes for optimal compression (normalize → prune → compress → summarize → repack)
- 💻 Text & Code Support: Domain-specific compression for both natural language and source code
- 🔌 Tokenizer Agnostic: Works with OpenAI, Anthropic, HuggingFace, and custom tokenizers
- 🛡️ Fail-Safe Mode: Automatic quality validation with semantic similarity checking
- ⚡ High Performance: <100ms per 1000 tokens for text, <200ms for code
- 📴 Offline Operation: No cloud dependencies, works standalone
Installation
Basic Installation
pip install token-reducer
With Optional Dependencies
# Extras are quoted so that shells such as zsh do not expand the brackets
# For HuggingFace tokenizers
pip install "token-reducer[transformers]"
# For Anthropic tokenizers
pip install "token-reducer[anthropic]"
# For NLP features (entity extraction, sentence segmentation)
pip install "token-reducer[nlp]"
# For semantic similarity checking
pip install "token-reducer[similarity]"
# For code compression
pip install "token-reducer[code]"
# Install everything
pip install "token-reducer[all]"
Quick Start
Text Compression
from token_reducer import compress_text, TaskContext, CompressionLevel
# Compress text for summarization task
result = compress_text(
    text="Your long article text here...",
    task=TaskContext.SUMMARIZATION,
    level=CompressionLevel.MODERATE,
    tokenizer="gpt-4"
)
print(f"Original: {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Reduction: {result.reduction_percentage}%")
print(f"\nCompressed text:\n{result.compressed_text}")
Code Compression
from token_reducer import compress_code, TaskContext, CompressionLevel
# Compress code for LLM context
result = compress_code(
    code="""
def calculate_total_price(items, tax_rate=0.1):
    # Calculate subtotal
    subtotal = sum(item['price'] for item in items)
    # Apply tax
    tax = subtotal * tax_rate
    # Return total
    return subtotal + tax
""",
    task=TaskContext.CODE_COMPLETION,
    level=CompressionLevel.AGGRESSIVE,
    language="python"
)
print(f"Compressed code:\n{result.compressed_code}")
Advanced Configuration
from token_reducer import CompressionConfig, CompressionLevel, TaskContext, compress_text
config = CompressionConfig(
    task=TaskContext.RAG,
    level=CompressionLevel.MODERATE,
    tokenizer="claude-3",
    preserve_entities=True,
    preserve_numbers=True,
    quality_threshold=0.90,
    enable_fail_safe=True
)
text = "Your long document text here..."  # placeholder input
result = compress_text(text, config=config)
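The fail-safe behaviour controlled by `quality_threshold` and `enable_fail_safe` can be pictured roughly as follows. This is a minimal sketch, not the library's implementation: `fail_safe` and `similarity` are hypothetical names, and `difflib.SequenceMatcher` is a character-level stand-in for the embedding-based semantic similarity check the library describes.

```python
from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1] (character-level, NOT semantic)."""
    return SequenceMatcher(None, a, b).ratio()


def fail_safe(
    original: str,
    compress: Callable[[str], str],
    quality_threshold: float = 0.90,
) -> str:
    """Return the compressed text only if it stays close enough to the original."""
    compressed = compress(original)
    if similarity(original, compressed) >= quality_threshold:
        return compressed
    # Quality check failed: fall back to the unmodified input.
    return original


text = "The quick brown fox jumps over the lazy dog."
# A gentle compressor passes the check; a destructive one triggers the fallback.
print(fail_safe(text, lambda t: t.replace("quick ", "")))
print(fail_safe(text, lambda t: t[:5]))
```

Falling back to the original input keeps the worst case at "no savings" rather than "damaged prompt", which is presumably the point of a fail-safe mode.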
Compression Strategies
Task Types
Token Reducer adapts compression strategies based on your use case:
- SUMMARIZATION: Preserves causal links and chronological order
- RAG: Optimizes for retrieval context (entities, facts, key phrases)
- EXTRACTION: Keeps only fields relevant to extraction target
- REASONING: Preserves premises, key details, and logical connections
- TRANSLATION: Bypasses compression entirely
- CODE_COMPLETION: Preserves function signatures and interfaces
- DEBUGGING: Maintains variable names and error-relevant context
- QUESTION_ANSWERING: Preserves facts and entities for potential questions
Compression Levels
| Level | Token Reduction | Semantic Similarity | Use Case |
|---|---|---|---|
| Light | 5-15% | >98% | Maximum safety, minimal loss |
| Moderate | 20-40% | >90% | Balanced compression and quality |
| Aggressive | 50-70% | >80% | Maximum savings, acceptable loss |
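To check a measured result against the bands in the table, something like the following could be used. It is a sketch under assumptions: `LevelTarget`, `LEVEL_TARGETS`, `reduction_percentage`, and `within_band` are illustrative names that are not part of the library, and only the numbers are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelTarget:
    min_reduction: float   # percent of tokens removed
    max_reduction: float   # percent of tokens removed
    min_similarity: float  # fraction in [0, 1]


# Numbers copied from the table above; the structure itself is hypothetical.
LEVEL_TARGETS = {
    "light": LevelTarget(5, 15, 0.98),
    "moderate": LevelTarget(20, 40, 0.90),
    "aggressive": LevelTarget(50, 70, 0.80),
}


def reduction_percentage(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed relative to the original count."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (original_tokens - compressed_tokens) / original_tokens


def within_band(level: str, original_tokens: int,
                compressed_tokens: int, similarity: float) -> bool:
    """True if the reduction falls in the level's range and similarity exceeds its floor."""
    target = LEVEL_TARGETS[level]
    reduction = reduction_percentage(original_tokens, compressed_tokens)
    return (target.min_reduction <= reduction <= target.max_reduction
            and similarity > target.min_similarity)


print(within_band("moderate", 1000, 700, 0.93))
print(within_band("aggressive", 1000, 700, 0.93))
```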
How It Works
Text Compression Pipeline
- Normalize: Fix spacing, remove HTML, standardize quotes
- Prune: Remove duplicates, redundancy, verbose explanations
- Compress: Extract entities/facts, compact phrasing, reduce adjectives
- Summarize: Apply task-specific tightening
- Repack: Shorten sentences, optimize structure
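As a rough illustration of the first two passes, the sketch below normalizes and then prunes a string using only the standard library. It assumes nothing about the library's internals: `normalize` and `prune` are hypothetical helpers, the tag removal is a crude regex, and pruning here only drops exact repeated sentences.

```python
import html
import re

# Map curly quotes to their straight equivalents.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def normalize(text: str) -> str:
    """Strip HTML tags, unescape entities, standardize quotes, fix spacing."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag removal
    text = html.unescape(text)
    text = text.translate(_QUOTES)
    return re.sub(r"\s+", " ", text).strip()


def prune(text: str) -> str:
    """Drop sentences that exactly repeat an earlier one (case-insensitive)."""
    seen = set()
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        key = sentence.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


raw = "<p>Tokens cost money.  Tokens cost money. \u201cCompress\u201d them &amp; save.</p>"
print(prune(normalize(raw)))
```

Running normalization first matters: duplicates that differ only in spacing or quote style become identical before the pruning pass compares them.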
Code Compression Pipeline
- Remove Noise: Strip comments, blank lines, logging, debug prints
- Rename Identifiers: Shorten variable/function/class names
- Remove Unused: Eliminate dead code, unused imports/functions
- Optimize Expressions: Simplify boolean/arithmetic expressions
- Summarize Functions: Replace bodies with summaries (aggressive mode)
- Reformat: Minimize whitespace, compact structure
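Part of the Remove Noise step can be approximated with Python's own `ast` module: parsing source and unparsing it again discards comments and blank lines, because neither is represented in the syntax tree. This is a sketch of that one idea, not the library's method; `remove_noise` is a hypothetical name, `ast.unparse` requires Python 3.9 or later, and logging or debug-print removal is not covered.

```python
import ast


def remove_noise(source: str) -> str:
    """Round-trip Python source through the AST to drop comments and blank lines."""
    tree = ast.parse(source)
    return ast.unparse(tree)


source = '''
def calculate_total_price(items, tax_rate=0.1):
    # Calculate subtotal
    subtotal = sum(item["price"] for item in items)

    # Apply tax
    tax = subtotal * tax_rate
    return subtotal + tax
'''

print(remove_noise(source))
```

The round trip also normalizes formatting (quote style, parentheses), so the output is semantically equivalent code rather than a byte-for-byte subset of the input.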
Performance
- Text: <100ms per 1000 tokens (excluding quality checks)
- Code: <200ms per 1000 tokens (excluding quality checks)
- Quality Check: <50ms for semantic similarity validation
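To compare these figures with your own workload, a small timing harness such as the one below may help. It is a sketch under assumptions: `ms_per_1000_tokens` is a hypothetical helper, the callable passed in stands for whatever compression function you use, and tokens are approximated by whitespace splitting rather than a real tokenizer.

```python
import time
from typing import Callable


def ms_per_1000_tokens(fn: Callable[[str], str], text: str, repeats: int = 5) -> float:
    """Best-of-N wall time for fn(text), scaled to milliseconds per 1000 tokens."""
    tokens = len(text.split())  # rough proxy for a real tokenizer's count
    if tokens == 0:
        raise ValueError("text must contain at least one token")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text)
        best = min(best, time.perf_counter() - start)
    # seconds -> milliseconds, then normalize to 1000 tokens
    return best * 1000.0 * 1000.0 / tokens


# str.upper is only a placeholder workload for demonstration.
print(ms_per_1000_tokens(str.upper, "word " * 2000))
```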
Use Cases
Reduce LLM API Costs
# Before: 10,000 tokens × $0.03/1K = $0.30 per request
# After (60% reduction): 4,000 tokens × $0.03/1K = $0.12 per request
# Savings: 60% cost reduction
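The same arithmetic as a small helper, for plugging in your own numbers. `request_cost` and `savings` are illustrative names, not library functions, and the calculation covers input tokens only; output tokens are billed separately and are not affected by prompt compression.

```python
def request_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of sending `tokens` input tokens at a given price per 1K tokens."""
    return tokens / 1000 * price_per_1k


def savings(original_tokens: int, reduction: float, price_per_1k: float):
    """Return (cost_before, cost_after, amount_saved) for a fractional reduction."""
    compressed_tokens = round(original_tokens * (1 - reduction))
    before = request_cost(original_tokens, price_per_1k)
    after = request_cost(compressed_tokens, price_per_1k)
    return before, after, before - after


before, after, saved = savings(10_000, 0.60, 0.03)
print(f"before=${before:.2f} after=${after:.2f} saved=${saved:.2f}")
```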
Fit More Context in Token Limits
# Compress multiple documents to fit in context window
from token_reducer import CompressionLevel, TaskContext, batch_compress_text
# doc1 ... doc5 are your own document strings
results = batch_compress_text(
    texts=[doc1, doc2, doc3, doc4, doc5],
    task=TaskContext.RAG,
    level=CompressionLevel.MODERATE,
    parallel=True
)
# Combine compressed documents (this join does not itself enforce a token limit)
combined = "\n\n".join(r.compressed_text for r in results)
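The join above concatenates every result regardless of length. To actually stay under a token limit, the compressed texts can be packed greedily, as in this sketch. `pack_within_limit` is a hypothetical helper, the default token counter is a whitespace split standing in for a real tokenizer, and the few tokens contributed by the separator are ignored.

```python
from typing import Callable, Iterable


def pack_within_limit(
    texts: Iterable[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda t: len(t.split()),
    separator: str = "\n\n",
) -> str:
    """Join texts in order, stopping before the running total exceeds max_tokens."""
    packed = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break  # the next document would overflow the budget
        packed.append(text)
        used += cost
    return separator.join(packed)


docs = ["a b c", "d e", "f g h i"]
print(pack_within_limit(docs, max_tokens=5))
```

Stopping at the first document that does not fit preserves the input order, which matters when the documents are ranked by relevance.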
RAG Pipeline Optimization
# Compress retrieved documents before sending to LLM
from token_reducer import CompressionLevel, TaskContext, compress_text
# vector_store and query come from your own retrieval setup
retrieved_docs = vector_store.similarity_search(query, k=10)
compressed_docs = [
    compress_text(
        doc.page_content,
        task=TaskContext.RAG,
        level=CompressionLevel.MODERATE
    )
    for doc in retrieved_docs
]
# Use compressed docs in prompt
context = "\n\n".join(d.compressed_text for d in compressed_docs)
Documentation
- Repository: https://github.com/UsamaTufail31/token-reducer
- Issues: https://github.com/UsamaTufail31/token-reducer/issues
- Examples: https://github.com/UsamaTufail31/token-reducer/tree/main/examples
Development
Setup Development Environment
# Clone repository
git clone https://github.com/UsamaTufail31/token-reducer.git
cd token-reducer
# Install with development dependencies
pip install -e ".[dev,all]"
# Install pre-commit hooks
pre-commit install
Run Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=token_reducer --cov-report=html
# Run specific test file
pytest tests/test_text_compression.py
Code Quality
# Format code
black src/ tests/
# Sort imports
isort src/ tests/
# Lint code
ruff check src/ tests/
# Type check
mypy src/
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use Token Reducer in your research, please cite:
@software{token_reducer,
title = {Token Reducer: Intelligent Token Reduction for LLM Applications},
author = {Tufail, Usama},
year = {2024},
url = {https://github.com/UsamaTufail31/token-reducer}
}
Acknowledgments
- Inspired by research in context compression and semantic similarity
- Built with modern NLP libraries (spaCy, sentence-transformers, tiktoken)
- Designed for the LLM application development community
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Created by Usama Tufail | GitHub