
Intelligent token reduction library for LLM applications with context-aware compression


Token Reducer


Python 3.9+ License: MIT Code style: black

Overview

Token Reducer is a Python library that reduces token counts in text and code inputs for Large Language Model (LLM) applications while preserving semantic meaning, logical structure, and task-relevant information. At the aggressive level it targets 50-70% token reduction, with the aim of not distorting facts or breaking code logic.

Key Features

  • 🎯 Context-Aware Compression: Task-specific strategies for summarization, RAG, extraction, reasoning, and more
  • 📊 Multi-Level Compression: Choose between light (5-15%), moderate (20-40%), or aggressive (50-70%) reduction
  • 🔄 Multi-Pass Pipeline: Specialized passes for optimal compression (normalize → prune → compress → summarize → repack)
  • 💻 Text & Code Support: Domain-specific compression for both natural language and source code
  • 🔌 Tokenizer Agnostic: Works with OpenAI, Anthropic, HuggingFace, and custom tokenizers
  • 🛡️ Fail-Safe Mode: Automatic quality validation with semantic similarity checking
  • High Performance: <100ms per 1000 tokens for text, <200ms for code
  • 📴 Offline Operation: No cloud dependencies, works standalone

Installation

Basic Installation

pip install token-reducer

With Optional Dependencies

# Quote the extras so shells such as zsh do not treat the brackets as a glob

# For HuggingFace tokenizers
pip install "token-reducer[transformers]"

# For Anthropic tokenizers
pip install "token-reducer[anthropic]"

# For NLP features (entity extraction, sentence segmentation)
pip install "token-reducer[nlp]"

# For semantic similarity checking
pip install "token-reducer[similarity]"

# For code compression
pip install "token-reducer[code]"

# Install everything
pip install "token-reducer[all]"

Quick Start

Text Compression

from token_reducer import compress_text, TaskContext, CompressionLevel

# Compress text for summarization task
result = compress_text(
    text="Your long article text here...",
    task=TaskContext.SUMMARIZATION,
    level=CompressionLevel.MODERATE,
    tokenizer="gpt-4"
)

print(f"Original: {result.original_tokens} tokens")
print(f"Compressed: {result.compressed_tokens} tokens")
print(f"Reduction: {result.reduction_percentage}%")
print(f"\nCompressed text:\n{result.compressed_text}")

Code Compression

import textwrap

from token_reducer import compress_code, TaskContext, CompressionLevel

# Compress code for LLM context (dedent so the snippet starts at column 0)
result = compress_code(
    code=textwrap.dedent("""
    def calculate_total_price(items, tax_rate=0.1):
        # Calculate subtotal
        subtotal = sum(item['price'] for item in items)
        # Apply tax
        tax = subtotal * tax_rate
        # Return total
        return subtotal + tax
    """),
    task=TaskContext.CODE_COMPLETION,
    level=CompressionLevel.AGGRESSIVE,
    language="python"
)

print(f"Compressed code:\n{result.compressed_code}")

Advanced Configuration

from token_reducer import CompressionConfig, CompressionLevel, TaskContext, compress_text

text = "Your long document text here..."

config = CompressionConfig(
    task=TaskContext.RAG,
    level=CompressionLevel.MODERATE,
    tokenizer="claude-3",
    preserve_entities=True,
    preserve_numbers=True,
    quality_threshold=0.90,
    enable_fail_safe=True
)

result = compress_text(text, config=config)
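The `quality_threshold` and `enable_fail_safe` options describe a validate-then-fall-back step. As a minimal sketch of that idea (not Token Reducer's actual implementation), the function below keeps the compressed text only when a similarity score meets the threshold and otherwise returns the original. It swaps in a crude word-set Jaccard score as a stand-in for the semantic similarity check the library performs; `jaccard_similarity`, `fail_safe`, and `drop_last_word` are hypothetical names.

```python
from typing import Callable


def jaccard_similarity(a: str, b: str) -> float:
    """Word-set overlap in [0, 1]; a crude stand-in for semantic similarity."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


def fail_safe(
    text: str,
    compress: Callable[[str], str],
    quality_threshold: float = 0.90,
    similarity: Callable[[str, str], float] = jaccard_similarity,
) -> str:
    """Return the compressed text only if it stays similar enough to the original."""
    candidate = compress(text)
    if similarity(text, candidate) >= quality_threshold:
        return candidate
    # Quality check failed: fall back to the uncompressed input.
    return text


def drop_last_word(text: str) -> str:
    """Toy 'compressor' used only to exercise the fallback logic."""
    return " ".join(text.split()[:-1])


print(fail_safe("alpha beta gamma delta", drop_last_word, quality_threshold=0.7))  # alpha beta gamma
print(fail_safe("alpha beta gamma delta", drop_last_word, quality_threshold=0.9))  # alpha beta gamma delta
```

In a real setup you would pass an actual compressor and an embedding-based similarity function; the fallback logic stays the same.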

Compression Strategies

Task Types

Token Reducer adapts compression strategies based on your use case:

  • SUMMARIZATION: Preserves causal links and chronological order
  • RAG: Optimizes for retrieval context (entities, facts, key phrases)
  • EXTRACTION: Keeps only the fields relevant to the extraction target
  • REASONING: Preserves premises, key details, and logical connections
  • TRANSLATION: Bypasses compression entirely
  • CODE_COMPLETION: Preserves function signatures and interfaces
  • DEBUGGING: Maintains variable names and error-relevant context
  • QUESTION_ANSWERING: Preserves facts and entities for potential questions
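One way to picture task-specific behavior is a lookup table from task to compression function. The sketch below is hypothetical: `Task` is a stand-in enum rather than the library's `TaskContext`, and `strip_fillers` is a toy pass, not one of Token Reducer's strategies. It shows only the dispatch shape, including the translation bypass described above.

```python
import re
from enum import Enum, auto
from typing import Callable, Dict


class Task(Enum):
    """Hypothetical stand-in for a few TaskContext members (not imported)."""

    SUMMARIZATION = auto()
    RAG = auto()
    TRANSLATION = auto()


# Toy pass: delete a few filler words when they are followed by whitespace.
_FILLERS = re.compile(r"\b(?:basically|actually|really|very|just)\s+", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    return _FILLERS.sub("", text)


def passthrough(text: str) -> str:
    return text


STRATEGIES: Dict[Task, Callable[[str], str]] = {
    Task.SUMMARIZATION: strip_fillers,
    Task.RAG: strip_fillers,
    # Translation needs every word of the source, so compression is bypassed.
    Task.TRANSLATION: passthrough,
}


def compress_for(task: Task, text: str) -> str:
    return STRATEGIES[task](text)


sample = "It is really very simple."
print(compress_for(Task.SUMMARIZATION, sample))  # It is simple.
print(compress_for(Task.TRANSLATION, sample))  # It is really very simple.
```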

Compression Levels

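| Level      | Token Reduction | Semantic Similarity | Use Case                          |
|------------|-----------------|---------------------|-----------------------------------|
| Light      | 5-15%           | >98%                | Maximum safety, minimal loss      |
| Moderate   | 20-40%          | >90%                | Balanced compression and quality  |
| Aggressive | 50-70%          | >80%                | Maximum savings, acceptable loss  |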
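To check a result against these bands, the reduction can be computed as the share of original tokens removed. This sketch assumes `reduction_percentage` is defined that way (the README does not state the formula); `TARGET_BANDS` copies the table's token-reduction ranges, and `within_target` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Target token-reduction bands copied from the table above, in percent.
TARGET_BANDS: Dict[str, Tuple[float, float]] = {
    "light": (5.0, 15.0),
    "moderate": (20.0, 40.0),
    "aggressive": (50.0, 70.0),
}


def reduction_percentage(original_tokens: int, compressed_tokens: int) -> float:
    """Share of tokens removed, as a percentage of the original count."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (original_tokens - compressed_tokens) / original_tokens


def within_target(level: str, original_tokens: int, compressed_tokens: int) -> bool:
    """True if the achieved reduction falls inside the level's target band."""
    low, high = TARGET_BANDS[level]
    return low <= reduction_percentage(original_tokens, compressed_tokens) <= high


print(reduction_percentage(10_000, 4_000))  # 60.0
print(within_target("aggressive", 10_000, 4_000))  # True
```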

How It Works

Text Compression Pipeline

  1. Normalize: Fix spacing, remove HTML, standardize quotes
  2. Prune: Remove duplicates, redundancy, verbose explanations
  3. Compress: Extract entities/facts, compact phrasing, reduce adjectives
  4. Summarize: Apply task-specific tightening
  5. Repack: Shorten sentences, optimize structure
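A simplified sketch of the first two passes, assuming regex-based cleanup; the library's own passes are likely more thorough, and the function names here are hypothetical. `normalize` strips HTML tags and entities, standardizes curly quotes, and collapses whitespace; `prune` drops exact duplicate sentences. Chaining plain functions in a list mirrors the pass-by-pass structure of the pipeline.

```python
import html
import re
from typing import Callable, List

TAG = re.compile(r"<[^>]+>")
SPACES = re.compile(r"\s+")
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def normalize(text: str) -> str:
    """Strip HTML tags and entities, standardize curly quotes, collapse whitespace."""
    text = html.unescape(TAG.sub(" ", text))
    return SPACES.sub(" ", text.translate(QUOTES)).strip()


def prune(text: str) -> str:
    """Drop exact duplicate sentences, keeping the first occurrence in order."""
    sentences = SENTENCE_END.split(text)
    return " ".join(dict.fromkeys(s for s in sentences if s))


def run_pipeline(text: str, passes: List[Callable[[str], str]]) -> str:
    """Apply each pass to the output of the previous one."""
    for apply_pass in passes:
        text = apply_pass(text)
    return text


raw = "<p>Hello   \u201cworld\u201d.</p> Hello \u201cworld\u201d. Bye."
print(run_pipeline(raw, [normalize, prune]))  # Hello "world". Bye.
```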

Code Compression Pipeline

  1. Remove Noise: Strip comments, blank lines, logging, debug prints
  2. Rename Identifiers: Shorten variable/function/class names
  3. Remove Unused: Eliminate dead code, unused imports/functions
  4. Optimize Expressions: Simplify boolean/arithmetic expressions
  5. Summarize Functions: Replace bodies with summaries (aggressive mode)
  6. Reformat: Minimize whitespace, compact structure

Performance

  • Text: <100ms per 1000 tokens (excluding quality checks)
  • Code: <200ms per 1000 tokens (excluding quality checks)
  • Quality Check: <50ms for semantic similarity validation
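To check figures like these on your own inputs, a small timing harness is enough. This sketch is hypothetical: `ms_per_1k_tokens` approximates token counts by whitespace-separated words rather than a real tokenizer, and it reports the best of several runs to reduce timing noise.

```python
import time
from typing import Callable


def ms_per_1k_tokens(fn: Callable[[str], object], text: str, repeats: int = 5) -> float:
    """Best-of-N wall-clock latency of fn(text), scaled to 1,000 tokens.

    Tokens are approximated as whitespace-separated words, which is only a rough
    proxy for a model tokenizer's count.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    n_tokens = max(1, len(text.split()))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text)
        best = min(best, time.perf_counter() - start)
    # Seconds to milliseconds, then normalize to a 1,000-token input.
    return best * 1000.0 * 1000.0 / n_tokens


sample = "lorem ipsum " * 500  # about 1,000 whitespace-separated tokens
print(f"{ms_per_1k_tokens(str.upper, sample):.3f} ms per 1k tokens")
```

Any callable that accepts the input text works here, since the harness ignores the return value.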

Use Cases

Reduce LLM API Costs

# Before: 10,000 tokens × $0.03/1K = $0.30 per request
# After (60% reduction): 4,000 tokens × $0.03/1K = $0.12 per request
# Savings: $0.18 per request, a 60% reduction in input-token cost
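The arithmetic above generalizes to a small helper. This sketch assumes a flat per-1,000-token input price (the $0.03 figure is the example's, not a current list price) and ignores output tokens, which compression does not affect; `request_cost` and `compression_savings` are hypothetical names.

```python
from typing import Tuple


def request_cost(tokens: int, price_per_1k: float) -> float:
    """Input cost of one request at a flat price per 1,000 tokens."""
    return tokens / 1000 * price_per_1k


def compression_savings(
    original_tokens: int, reduction_pct: float, price_per_1k: float
) -> Tuple[float, float, float]:
    """Return (cost before, cost after, amount saved) for one request."""
    compressed_tokens = round(original_tokens * (1 - reduction_pct / 100))
    before = request_cost(original_tokens, price_per_1k)
    after = request_cost(compressed_tokens, price_per_1k)
    return before, after, before - after


before, after, saved = compression_savings(10_000, 60, 0.03)
print(f"${before:.2f} -> ${after:.2f}, saving ${saved:.2f}")  # $0.30 -> $0.12, saving $0.18
```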

Fit More Context in Token Limits

# Compress multiple documents to fit in the context window
from token_reducer import batch_compress_text, CompressionLevel, TaskContext

results = batch_compress_text(
    texts=[doc1, doc2, doc3, doc4, doc5],  # your document strings
    task=TaskContext.RAG,
    level=CompressionLevel.MODERATE,
    parallel=True
)

# Join the compressed documents; check the total against your model's token limit
combined = "\n\n".join(r.compressed_text for r in results)
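Joining the results does not by itself enforce a token limit. A greedy packer is one simple way to do that; the hypothetical `pack_within_budget` below takes `(text, token_count)` pairs, keeps documents in order while they fit, and skips any that would overflow. With Token Reducer results, the pairs could presumably be built from `r.compressed_text` and `r.compressed_tokens`, the attributes shown in the Quick Start.

```python
from typing import List, Sequence, Tuple


def pack_within_budget(
    docs: Sequence[Tuple[str, int]], max_tokens: int, separator_tokens: int = 1
) -> List[str]:
    """Greedily keep documents, in order, while the running total fits max_tokens.

    Each item is (text, token_count); separator_tokens approximates the cost of
    the blank-line joiner placed between kept documents.
    """
    kept: List[str] = []
    used = 0
    for text, n_tokens in docs:
        cost = n_tokens + (separator_tokens if kept else 0)
        if used + cost > max_tokens:
            continue  # skip this doc; a smaller later one may still fit
        kept.append(text)
        used += cost
    return kept


docs = [("doc A", 50), ("doc B (large)", 80), ("doc C", 30)]
print(pack_within_budget(docs, max_tokens=100))  # ['doc A', 'doc C']
```

Greedy packing is not optimal in general (it can miss a better combination), but it is predictable and preserves the input order, which usually reflects retrieval relevance.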

RAG Pipeline Optimization

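from token_reducer import compress_text, CompressionLevel, TaskContext

# Compress retrieved documents before sending them to the LLM.
# `vector_store` and `query` come from your retrieval stack (for example, a LangChain vector store).
retrieved_docs = vector_store.similarity_search(query, k=10)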

compressed_docs = [
    compress_text(
        doc.page_content,
        task=TaskContext.RAG,
        level=CompressionLevel.MODERATE
    )
    for doc in retrieved_docs
]

# Use compressed docs in prompt
context = "\n\n".join(d.compressed_text for d in compressed_docs)


Development

Setup Development Environment

# Clone repository
git clone https://github.com/UsamaTufail31/token-reducer.git
cd token-reducer

# Install with development dependencies
pip install -e ".[dev,all]"

# Install pre-commit hooks
pre-commit install

Run Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=token_reducer --cov-report=html

# Run specific test file
pytest tests/test_text_compression.py

Code Quality

# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Token Reducer in your research, please cite:

@software{token_reducer,
  title = {Token Reducer: Intelligent Token Reduction for LLM Applications},
  author = {Tufail, Usama},
  year = {2024},
  url = {https://github.com/UsamaTufail31/token-reducer}
}

Acknowledgments

  • Inspired by research in context compression and semantic similarity
  • Built with modern NLP libraries (spaCy, sentence-transformers, tiktoken)
  • Designed for the LLM application development community



Created by Usama Tufail | GitHub



Download files

Source Distribution

token_reducer-0.2.0.tar.gz (47.9 kB)

Built Distribution

token_reducer-0.2.0-py3-none-any.whl (55.5 kB)

File details

Details for the file token_reducer-0.2.0.tar.gz.

File metadata

  • Download URL: token_reducer-0.2.0.tar.gz
  • Size: 47.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for token_reducer-0.2.0.tar.gz
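| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 3a5d1670aa52314961fe499324c1221bc4c8d64facfe44a9b67196bd1e0f322b |
| MD5         | 317526ee5415d9af7e8d462ea82c76ee                                 |
| BLAKE2b-256 | 0a5e6132709faf02e3dc6399d8ab33d8082c9d7350750e71f7996a333f38befc |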


File details

Details for the file token_reducer-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: token_reducer-0.2.0-py3-none-any.whl
  • Size: 55.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for token_reducer-0.2.0-py3-none-any.whl
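| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | a43c557ce8a86e0b45050eb3a5dc4b509af483a2d7be25e86efb2d11f9455a21 |
| MD5         | 07752a033b2bca5d08aa1d3e6f98ddb0                                 |
| BLAKE2b-256 | 7bebd7387819a6d7f26d997b6145b4e0e3a85eddcc74322671fa39cefbbad163 |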

