Intelligent context compression algorithms for LLM systems

sageRefiner

Intelligent Context Compression Algorithms for RAG Systems

sageRefiner is a standalone Python library of context compression algorithms that reduce token usage while preserving semantic quality in RAG (Retrieval-Augmented Generation) systems.

Features

  • Multiple Compression Algorithms

    • LongRefiner: Advanced selective compression using LLM-based importance scoring
    • REFORM: Efficient attention-based compression with KV cache optimization
    • Provence: Sentence-level context pruning using DeBERTa-based scoring
  • High Compression Ratios: Achieve 2-10x compression while preserving key information

  • Flexible Configuration: Easy-to-use YAML/dict-based configuration

  • Production Ready: Battle-tested in the SAGE framework

Installation

# From PyPI (the distribution is published as isage-refiner)
pip install isage-refiner

# From source
pip install git+https://github.com/intellistream/sageRefiner.git

# Development mode
git clone https://github.com/intellistream/sageRefiner.git
cd sageRefiner
pip install -e .

Quick Start

from sage_refiner import LongRefinerCompressor, RefinerConfig

# Configure the refiner
config = RefinerConfig(
    algorithm="long_refiner",
    budget=2048,  # Target token count
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
)

# Initialize compressor
compressor = LongRefinerCompressor(
    base_model_path=config.base_model_path,
    max_model_len=25000,
    gpu_memory_utilization=0.5,
)

# Compress documents
query = "What are the benefits of exercise?"
documents = [
    {"contents": "Exercise improves cardiovascular health..."},
    {"contents": "Regular physical activity boosts mental wellbeing..."},
    # ... more documents
]

result = compressor.compress(
    question=query,
    document_list=documents,
    budget=2048,
)

print(f"Original tokens: {result['original_tokens']}")
print(f"Compressed tokens: {result['compressed_tokens']}")
print(f"Compression rate: {result['compression_rate']:.2f}")
print(f"\nCompressed content:\n{result['compressed_context']}")
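
The printed fields can also be folded into a small reporting helper. The sketch below assumes only that the result is a dict with the `original_tokens` and `compressed_tokens` keys used above. `summarize_compression` is a hypothetical name, not part of sageRefiner, and the sample numbers are illustrative rather than measured output.

```python
def summarize_compression(result: dict) -> dict:
    """Derive ratio and savings from the token counts in a compress() result."""
    original = result["original_tokens"]
    compressed = result["compressed_tokens"]
    if original <= 0 or compressed <= 0:
        raise ValueError("token counts must be positive")
    return {
        "ratio": original / compressed,          # 4.0 means "4x smaller"
        "tokens_saved": original - compressed,
        "fraction_kept": compressed / original,
    }


# Illustrative values only, not measured output.
sample = {"original_tokens": 8000, "compressed_tokens": 2000}
summary = summarize_compression(sample)
print(f"{summary['ratio']:.1f}x, saved {summary['tokens_saved']} tokens")
```

Reporting the ratio as original divided by compressed matches the "Nx" convention used in the feature list.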

Algorithms

LongRefiner

Based on selective compression with LLM-guided importance scoring. Best for:

  • High-quality compression with minimal information loss
  • Scenarios where semantic coherence is critical
  • Budget-constrained LLM applications

Key Parameters:

  • budget: Target token count
  • base_model_path: HuggingFace model for compression
  • compression_ratio: Compression aggressiveness (0.0-1.0)
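
To make the role of `budget` concrete, here is a minimal sketch of budget-constrained selection: chunks carry a precomputed importance score, and the highest-scoring ones are kept until the token budget is used up. This is an illustration of the general idea only, not LongRefiner's actual procedure. `select_within_budget`, the scores, and the token counts are all hypothetical.

```python
def select_within_budget(chunks, budget):
    """Keep the highest-scoring chunks whose total token count fits the budget.

    chunks: list of (text, score, n_tokens) tuples. Scores are assumed to be
    precomputed by some importance model (hypothetical here).
    """
    # Rank by score, best first, remembering each chunk's original position.
    ranked = sorted(enumerate(chunks), key=lambda item: item[1][1], reverse=True)
    kept, used = [], 0
    for index, (_text, _score, n_tokens) in ranked:
        if used + n_tokens <= budget:
            kept.append(index)
            used += n_tokens
    # Restore document order so the compressed context stays readable.
    return [chunks[i][0] for i in sorted(kept)], used


chunks = [
    ("Exercise improves cardiovascular health.", 0.9, 6),
    ("The gym opens at 6 am.", 0.1, 7),
    ("Regular activity boosts mental wellbeing.", 0.8, 6),
]
selected, used = select_within_budget(chunks, budget=12)
print(selected, used)
```

Restoring the original order after selection is a design choice that favors coherence over strict relevance ranking.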

REFORM

Efficient attention-based compression using attention head analysis. Best for:

  • Fast compression with lower compute requirements
  • Batch processing scenarios
  • When exact wording preservation is less critical

Key Parameters:

  • max_tokens: Maximum tokens to keep
  • selected_heads: Attention heads for scoring
  • use_kv_cache: Enable KV cache optimization
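
The interplay of `selected_heads` and `max_tokens` can be sketched in pure Python: average the attention each context token receives over the chosen heads, then keep the top-scoring tokens in their original order. This is a simplified illustration under stated assumptions, not REFORM's implementation. The attention values are made up, KV cache handling is not modeled, and both function names are hypothetical.

```python
def score_tokens(attention, selected_heads):
    """Average the attention each context token receives over the chosen heads.

    attention[h][t] is the weight head h assigns to context token t
    (assumed to be already aggregated over query positions).
    """
    n_tokens = len(attention[selected_heads[0]])
    return [
        sum(attention[h][t] for h in selected_heads) / len(selected_heads)
        for t in range(n_tokens)
    ]


def keep_top_tokens(tokens, scores, max_tokens):
    """Keep the max_tokens highest-scoring tokens, in their original order."""
    ranked = sorted(range(len(tokens)), key=lambda t: scores[t], reverse=True)
    return [tokens[t] for t in sorted(ranked[:max_tokens])]


tokens = ["exercise", "the", "improves", "a", "health"]
attention = [
    [0.5, 0.0, 0.25, 0.0, 0.25],  # head 0
    [0.2, 0.2, 0.2, 0.2, 0.2],    # head 1 (uninformative, not selected)
    [0.25, 0.0, 0.25, 0.0, 0.5],  # head 2
]
scores = score_tokens(attention, selected_heads=[0, 2])
kept = keep_top_tokens(tokens, scores, max_tokens=3)
print(kept)
```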

Provence

Sentence-level context pruning using DeBERTa-based relevance scoring. Best for:

  • Document-level pruning in RAG pipelines
  • When you need to filter out irrelevant documents
  • Scenarios with many retrieved documents

Key Parameters:

  • threshold: Relevance threshold (0-1) for filtering
  • reorder: Whether to reorder by relevance score
  • top_k: Number of top documents to keep
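
The three parameters combine in a straightforward way once relevance scores exist. The sketch below applies them to precomputed `(text, score)` pairs. It does not call a DeBERTa model and is not Provence's code. `prune` and the scores are hypothetical, and the order in which the three options are applied is an assumption.

```python
def prune(scored_items, threshold=0.5, reorder=False, top_k=None):
    """Filter (text, score) pairs by relevance.

    threshold: drop items scoring below this value.
    reorder:   if True, return survivors by descending score;
               otherwise keep the input order.
    top_k:     if set, keep at most this many of the best-scoring survivors.
    """
    survivors = [
        (index, text, score)
        for index, (text, score) in enumerate(scored_items)
        if score >= threshold
    ]
    if top_k is not None:
        survivors = sorted(survivors, key=lambda s: s[2], reverse=True)[:top_k]
    if reorder:
        survivors.sort(key=lambda s: s[2], reverse=True)
    else:
        survivors.sort(key=lambda s: s[0])
    return [text for _, text, _ in survivors]


items = [("A", 0.7), ("B", 0.2), ("C", 0.9), ("D", 0.6)]
print(prune(items, threshold=0.5))
print(prune(items, threshold=0.5, reorder=True, top_k=2))
```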

Configuration

config = RefinerConfig(
    algorithm="long_refiner",  # or "reform"
    budget=2048,
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
    
    # LongRefiner specific
    compression_ratio=0.5,
    device="cuda",
)
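
When settings arrive as a plain dict (for example, parsed from YAML), it can help to validate them before constructing the config. The sketch below uses a stand-in dataclass that mirrors the fields shown above. `SketchConfig` and `config_from_dict` are hypothetical and not part of sageRefiner, and whether `RefinerConfig` performs similar validation is not stated here.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the fields shown above; not RefinerConfig."""
    algorithm: str = "long_refiner"
    budget: int = 2048
    base_model_path: str = "Qwen/Qwen2.5-3B-Instruct"
    compression_ratio: float = 0.5
    device: str = "cuda"


def config_from_dict(raw: dict) -> SketchConfig:
    """Build a config from a plain dict, rejecting unknown keys and bad values."""
    known = {f.name for f in fields(SketchConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = SketchConfig(**raw)
    if cfg.budget <= 0:
        raise ValueError("budget must be positive")
    if not 0.0 <= cfg.compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in [0.0, 1.0]")
    return cfg


cfg = config_from_dict({"algorithm": "reform", "budget": 1024})
print(cfg)
```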

Architecture

sageRefiner is designed as a standalone library that can be integrated into any Python application:

Your Application
      ↓
sageRefiner (this library)
      ↓
[LongRefiner | REFORM | Provence] → Compressed Context
      ↓
Your LLM Pipeline
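
The flow in the diagram can be sketched as a function that accepts any object with the `compress(question, document_list, budget)` call shape from the Quick Start and returns a prompt for the downstream LLM. `TruncatingCompressor` is a placeholder that only truncates by words so the example runs without a GPU. It, the `Compressor` protocol, and `build_prompt` are hypothetical names, not part of sageRefiner.

```python
from typing import Protocol


class Compressor(Protocol):
    def compress(self, question: str, document_list: list, budget: int) -> dict: ...


class TruncatingCompressor:
    """Placeholder with the same call shape; it only truncates by word count."""

    def compress(self, question, document_list, budget):
        words = " ".join(d["contents"] for d in document_list).split()
        return {"compressed_context": " ".join(words[:budget])}


def build_prompt(compressor: Compressor, question: str, documents: list, budget: int) -> str:
    """Compress retrieved documents, then assemble the prompt for the LLM."""
    result = compressor.compress(
        question=question, document_list=documents, budget=budget
    )
    return f"Context:\n{result['compressed_context']}\n\nQuestion: {question}\nAnswer:"


docs = [
    {"contents": "Exercise improves cardiovascular health."},
    {"contents": "Regular physical activity boosts mental wellbeing."},
]
prompt = build_prompt(
    TruncatingCompressor(), "What are the benefits of exercise?", docs, budget=4
)
print(prompt)
```

Because `build_prompt` depends only on the call shape, a real compressor such as the `LongRefinerCompressor` instance from the Quick Start could be passed in place of the placeholder.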

Integration with SAGE

This library is part of the SAGE framework ecosystem. For seamless integration with SAGE pipelines, use the RefinerAdapter in sage-middleware:

# In SAGE environment
from sage.middleware.components.sage_refiner import RefinerAdapter

# Parentheses let the method chain span multiple lines
(
    env.from_batch(...)
    .map(ChromaRetriever, retriever_config)
    .map(RefinerAdapter, refiner_config)  # Add compression step
    .map(QAPromptor, promptor_config)
    .sink(...)
)

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Transformers 4.30+

Examples

See the examples/ directory for complete examples:

  • basic_compression.py: Simple compression workflow
  • algorithm_comparison.py: Compare different algorithms
  • batch_processing.py: Process multiple queries efficiently

Performance

Benchmark on common RAG datasets (RTX 3090):

Algorithm     Compression Ratio   Latency (avg)   Quality Score
LongRefiner   3.2x                0.8s            0.92
REFORM        2.5x                0.3s            0.87

Citation

If you use sageRefiner in your research, please cite:

@software{sageRefiner2025,
  title = {sageRefiner: Intelligent Context Compression for RAG},
  author = {SAGE Team},
  year = {2025},
  url = {https://github.com/intellistream/sageRefiner}
}

License

Apache License 2.0 - See LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
