Intelligent context compression algorithms for LLM systems

sageRefiner

Intelligent Context Compression Algorithms for RAG Systems

sageRefiner is a standalone Python library of context compression algorithms that reduce token usage while preserving semantic quality in Retrieval-Augmented Generation (RAG) systems.

Features

  • Multiple Compression Algorithms

    • LongRefiner: Advanced selective compression using LLM-based importance scoring
    • REFORM: Efficient attention-based compression with KV cache optimization
    • Provence: Sentence-level context pruning using DeBERTa-based scoring
  • High Compression Ratios: Achieve 2-10x compression while preserving key information

  • Flexible Configuration: Easy-to-use YAML/dict-based configuration

  • Production Ready: Battle-tested in the SAGE framework

Installation

# From PyPI (published as isage-refiner)
pip install isage-refiner

# From source
pip install git+https://github.com/intellistream/sageRefiner.git

# Development mode
git clone https://github.com/intellistream/sageRefiner.git
cd sageRefiner
pip install -e .

Quick Start

from sage_refiner import LongRefinerCompressor, RefinerConfig

# Configure the refiner
config = RefinerConfig(
    algorithm="long_refiner",
    budget=2048,  # Target token count
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
)

# Initialize compressor
compressor = LongRefinerCompressor(
    base_model_path=config.base_model_path,
    max_model_len=25000,
    gpu_memory_utilization=0.5,
)

# Compress documents
query = "What are the benefits of exercise?"
documents = [
    {"contents": "Exercise improves cardiovascular health..."},
    {"contents": "Regular physical activity boosts mental wellbeing..."},
    # ... more documents
]

result = compressor.compress(
    question=query,
    document_list=documents,
    budget=config.budget,  # Reuse the budget set in the config above
)

print(f"Original tokens: {result['original_tokens']}")
print(f"Compressed tokens: {result['compressed_tokens']}")
print(f"Compression ratio: {result['compression_rate']:.2f}")
print(f"\nCompressed content:\n{result['compressed_context']}")
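
The token counts in the result can be turned into savings figures with a few lines of arithmetic. The sketch below assumes only the `original_tokens` and `compressed_tokens` keys printed above; `summarize_result` is a hypothetical helper, not part of sageRefiner, and the numbers are made up for illustration.

```python
def summarize_result(result: dict) -> dict:
    """Derive savings figures from the token counts in a compression result."""
    original = result["original_tokens"]
    compressed = result["compressed_tokens"]
    if original <= 0 or compressed <= 0:
        raise ValueError("token counts must be positive")
    return {
        "tokens_saved": original - compressed,
        "reduction_factor": original / compressed,  # 4.0 means 4x smaller
        "fraction_kept": compressed / original,
    }


# Stand-in result with made-up numbers, shaped like the Quick Start output.
example = {"original_tokens": 8192, "compressed_tokens": 2048}
summary = summarize_result(example)
print(summary)
```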

Algorithms

LongRefiner

Based on selective compression with LLM-guided importance scoring. Best for:

  • High-quality compression with minimal information loss
  • Scenarios where semantic coherence is critical
  • Budget-constrained LLM applications

Key Parameters:

  • budget: Target token count
  • base_model_path: HuggingFace model for compression
  • compression_ratio: Compression aggressiveness (0.0-1.0)
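
To illustrate what a token budget means for selective compression, here is a minimal sketch of budget-constrained selection. It is not LongRefiner's actual algorithm: the `Chunk` class, the `select_within_budget` function, and the importance scores are hypothetical stand-ins (in the library, importance comes from an LLM).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    tokens: int        # token count of the chunk
    importance: float  # hypothetical score; higher means more relevant


def select_within_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the most important chunks that fit in `budget` tokens."""
    ranked = sorted(range(len(chunks)), key=lambda i: chunks[i].importance, reverse=True)
    kept, used = [], 0
    for i in ranked:
        if used + chunks[i].tokens <= budget:
            kept.append(i)
            used += chunks[i].tokens
    # Restore document order so the compressed context stays readable.
    return [chunks[i] for i in sorted(kept)]


chunks = [
    Chunk("Exercise improves cardiovascular health.", tokens=6, importance=0.9),
    Chunk("The gym opens at 6 am.", tokens=7, importance=0.1),
    Chunk("Regular activity boosts mental wellbeing.", tokens=6, importance=0.8),
]
selected = select_within_budget(chunks, budget=12)
print([c.text for c in selected])
```

With a budget of 12 tokens, the two high-importance chunks fit and the low-importance one is dropped.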

REFORM

Efficient attention-based compression using attention head analysis. Best for:

  • Fast compression with lower compute requirements
  • Batch processing scenarios
  • When exact wording preservation is less critical

Key Parameters:

  • max_tokens: Maximum tokens to keep
  • selected_heads: Attention heads for scoring
  • use_kv_cache: Enable KV cache optimization
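
The idea behind `selected_heads` and `max_tokens` can be sketched with NumPy: average the attention that query tokens pay to each context token over a chosen set of heads, then keep the highest-scoring tokens. This is an illustration under stated assumptions, not REFORM's implementation; `select_tokens_by_attention` and the attention values are hypothetical, and KV cache handling is outside the sketch.

```python
import numpy as np


def select_tokens_by_attention(attn, selected_heads, max_tokens):
    """Return indices of the context tokens to keep.

    attn: array of shape (num_heads, num_query_tokens, num_context_tokens).
    """
    # Average attention over the chosen heads and over all query tokens.
    scores = attn[selected_heads].mean(axis=(0, 1))
    k = min(max_tokens, scores.shape[0])
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top)  # keep the original token order


# Two heads, one query token, five context tokens (made-up weights).
attn = np.array([
    [[0.1, 0.4, 0.1, 0.3, 0.1]],
    [[0.5, 0.1, 0.1, 0.1, 0.2]],
])
kept_one_head = select_tokens_by_attention(attn, selected_heads=[0], max_tokens=2)
kept_two_heads = select_tokens_by_attention(attn, selected_heads=[0, 1], max_tokens=2)
print(kept_one_head, kept_two_heads)
```

The two calls keep different tokens, which shows why the choice of heads matters for scoring.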

Provence

Sentence-level context pruning using DeBERTa-based relevance scoring. Best for:

  • Document-level pruning in RAG pipelines
  • When you need to filter out irrelevant documents
  • Scenarios with many retrieved documents

Key Parameters:

  • threshold: Relevance threshold (0-1) for filtering
  • reorder: Whether to reorder by relevance score
  • top_k: Number of top documents to keep
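
How `threshold`, `reorder`, and `top_k` might interact can be shown on documents that already carry relevance scores. The sketch below is one plausible reading of these parameters, not Provence's implementation; `prune_documents` and the scores are hypothetical (in the library, scores come from a DeBERTa-based model).

```python
def prune_documents(scored_docs, threshold=0.5, reorder=False, top_k=None):
    """Filter (text, score) pairs by relevance and return the kept texts."""
    # Pair each document with its original position so order can be restored.
    kept = [
        (i, text, score)
        for i, (text, score) in enumerate(scored_docs)
        if score >= threshold
    ]
    # Rank by score, highest first (the sort is stable, so ties keep input order).
    ranked = sorted(kept, key=lambda d: d[2], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if not reorder:
        ranked = sorted(ranked, key=lambda d: d[0])
    return [text for _, text, _ in ranked]


docs = [
    ("Exercise improves cardiovascular health.", 0.7),
    ("The gym opens at 6 am.", 0.2),
    ("Regular activity boosts mental wellbeing.", 0.9),
    ("Stretching reduces injury risk.", 0.6),
]
in_input_order = prune_documents(docs, threshold=0.5, top_k=2)
by_relevance = prune_documents(docs, threshold=0.5, top_k=2, reorder=True)
print(in_input_order)
print(by_relevance)
```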

Configuration

config = RefinerConfig(
    algorithm="long_refiner",  # or "reform"
    budget=2048,
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
    
    # LongRefiner specific
    compression_ratio=0.5,
    device="cuda",
)
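
For dict-based configuration, it can help to check the settings before building the config object. The following is a sketch only: `validate_settings` is a hypothetical helper, the set of algorithm names is limited to those that appear in this README, and the default of 0.5 for `compression_ratio` is an assumption taken from the example above.

```python
KNOWN_ALGORITHMS = {"long_refiner", "reform"}  # names that appear in this README


def validate_settings(settings: dict) -> dict:
    """Check a dict of refiner settings and return a validated copy."""
    algorithm = settings.get("algorithm")
    if algorithm not in KNOWN_ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    budget = settings.get("budget")
    if not isinstance(budget, int) or budget <= 0:
        raise ValueError("budget must be a positive integer token count")
    ratio = settings.get("compression_ratio", 0.5)  # assumed default
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("compression_ratio must be between 0.0 and 1.0")
    return {**settings, "compression_ratio": ratio}


settings = validate_settings({
    "algorithm": "long_refiner",
    "budget": 2048,
    "base_model_path": "Qwen/Qwen2.5-3B-Instruct",
})
print(settings)
# The validated dict could then be unpacked into the constructor shown above,
# e.g. RefinerConfig(**settings), assuming it accepts these keyword arguments.
```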

Architecture

sageRefiner is designed as a standalone library that can be integrated into any Python application:

Your Application
      ↓
sageRefiner (this library)
      ↓
[LongRefiner | REFORM | Provence] → Compressed Context
      ↓
Your LLM Pipeline
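
The flow above can be sketched as a compression step sitting between retrieval and prompt construction. The `compress` signature mirrors the Quick Start call; everything else is hypothetical: `Compressor`, `TruncatingCompressor`, and `build_prompt` are stand-ins, and the truncating compressor simply keeps the first words so the example runs without a model.

```python
from typing import Protocol


class Compressor(Protocol):
    def compress(self, question: str, document_list: list[dict], budget: int) -> dict: ...


class TruncatingCompressor:
    """Stand-in compressor: keeps the first `budget` whitespace-separated words."""

    def compress(self, question, document_list, budget):
        words = " ".join(d["contents"] for d in document_list).split()
        kept = words[:budget]
        return {
            "compressed_context": " ".join(kept),
            "original_tokens": len(words),
            "compressed_tokens": len(kept),
        }


def build_prompt(compressor: Compressor, question: str, documents: list[dict], budget: int) -> str:
    """Compress retrieved documents, then assemble the prompt for the LLM."""
    result = compressor.compress(question=question, document_list=documents, budget=budget)
    return f"Context:\n{result['compressed_context']}\n\nQuestion: {question}"


docs = [
    {"contents": "Exercise improves cardiovascular health."},
    {"contents": "Regular physical activity boosts mental wellbeing."},
]
prompt = build_prompt(TruncatingCompressor(), "What are the benefits of exercise?", docs, budget=6)
print(prompt)
```

Because `build_prompt` depends only on the `compress` method, a real compressor with the same call shape could be swapped in for the stand-in.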

Integration with SAGE

This library is part of the SAGE framework ecosystem. For seamless integration with SAGE pipelines, use the RefinerAdapter in sage-middleware:

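# In a SAGE environment. `env` is an existing SAGE environment, and
# ChromaRetriever and QAPromptor are assumed to be imported from SAGE.
from sage.middleware.components.sage_refiner import RefinerAdapter

# Parentheses let the method chain span several lines.
(
    env.from_batch(...)
    .map(ChromaRetriever, retriever_config)
    .map(RefinerAdapter, refiner_config)  # Add compression step
    .map(QAPromptor, promptor_config)
    .sink(...)
)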

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Transformers 4.30+

Examples

See the examples/ directory for complete examples:

  • basic_compression.py: Simple compression workflow
  • algorithm_comparison.py: Compare different algorithms
  • batch_processing.py: Process multiple queries efficiently

Performance

Benchmark on common RAG datasets (RTX 3090):

Algorithm     Compression Ratio   Latency (avg)   Quality Score
LongRefiner   3.2x                0.8s            0.92
REFORM        2.5x                0.3s            0.87

Citation

If you use sageRefiner in your research, please cite:

@software{sageRefiner2025,
  title = {sageRefiner: Intelligent Context Compression for RAG},
  author = {SAGE Team},
  year = {2025},
  url = {https://github.com/intellistream/sageRefiner}
}

License

Apache License 2.0 - See LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
