sageRefiner

Intelligent Context Compression Algorithms for RAG Systems

sageRefiner is a standalone Python library providing state-of-the-art context compression algorithms to reduce token usage while maintaining semantic quality in RAG (Retrieval-Augmented Generation) systems.

Features

  • Multiple Compression Algorithms

    • LongRefiner: Advanced selective compression using LLM-based importance scoring
    • REFORM: Efficient attention-based compression with KV cache optimization
    • Provence: Sentence-level context pruning using DeBERTa-based scoring
  • High Compression Ratios: Achieve 2-10x compression while preserving key information

  • Flexible Configuration: Easy-to-use YAML/dict-based configuration

  • Production Ready: Battle-tested in the SAGE framework

Installation

# From PyPI (the distribution is published as isage-refiner)
pip install isage-refiner

# From source
pip install git+https://github.com/intellistream/sageRefiner.git

# Development mode
git clone https://github.com/intellistream/sageRefiner.git
cd sageRefiner
pip install -e .

Quick Start

from sage_refiner import LongRefinerCompressor, RefinerConfig

# Configure the refiner
config = RefinerConfig(
    algorithm="long_refiner",
    budget=2048,  # Target token count
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
)

# Initialize compressor
compressor = LongRefinerCompressor(
    base_model_path=config.base_model_path,
    max_model_len=25000,
    gpu_memory_utilization=0.5,
)

# Compress documents
query = "What are the benefits of exercise?"
documents = [
    {"contents": "Exercise improves cardiovascular health..."},
    {"contents": "Regular physical activity boosts mental wellbeing..."},
    # ... more documents
]

result = compressor.compress(
    question=query,
    document_list=documents,
    budget=config.budget,  # reuse the budget set in the config above
)

print(f"Original tokens: {result['original_tokens']}")
print(f"Compressed tokens: {result['compressed_tokens']}")
print(f"Compression ratio: {result['compression_rate']:.2f}")
print(f"\nCompressed content:\n{result['compressed_context']}")
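
The token counts in the result make it easy to check the savings independently. The following is a minimal sketch that assumes only the `original_tokens` and `compressed_tokens` keys shown above; `compression_stats` is a hypothetical helper, not part of sageRefiner, and the example values are stand-ins rather than measured output.

```python
def compression_stats(result: dict) -> dict:
    """Derive compression figures from a result dict (hypothetical helper).

    Assumes the keys `original_tokens` and `compressed_tokens` from the
    Quick Start output; nothing here calls sageRefiner itself.
    """
    original = result["original_tokens"]
    compressed = result["compressed_tokens"]
    if original <= 0 or compressed <= 0:
        raise ValueError("token counts must be positive")
    return {
        "ratio": original / compressed,  # 4.0 means "4x smaller"
        "tokens_saved": original - compressed,
        "fraction_saved": 1 - compressed / original,
    }


# Stand-in values for illustration only, not measured output.
example = {"original_tokens": 8000, "compressed_tokens": 2000}
stats = compression_stats(example)
print(f"{stats['ratio']:.1f}x smaller, {stats['fraction_saved']:.0%} of tokens saved")
```

Reporting both the "Nx" ratio and the fraction saved avoids ambiguity about which direction a single "compression rate" number is meant to be read.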

Algorithms

LongRefiner

Based on selective compression with LLM-guided importance scoring. Best for:

  • High-quality compression with minimal information loss
  • Scenarios where semantic coherence is critical
  • Budget-constrained LLM applications

Key Parameters:

  • budget: Target token count
  • base_model_path: Hugging Face model used for compression
  • compression_ratio: Compression aggressiveness (0.0-1.0)
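
To make the role of `budget` concrete, here is a minimal sketch of budget-constrained selection. It is not LongRefiner's implementation: it assumes importance scores and token counts are already available for each chunk, and `select_within_budget` is a hypothetical name used only for illustration.

```python
from typing import List, Tuple


def select_within_budget(
    chunks: List[Tuple[str, float, int]], budget: int
) -> List[str]:
    """Keep the highest-scoring chunks whose token counts fit in `budget`.

    Each chunk is (text, importance_score, token_count). The scores are
    assumed to come from some external scorer; this sketch does not
    compute them.
    """
    # Rank by importance, highest first, remembering original positions.
    ranked = sorted(enumerate(chunks), key=lambda item: item[1][1], reverse=True)
    kept, used = [], 0
    for index, (_text, _score, tokens) in ranked:
        if used + tokens <= budget:
            kept.append(index)
            used += tokens
    # Restore document order so the compressed context stays readable.
    return [chunks[i][0] for i in sorted(kept)]


chunks = [
    ("Exercise improves cardiovascular health.", 0.9, 6),
    ("The study was funded by a grant.", 0.1, 8),
    ("Regular activity boosts mental wellbeing.", 0.8, 6),
]
selected = select_within_budget(chunks, budget=12)
print(selected)
```

With a budget of 12 tokens the two high-scoring chunks fit and the low-scoring one is dropped. A greedy pass like this is simple but not optimal in general; it is shown only to illustrate the trade-off the parameter controls.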

REFORM

Efficient attention-based compression using attention head analysis. Best for:

  • Fast compression with lower compute requirements
  • Batch processing scenarios
  • When exact wording preservation is less critical

Key Parameters:

  • max_tokens: Maximum tokens to keep
  • selected_heads: Attention heads for scoring
  • use_kv_cache: Enable KV cache optimization
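
The interplay of `selected_heads` and `max_tokens` can be illustrated with a small, library-free sketch. It assumes per-head attention scores for each context token are already computed, ignores KV cache handling entirely, and should not be read as REFORM's actual algorithm; `keep_top_tokens` is a hypothetical function.

```python
from typing import List, Sequence


def keep_top_tokens(
    tokens: Sequence[str],
    head_scores: Sequence[Sequence[float]],
    selected_heads: Sequence[int],
    max_tokens: int,
) -> List[str]:
    """Keep the `max_tokens` tokens with the highest mean attention score.

    `head_scores[h][t]` is assumed to be the attention that head `h` assigns
    to context token `t` (for example, averaged over query positions).
    """
    # Average the per-token score over the selected heads only.
    mean_scores = [
        sum(head_scores[h][t] for h in selected_heads) / len(selected_heads)
        for t in range(len(tokens))
    ]
    # Pick the top-scoring token indices, then return them in original order.
    top = sorted(range(len(tokens)), key=lambda t: mean_scores[t], reverse=True)
    keep = sorted(top[:max_tokens])
    return [tokens[t] for t in keep]


tokens = ["Exercise", "is", "good", "for", "health"]
head_scores = [
    [0.5, 0.1, 0.2, 0.1, 0.1],  # head 0
    [0.1, 0.1, 0.1, 0.1, 0.6],  # head 1
    [0.2, 0.2, 0.2, 0.2, 0.2],  # head 2 (not selected below)
]
kept = keep_top_tokens(tokens, head_scores, selected_heads=[0, 1], max_tokens=3)
print(kept)
```

Because whole tokens are dropped by score, the surviving text may lose connective words, which is consistent with the note above that this approach suits cases where exact wording matters less.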

Provence

Sentence-level context pruning using DeBERTa-based relevance scoring. Best for:

  • Document-level pruning in RAG pipelines
  • When you need to filter out irrelevant documents
  • Scenarios with many retrieved documents

Key Parameters:

  • threshold: Relevance threshold (0-1) for filtering
  • reorder: Whether to reorder by relevance score
  • top_k: Number of top documents to keep
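
A rough sketch of how the three parameters might combine is shown below. It assumes relevance scores in the 0-1 range are already available and that an item is kept when its score is at or above the threshold; the function name, defaults, and order of operations are illustrative assumptions, not Provence's documented behavior.

```python
from typing import List, Optional, Tuple


def prune_by_relevance(
    scored: List[Tuple[str, float]],
    threshold: float = 0.5,
    reorder: bool = False,
    top_k: Optional[int] = None,
) -> List[str]:
    """Filter (text, relevance) pairs by threshold, then top_k, then reorder."""
    # Drop everything below the relevance threshold, remembering positions.
    indexed = [
        (i, text, score)
        for i, (text, score) in enumerate(scored)
        if score >= threshold
    ]
    if top_k is not None:
        # Keep only the k best items, then go back to the original order.
        indexed = sorted(indexed, key=lambda item: item[2], reverse=True)[:top_k]
        indexed.sort(key=lambda item: item[0])
    if reorder:
        # Most relevant first.
        indexed.sort(key=lambda item: item[2], reverse=True)
    return [text for _, text, _ in indexed]


scored = [("doc A", 0.2), ("doc B", 0.7), ("doc C", 0.6), ("doc D", 0.9)]
in_order = prune_by_relevance(scored, threshold=0.5, top_k=2)
by_score = prune_by_relevance(scored, threshold=0.5, top_k=2, reorder=True)
print(in_order, by_score)
```

Keeping the original order by default preserves whatever ranking the retriever produced, while `reorder=True` puts the most relevant item first.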

Configuration

config = RefinerConfig(
    algorithm="long_refiner",  # or "reform"
    budget=2048,
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
    
    # LongRefiner specific
    compression_ratio=0.5,
    device="cuda",
)
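
Since configuration can also be dict-based, it may help to sanity-check a settings dict before building the config. The sketch below is a hypothetical helper, not part of the library: the field names mirror the example above, and the checks encode only what this README states (a positive token budget and a `compression_ratio` between 0.0 and 1.0).

```python
def validate_refiner_settings(settings: dict) -> dict:
    """Check a plain settings dict before handing it to the library.

    Hypothetical helper: it only enforces constraints stated in the README.
    """
    budget = settings.get("budget")
    if not isinstance(budget, int) or budget <= 0:
        raise ValueError("budget must be a positive integer token count")
    ratio = settings.get("compression_ratio")
    if ratio is not None and not 0.0 <= ratio <= 1.0:
        raise ValueError("compression_ratio must be between 0.0 and 1.0")
    return settings


settings = validate_refiner_settings(
    {
        "algorithm": "long_refiner",
        "budget": 2048,
        "base_model_path": "Qwen/Qwen2.5-3B-Instruct",
        "compression_ratio": 0.5,
        "device": "cuda",
    }
)
# The validated dict could then be unpacked into the constructor shown above,
# e.g. RefinerConfig(**settings), assuming it accepts these keyword arguments.
print(settings["algorithm"], settings["budget"])
```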

Architecture

sageRefiner is designed as a standalone library that can be integrated into any Python application:

Your Application
      ↓
sageRefiner (this library)
      ↓
[LongRefiner | REFORM | Provence] → Compressed Context
      ↓
Your LLM Pipeline
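
The flow in the diagram can be sketched as a small function that accepts any object with a Quick Start style `compress` method. `EchoCompressor` and the fake `llm` callable are stand-ins invented for this example so that it runs without a model or GPU; only the `compress(question=..., document_list=..., budget=...)` call shape and the `compressed_context` key are taken from this README.

```python
from typing import Callable, Dict, List


class EchoCompressor:
    """Stand-in compressor that truncates by word count (illustration only)."""

    def compress(self, question: str, document_list: List[Dict], budget: int) -> Dict:
        words = " ".join(doc["contents"] for doc in document_list).split()
        # Crude truncation; a real compressor scores content against the question.
        return {"compressed_context": " ".join(words[:budget])}


def answer(
    question: str,
    documents: List[Dict],
    compressor,
    llm: Callable[[str], str],
    budget: int,
) -> str:
    """Application -> compressor -> LLM, mirroring the diagram above."""
    result = compressor.compress(
        question=question, document_list=documents, budget=budget
    )
    prompt = f"Context:\n{result['compressed_context']}\n\nQuestion: {question}"
    return llm(prompt)


docs = [{"contents": "Exercise improves cardiovascular health and mood."}]
reply = answer(
    "What are the benefits of exercise?",
    docs,
    compressor=EchoCompressor(),
    llm=lambda prompt: f"(model reply to {len(prompt)} chars)",  # fake LLM
    budget=4,
)
print(reply)
```

Passing the compressor in as an argument keeps the application code independent of which algorithm is chosen.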

Integration with SAGE

This library is part of the SAGE framework ecosystem. For seamless integration with SAGE pipelines, use the RefinerAdapter in sage-middleware:

# In SAGE environment
from sage.middleware.components.sage_refiner import RefinerAdapter

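# `env` is an existing SAGE environment; the chain is wrapped in
# parentheses so the leading-dot continuation lines are valid Python.
(
    env.from_batch(...)
    .map(ChromaRetriever, retriever_config)
    .map(RefinerAdapter, refiner_config)  # Add compression step
    .map(QAPromptor, promptor_config)
    .sink(...)
)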

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Transformers 4.30+

Examples

See the examples/ directory for complete examples:

  • basic_compression.py: Simple compression workflow
  • algorithm_comparison.py: Compare different algorithms
  • batch_processing.py: Process multiple queries efficiently

Performance

Benchmark on common RAG datasets (RTX 3090):

Algorithm     Compression Ratio   Latency (avg)   Quality Score
LongRefiner   3.2x                0.8s            0.92
REFORM        2.5x                0.3s            0.87

Citation

If you use sageRefiner in your research, please cite:

@software{sageRefiner2025,
  title = {sageRefiner: Intelligent Context Compression for RAG},
  author = {SAGE Team},
  year = {2025},
  url = {https://github.com/intellistream/sageRefiner}
}

License

Apache License 2.0 - See LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
