sageRefiner

Intelligent Context Compression Algorithms for RAG Systems

sageRefiner is a standalone Python library providing state-of-the-art context compression algorithms to reduce token usage while maintaining semantic quality in RAG (Retrieval-Augmented Generation) systems.

Features

- Multiple Compression Algorithms
  - LongRefiner: Advanced selective compression using LLM-based importance scoring
  - REFORM: Efficient attention-based compression with KV cache optimization
  - Provence: Sentence-level context pruning using DeBERTa-based scoring
- High Compression Ratios: Achieve 2-10x compression while preserving key information
- Flexible Configuration: Easy-to-use YAML/dict-based configuration
- Production Ready: Battle-tested in the SAGE framework

Installation

# From PyPI (the distribution is published as isage-refiner)
pip install isage-refiner

# From source
pip install git+https://github.com/intellistream/sageRefiner.git

# Development mode
git clone https://github.com/intellistream/sageRefiner.git
cd sageRefiner
pip install -e .

Quick Start

from sage_refiner import LongRefinerCompressor, RefinerConfig

# Configure the refiner
config = RefinerConfig(
    algorithm="long_refiner",
    budget=2048,  # Target token count
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
)

# Initialize compressor
compressor = LongRefinerCompressor(
    base_model_path=config.base_model_path,
    max_model_len=25000,
    gpu_memory_utilization=0.5,
)

# Compress documents
query = "What are the benefits of exercise?"
documents = [
    {"contents": "Exercise improves cardiovascular health..."},
    {"contents": "Regular physical activity boosts mental wellbeing..."},
    # ... more documents
]

result = compressor.compress(
    question=query,
    document_list=documents,
    budget=config.budget,  # Reuse the budget set in the config above
)

print(f"Original tokens: {result['original_tokens']}")
print(f"Compressed tokens: {result['compressed_tokens']}")
print(f"Compression ratio: {result['compression_rate']:.2f}")
print(f"\nCompressed content:\n{result['compressed_context']}")
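
The result dictionary printed above can also be turned into a short savings summary. The sketch below assumes only the `original_tokens` and `compressed_tokens` keys shown in the Quick Start. `summarize_savings` is a hypothetical helper, not part of sageRefiner, and the sample numbers are made up for illustration.

```python
def summarize_savings(result: dict) -> dict:
    """Derive token savings from a compression result (hypothetical helper)."""
    original = result["original_tokens"]
    compressed = result["compressed_tokens"]
    if original <= 0 or compressed <= 0:
        raise ValueError("token counts must be positive")
    return {
        "tokens_saved": original - compressed,
        "ratio": original / compressed,          # 4.0 means the context is 4x smaller
        "fraction_kept": compressed / original,  # share of the original context retained
    }


# Stand-in result with made-up numbers; a real one would come from compressor.compress(...)
example = {"original_tokens": 8000, "compressed_tokens": 2000}
summary = summarize_savings(example)
print(summary)
```

A summary like this is useful when logging how much prompt budget each query actually saves.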

Algorithms

LongRefiner

Based on selective compression with LLM-guided importance scoring. Best for:

High-quality compression with minimal information loss
Scenarios where semantic coherence is critical
Budget-constrained LLM applications

Key Parameters:

budget: Target token count
base_model_path: HuggingFace model for compression
compression_ratio: Compression aggressiveness (0.0-1.0)
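
To make the `budget` parameter concrete, here is a minimal sketch of budget-constrained selection: given chunks that already carry an importance score, keep the highest-scoring ones until the token budget is spent, then restore document order. This illustrates the general idea only and is not LongRefiner's actual algorithm. `Chunk`, `count_tokens`, `select_within_budget`, and the whitespace token count are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    position: int   # order in the source document
    text: str
    score: float    # importance score (assumed to come from an LLM-based scorer)


def count_tokens(text: str) -> int:
    # Crude whitespace tokenizer; a real system would use the model's tokenizer.
    return len(text.split())


def select_within_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks whose total size fits the budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    # Restore original order so the compressed context stays readable.
    return sorted(kept, key=lambda c: c.position)


chunks = [
    Chunk(0, "Exercise improves cardiovascular health", 0.9),   # 4 tokens
    Chunk(1, "The gym opens at six in the morning", 0.1),       # 8 tokens
    Chunk(2, "Regular activity boosts mental wellbeing", 0.8),  # 5 tokens
]
selected = select_within_budget(chunks, budget=10)
print([c.position for c in selected])
```

Greedy selection is simple but not optimal: a single large high-scoring chunk can crowd out several smaller ones that together carry more information.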

REFORM

Efficient attention-based compression using attention head analysis. Best for:

Fast compression with lower compute requirements
Batch processing scenarios
When exact wording preservation is less critical

Key Parameters:

max_tokens: Maximum tokens to keep
selected_heads: Attention heads for scoring
use_kv_cache: Enable KV cache optimization
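
The interaction between `selected_heads` and `max_tokens` can be sketched as follows: average the attention each context token receives over the chosen heads, then keep the highest-scoring tokens in their original order. This is a dependency-free illustration of attention-based scoring in general, not REFORM's implementation. The function names, the nested-list attention layout, and the weights are assumptions, and KV cache handling is not modeled.

```python
def score_tokens(attention, selected_heads):
    """Average the attention each context token receives over the chosen heads.

    attention[h][q][t] is the weight head h assigns from query position q
    to context token t (plain nested lists keep the sketch dependency-free).
    """
    if not selected_heads:
        raise ValueError("selected_heads must not be empty")
    num_tokens = len(attention[0][0])
    scores = [0.0] * num_tokens
    rows = 0
    for h in selected_heads:
        for query_row in attention[h]:
            for t, weight in enumerate(query_row):
                scores[t] += weight
            rows += 1
    return [s / rows for s in scores]


def keep_top_tokens(tokens, scores, max_tokens):
    """Keep the max_tokens highest-scoring tokens, preserving their order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_tokens])
    return [tokens[i] for i in keep]


tokens = ["exercise", "is", "good", "for", "health"]
# Two heads, one query position each; weights are made up for illustration.
attention = [
    [[0.4, 0.05, 0.15, 0.05, 0.35]],  # head 0
    [[0.2, 0.2, 0.2, 0.2, 0.2]],      # head 1 (uninformative, not selected)
]
scores = score_tokens(attention, selected_heads=[0])
compressed = keep_top_tokens(tokens, scores, max_tokens=3)
print(compressed)
```

Restricting scoring to a few informative heads is what keeps this style of compression cheap compared with a full LLM scoring pass.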

Provence

Sentence-level context pruning using DeBERTa-based relevance scoring. Best for:

Document-level pruning in RAG pipelines
When you need to filter out irrelevant documents
Scenarios with many retrieved documents

Key Parameters:

threshold: Relevance threshold (0-1) for filtering
reorder: Whether to reorder by relevance score
top_k: Number of top documents to keep
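
The three parameters above can be read as a filter, rank, and truncate sequence. The sketch below applies them to documents that already have relevance scores. The scores are made up, `prune_documents` is a hypothetical function, and the exact semantics chosen here (for example, keeping retrieval order when `reorder` is false) are assumptions rather than documented Provence behavior.

```python
def prune_documents(scored_docs, threshold=0.5, reorder=False, top_k=None):
    """Filter (text, score) pairs by relevance, optionally reordering and truncating.

    With reorder=False and top_k set, the top_k most relevant survivors are
    kept but returned in their original retrieval order.
    """
    survivors = [
        (index, text, score)
        for index, (text, score) in enumerate(scored_docs)
        if score >= threshold
    ]
    by_score = sorted(survivors, key=lambda item: item[2], reverse=True)
    if top_k is not None:
        by_score = by_score[:top_k]
    final = by_score if reorder else sorted(by_score, key=lambda item: item[0])
    return [text for _, text, _ in final]


docs = [
    ("Exercise lowers blood pressure.", 0.7),
    ("The gym opens at six.", 0.2),
    ("Exercise improves mood.", 0.9),
    ("Stretching aids recovery.", 0.6),
]
in_order = prune_documents(docs, threshold=0.5, top_k=2)
by_relevance = prune_documents(docs, threshold=0.5, top_k=2, reorder=True)
print(in_order)
print(by_relevance)
```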

Configuration

config = RefinerConfig(
    algorithm="long_refiner",  # or "reform"
    budget=2048,
    base_model_path="Qwen/Qwen2.5-3B-Instruct",
    
    # LongRefiner specific
    compression_ratio=0.5,
    device="cuda",
)
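
Since the library advertises dict-based configuration, it can help to validate a settings dictionary before building a config from it. The sketch below uses a stand-in dataclass, `SketchConfig`, that mirrors the fields shown above. It is hypothetical and does not reproduce how `RefinerConfig` loads or validates settings.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring the RefinerConfig fields shown above."""
    algorithm: str = "long_refiner"
    budget: int = 2048
    base_model_path: str = "Qwen/Qwen2.5-3B-Instruct"
    compression_ratio: float = 0.5
    device: str = "cuda"

    @classmethod
    def from_dict(cls, settings: dict) -> "SketchConfig":
        # Reject misspelled keys early instead of silently ignoring them.
        known = {f.name for f in fields(cls)}
        unknown = set(settings) - known
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        config = cls(**settings)
        if not 0.0 <= config.compression_ratio <= 1.0:
            raise ValueError("compression_ratio must be within 0.0-1.0")
        if config.budget <= 0:
            raise ValueError("budget must be positive")
        return config


config = SketchConfig.from_dict({"algorithm": "reform", "budget": 1024})
print(config)
```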

Architecture

sageRefiner is designed as a standalone library that can be integrated into any Python application:

Your Application
      ↓
sageRefiner (this library)
      ↓
[LongRefiner | REFORM | Provence] → Compressed Context
      ↓
Your LLM Pipeline
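
The flow in the diagram can be sketched as a function that takes the three stages as plain callables. All names here are hypothetical, and the stand-in components exist only so the example runs without a retriever, GPU, or LLM. In a real application the `compress` callable would wrap one of the sageRefiner compressors.

```python
from typing import Callable


def answer_with_compression(
    query: str,
    retrieve: Callable[[str], list[str]],
    compress: Callable[[str, list[str]], str],
    generate: Callable[[str], str],
) -> str:
    """Retrieve documents, compress them for the query, then call the LLM."""
    documents = retrieve(query)
    context = compress(query, documents)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)


# Stand-in components (made up for illustration).
def fake_retrieve(query: str) -> list[str]:
    return ["Exercise improves cardiovascular health.", "The gym opens at six."]


def fake_compress(query: str, documents: list[str]) -> str:
    # Keeps only the first document; a real compressor would score content.
    return documents[0]


def fake_generate(prompt: str) -> str:
    return f"(answer based on {len(prompt)} prompt characters)"


reply = answer_with_compression(
    "What are the benefits of exercise?", fake_retrieve, fake_compress, fake_generate
)
print(reply)
```

Passing the stages as callables keeps the compression step swappable, so algorithms can be compared without touching the rest of the pipeline.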

Integration with SAGE

This library is part of the SAGE framework ecosystem. For seamless integration with SAGE pipelines, use the RefinerAdapter in sage-middleware:

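# In a SAGE environment (`env` is assumed to be an existing SAGE environment object)
from sage.middleware.components.sage_refiner import RefinerAdapter

# The surrounding parentheses let the method chain span multiple lines
(
    env.from_batch(...)
    .map(ChromaRetriever, retriever_config)
    .map(RefinerAdapter, refiner_config)  # Add compression step
    .map(QAPromptor, promptor_config)
    .sink(...)
)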

Requirements

Python 3.10+
PyTorch 2.0+
Transformers 4.30+

Examples

See the examples/ directory for complete examples:

basic_compression.py: Simple compression workflow
algorithm_comparison.py: Compare different algorithms
batch_processing.py: Process multiple queries efficiently

Performance

Benchmark results on common RAG datasets (measured on an RTX 3090):

Algorithm	Compression Ratio	Latency (avg)	Quality Score
LongRefiner	3.2x	0.8s	0.92
REFORM	2.5x	0.3s	0.87

Citation

If you use sageRefiner in your research, please cite:

@software{sageRefiner2025,
  title = {sageRefiner: Intelligent Context Compression for RAG},
  author = {SAGE Team},
  year = {2025},
  url = {https://github.com/intellistream/sageRefiner}
}

License

Apache License 2.0 - See LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Release history

0.1.0.10: Mar 1, 2026
0.1.0.2: Jan 8, 2026
0.1.0.1: Jan 8, 2026
0.1.0 (this version): Jan 3, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

isage_refiner-0.1.0.0-cp311-none-any.whl (78.5 kB), uploaded Jan 13, 2026, CPython 3.11

isage_refiner-0.1.0-py3-none-any.whl (141.7 kB), uploaded Jan 3, 2026, Python 3

File details

Details for the file isage_refiner-0.1.0.0-cp311-none-any.whl.

File metadata

Download URL: isage_refiner-0.1.0.0-cp311-none-any.whl
Upload date: Jan 13, 2026
Size: 78.5 kB
Tags: CPython 3.11
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for isage_refiner-0.1.0.0-cp311-none-any.whl
Algorithm	Hash digest
SHA256	`520b972ef895eeaada6b161250cd1771c45a2b68b5d52554d19f106792c9f720`
MD5	`ee311889edaaae4d6aa5a027c5e9d3e5`
BLAKE2b-256	`ed4f587ed63eefaba36e4a9fdf4a10c77f4696ec29a78af01544e463498c1da7`

File details

Details for the file isage_refiner-0.1.0-py3-none-any.whl.

File metadata

Download URL: isage_refiner-0.1.0-py3-none-any.whl
Upload date: Jan 3, 2026
Size: 141.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for isage_refiner-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a15660c7467f41cd581d7197c41497100e706795abe595769734dde073e88daa`
MD5	`8e591b7a91e39c03efc79b349fa4ff32`
BLAKE2b-256	`65d918daa2f680c2e7929974a38bac9d1249306902778444e1ffe014e287bc3f`
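
A downloaded wheel can be checked against the SHA256 values listed in the tables above using only the standard library. The sketch below is one way to do this. `sha256_of` and `matches_published_hash` are hypothetical helper names, and the demonstration runs on a temporary file rather than a real wheel.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration on a temporary file; for a real check, pass the downloaded
# wheel and the SHA256 value published for that file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    ok = matches_published_hash(sample, expected)
    mismatch = matches_published_hash(sample, "00" * 32)
print(ok, mismatch)
```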
