
Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

Project description


ContextPilot: Efficient Long Context Inference with Context Reuse




News

  • [2026/01] ContextPilot has been accepted to MLSys 2026 🎉! See you in Bellevue, WA, USA.
  • [2026/01] Code is released!

About

ContextPilot is a fast optimization system for the context engineering layer of agentic workloads:

  1. High Throughput: Boosting prefill throughput with intelligent context reuse.
  2. Accuracy Preserved: Reasoning accuracy is preserved, and is improved on several benchmarks.
  3. Strong Compatibility: Works with popular RAG libraries (PageIndex), agentic memory layers (Mem0), KV cache optimization engines (LMCache), and inference engines (vLLM and SGLang), in both single-node and multi-node deployments.
  4. Widely Tested: Tested with a wide range of RAG and Agentic AI applications.
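Prefix caches in inference engines such as SGLang and vLLM reuse KV entries only for an identical leading sequence, so two requests that retrieve the same documents in a different order may share nothing reusable. The sketch below illustrates that effect at document granularity, assuming each context is a list of document IDs. The helper names and the sorted ordering are hypothetical stand-ins, not ContextPilot's actual reordering algorithm.

```python
from typing import Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading documents two contexts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def naive_canonical_order(docs: Sequence[str]) -> list[str]:
    """Hypothetical stand-in for a reuse-aware ordering: sort by document ID."""
    return sorted(docs)


# Two queries retrieve the same documents, but in different orders.
ctx_a = ["doc7", "doc2", "doc9"]
ctx_b = ["doc2", "doc9", "doc7"]

print(shared_prefix_len(ctx_a, ctx_b))  # 0: no reusable prefix as retrieved
print(shared_prefix_len(naive_canonical_order(ctx_a),
                        naive_canonical_order(ctx_b)))  # 3: whole context reusable
```

Reordering can also change what the model attends to, so a real system must separately guard accuracy; this sketch covers only the cache-reuse side.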

Target Workloads

  1. Trending Topic QA with Retrieval: search and generation for breaking news and hot topics beyond the model's knowledge
  2. Closed-Domain Long-Context QA: retrieval-augmented QA over specialized corpora (novels, financial reports, legal documents)
  3. Multi-Turn Conversations with Long-Term Memory: persistent context across sessions (e.g., Mem0)

Benchmark and Performance

System Performance

Benchmark Results

Running DeepSeek-R1, ContextPilot matches or slightly exceeds the accuracy of SGLang: 64.68% vs. 64.15% F1 on MultihopRAG and 41.08% vs. 40.20% F1 on NarrativeQA.
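For reference, a minimal sketch of the common SQuAD-style token-level F1 for QA answers follows. The exact normalization and aggregation behind the MultihopRAG and NarrativeQA numbers is not stated here, so treat this as an assumption and not as the benchmarks' scoring script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("The cat sat.", "cat sat down"), 3))  # 0.8
```

A benchmark-level score is then typically the mean of per-question F1, reported as a percentage.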

Accuracy on MT-RAG Benchmark

Method         Qwen3-4B   Llama3.1-8B   Qwen3-30B-A3B
LMCache        62.56      68.46         75.12
CacheBlend     50.33      56.52         X
RadixCache     62.56      68.46         75.12
ContextPilot   64.27      68.12         75.81
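One way to read the MT-RAG table is as per-model differences from a baseline. The snippet below is plain arithmetic over the table's numbers, with the "X" cell treated as missing; it shows ContextPilot a third of a point below RadixCache on Llama3.1-8B and above it on both Qwen3 models.

```python
from typing import Dict, Optional

# Scores copied from the MT-RAG table; None marks the "X" entry.
SCORES: Dict[str, Dict[str, Optional[float]]] = {
    "LMCache":      {"Qwen3-4B": 62.56, "Llama3.1-8B": 68.46, "Qwen3-30B-A3B": 75.12},
    "CacheBlend":   {"Qwen3-4B": 50.33, "Llama3.1-8B": 56.52, "Qwen3-30B-A3B": None},
    "RadixCache":   {"Qwen3-4B": 62.56, "Llama3.1-8B": 68.46, "Qwen3-30B-A3B": 75.12},
    "ContextPilot": {"Qwen3-4B": 64.27, "Llama3.1-8B": 68.12, "Qwen3-30B-A3B": 75.81},
}


def deltas(method: str, baseline: str) -> Dict[str, Optional[float]]:
    """Per-model score difference (method minus baseline), in points."""
    out: Dict[str, Optional[float]] = {}
    for model, score in SCORES[method].items():
        base = SCORES[baseline][model]
        out[model] = None if score is None or base is None else round(score - base, 2)
    return out


print(deltas("ContextPilot", "RadixCache"))
# {'Qwen3-4B': 1.71, 'Llama3.1-8B': -0.34, 'Qwen3-30B-A3B': 0.69}
```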

ContextPilot delivers 4-13x improvements in cache hit rates and 1.5-3.5x reductions in prefill latency for large-batch RAG workloads, while maintaining or improving accuracy.
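Cache hit rate is the fraction of prompt content served from an existing prefix cache rather than recomputed during prefill. Below is a simplified, document-granularity sketch of that metric. It assumes contexts are lists of document IDs of equal token length and an unbounded cache with no eviction; real engines measure at token or block granularity, and this is not ContextPilot's benchmark methodology. The sorted ordering is a hypothetical stand-in for a reuse-aware scheduler.

```python
from typing import Iterable, Sequence


def prefix_hit_rate(batch: Iterable[Sequence[str]]) -> float:
    """Fraction of documents served from an unbounded prefix cache.

    A document counts as a hit when the full prefix ending at it was already
    cached by an earlier request in the batch; every prefix seen is cached.
    """
    cached: set[tuple[str, ...]] = set()
    hits = total = 0
    for context in batch:
        for i in range(1, len(context) + 1):
            prefix = tuple(context[:i])
            total += 1
            if prefix in cached:
                hits += 1
            else:
                cached.add(prefix)
    return hits / total if total else 0.0


batch = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d3", "d1", "d2"]]
reordered = [sorted(ctx) for ctx in batch]  # hypothetical reuse-aware ordering

print(prefix_hit_rate(batch))                # 0.0
print(round(prefix_hit_rate(reordered), 3))  # 0.556
```

Because prefill work scales with the tokens that miss the cache, a higher hit rate translates directly into less prefill compute for the same batch.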

In tests with GPT-5.2, ContextPilot also reduced input token costs by about 36%.

See Benchmarks in the documentation for GPU vs CPU performance analysis and detailed benchmark methodology.

Getting Started

Installation

Requirements: Python >= 3.10

pip install contextpilot

Or install from source:

git clone https://github.com/SecretSettler/ContextPilot.git
cd ContextPilot
pip install -e .

Install an inference engine (SGLang recommended):

pip install --upgrade pip
pip install uv
uv pip install "sglang" --prerelease=allow
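After installing, you can confirm that the interpreter meets the Python >= 3.10 requirement and see which packages resolved, using only the standard library. This is a convenience sketch, not part of ContextPilot; the distribution names `contextpilot` and `sglang` are taken from the pip commands above.

```python
import importlib.metadata
import sys
from typing import Dict, List, Optional, Tuple


def check_environment(packages: List[str],
                      min_python: Tuple[int, int] = (3, 10)) -> Dict[str, Optional[str]]:
    """Fail fast on an old interpreter, then report each distribution's
    installed version (None if it is not installed)."""
    if sys.version_info[:2] < min_python:
        raise RuntimeError(
            f"Python >= {min_python[0]}.{min_python[1]} required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(check_environment(["contextpilot", "sglang"]))
```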

More detailed installation instructions are available in the docs, including Docker setup and FAISS configuration.

PageIndex Integration (NEW!)

ContextPilot now supports PageIndex, a reasoning-based, vectorless RAG system. PageIndex uses LLM reasoning over hierarchical document tree structures instead of vector similarity search:

from contextpilot.retriever import PageIndexRetriever
from contextpilot import RAGPipeline, RetrieverConfig, OptimizerConfig

# Option 1: Use PageIndexRetriever directly
retriever = PageIndexRetriever(model="gpt-4o")  # LLM that reasons over the document tree
retriever.load_tree_structures(["document_structure.json"])  # PageIndex tree-structure file(s)
results = retriever.search_queries(query_data=[{"question": "What is the revenue?"}])

# Option 2: Use unified RAGPipeline
pipeline = RAGPipeline(
    retriever=RetrieverConfig(
        retriever_type="pageindex",
        pageindex_model="gpt-4o",
        pageindex_tree_paths=["document_structure.json"],
        top_k=5  # number of results retrieved per query
    ),
    optimizer=OptimizerConfig(enabled=True),
    use_contextpilot=True
)
pipeline.setup()

See examples/pageindex_example.py for detailed usage.
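To make "reasoning over a hierarchical document tree" concrete, here is a toy sketch of greedy top-down navigation. It is not PageIndex's or ContextPilot's implementation: the `title`/`summary`/`nodes` keys are an assumed shape rather than the documented PageIndex JSON schema, and a keyword-overlap score stands in for the LLM call that would choose a branch at each level.

```python
import re
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]  # hypothetical node shape: {"title", "summary", "nodes"}


def tokens(text: str) -> set:
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_score(query: str, node: Node) -> int:
    """Toy stand-in for LLM reasoning: words shared by the query and node text."""
    return len(tokens(query) & tokens(node["title"] + " " + node.get("summary", "")))


def descend(node: Node, query: str, score: Callable[[str, Node], float]) -> List[str]:
    """Walk from the root to a leaf, taking the best-scoring child at each level,
    and return the titles along the chosen path."""
    path = [node["title"]]
    while node.get("nodes"):
        node = max(node["nodes"], key=lambda child: score(query, child))
        path.append(node["title"])
    return path


doc: Node = {
    "title": "Annual Report",
    "nodes": [
        {"title": "Letter to Shareholders", "summary": "strategy and outlook"},
        {"title": "Financial Statements", "summary": "revenue, expenses, and balance sheet",
         "nodes": [
             {"title": "Income Statement", "summary": "revenue, costs, and net income"},
             {"title": "Balance Sheet", "summary": "assets and liabilities"},
         ]},
    ],
}

print(descend(doc, "What is the revenue?", keyword_score))
# ['Annual Report', 'Financial Statements', 'Income Statement']
```

The retrieved leaf's text would then become part of the context passed to the generator, which is where context reuse applies.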

Documentation

Check out the ContextPilot documentation for comprehensive guides.

Examples

Go hands-on with our examples, which show how to apply ContextPilot to different use cases.

Contributing

We welcome and value all contributions! Please feel free to submit issues and pull requests.

Citation

If you use ContextPilot's code or data, please cite:

@misc{jiang2025contextpilot,
      title={ContextPilot: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse}, 
      author={Yinsicheng Jiang and Yeqi Huang and Liang Cheng and Cheng Deng and Xuan Sun and Luo Mai},
      year={2025},
      eprint={2511.03475},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.03475}, 
}



Download files


Source Distribution

contextpilot-0.3.0.tar.gz (119.8 kB)

Uploaded Source

Built Distribution


contextpilot-0.3.0-py3-none-any.whl (101.2 kB)

Uploaded Python 3

File details

Details for the file contextpilot-0.3.0.tar.gz.

File metadata

  • Download URL: contextpilot-0.3.0.tar.gz
  • Upload date:
  • Size: 119.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for contextpilot-0.3.0.tar.gz
Algorithm Hash digest
SHA256 d92d777f03e58d6733a60d8ca79ef8d74a4526a98a95e1be5da294eca919e361
MD5 579a5210bf686ed1b73e65e44e94f01a
BLAKE2b-256 a6a650ce9a9cc6eaba0ff2968c8ce9ee80b8c5f5a75d4540c31bf1f8e5100738


File details

Details for the file contextpilot-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: contextpilot-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 101.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for contextpilot-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1b0b0f82932a7d9ba21885921440555049839c89b7ce7ea862536135ff3d81ba
MD5 80a18fd1f0175aab6a5435525ca556d5
BLAKE2b-256 0db7a322162d57981fd1489257fc5fc788f6db1333b857f9921d9d984f2f713f

