Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
News
- [2026/01] Rebranded to ContextPilot!
- [2025/12] Code is released!
- [2025/11] Paper published: ContextPilot: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
About
ContextPilot is a fast optimization system for Retrieval-Augmented Generation workloads:
- High Throughput: Boosts prefill throughput by reusing context across requests.
- Accuracy Preserved: On the MT-RAG benchmark, accuracy stays within 0.4 points of the LMCache and RadixCache baselines and improves on two of the three models tested.
- Strong Compatibility: Works with existing RAG libraries (HippoRAG), KV cache optimization engines (LMCache), and inference engines (vLLM and SGLang), in both single-node and multi-node deployments.
- Widely Tested: Tested with a wide range of RAG and agentic AI applications.
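To make the idea of context reuse concrete, here is a toy sketch of why the order of retrieved documents matters for prefix caching. It is not ContextPilot's algorithm: the document IDs are hypothetical, and the `reorder` helper is a deliberately naive stand-in that ignores the effect of ordering on answer quality, which is the part ContextPilot is designed to handle.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading items two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reorder(docs: list[str], shared: set[str]) -> list[str]:
    """Naive stand-in: put documents shared across requests first, in a fixed order."""
    common = sorted(d for d in docs if d in shared)
    rest = [d for d in docs if d not in shared]
    return common + rest


# Hypothetical retrieval results: document IDs in retriever rank order.
request_a = ["doc7", "doc2", "doc9"]
request_b = ["doc2", "doc7", "doc4"]

# As retrieved, the two prompts diverge at the first document,
# so a prefix cache cannot reuse anything between them.
as_retrieved = shared_prefix_len(request_a, request_b)

# After reordering, both prompts start with the same two documents.
shared = set(request_a) & set(request_b)
reordered = shared_prefix_len(reorder(request_a, shared), reorder(request_b, shared))

print(as_retrieved, reordered)  # 0 2
```

A prefix cache can only reuse the leading part of a prompt that matches an earlier one, so two requests that retrieve the same documents in a different order share nothing until the order is aligned.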
Benchmark and Performance
System Performance
Tested on Qwen3-4B-Instruct-2507 with 1xH100
Accuracy on MT-RAG Benchmark
| Method | Qwen3-4B | Llama3.1-8B | Qwen3-30B-A3B |
|---|---|---|---|
| LMCache | 62.56 | 68.46 | 75.12 |
| CacheBlend | 50.33 | 56.52 | X |
| RadixCache | 62.56 | 68.46 | 75.12 |
| ContextPilot | 64.27 | 68.12 | 75.81 |
ContextPilot delivers 4-13x improvements in cache hit rates and 1.5-3.5x reductions in prefill latency for large-batch RAG workloads, while maintaining or improving accuracy.
In testing with GPT-5.2, ContextPilot also reduced input token costs by around 36%.
See Benchmarks in the documentation for GPU vs CPU performance analysis and detailed benchmark methodology.
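The accuracy table is easier to read as differences against a baseline. The short sketch below recomputes those differences from the MT-RAG numbers listed in this section, using RadixCache as the baseline (LMCache reports identical scores). The dictionary layout and variable names are illustrative only.

```python
# MT-RAG accuracy copied from the table in this section.
accuracy = {
    "RadixCache": {"Qwen3-4B": 62.56, "Llama3.1-8B": 68.46, "Qwen3-30B-A3B": 75.12},
    "ContextPilot": {"Qwen3-4B": 64.27, "Llama3.1-8B": 68.12, "Qwen3-30B-A3B": 75.81},
}

# Difference in points, ContextPilot minus baseline, per model.
deltas = {
    model: round(accuracy["ContextPilot"][model] - accuracy["RadixCache"][model], 2)
    for model in accuracy["RadixCache"]
}

for model, delta in deltas.items():
    print(f"{model}: {delta:+.2f}")
# Qwen3-4B: +1.71
# Llama3.1-8B: -0.34
# Qwen3-30B-A3B: +0.69
```

ContextPilot scores higher on both Qwen3 models and 0.34 points lower on Llama3.1-8B.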
Getting Started
Installation
Requirements: Python >= 3.10
pip install contextpilot
Or install from source:
git clone https://github.com/SecretSettler/ContextPilot.git
cd ContextPilot
pip install -e .
Install an inference engine (SGLang recommended):
pip install --upgrade pip
pip install uv
uv pip install "sglang" --prerelease=allow
More detailed installation instructions are available in the docs, including Docker setup and FAISS configuration.
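After installing, a quick environment check can confirm the Python version requirement and which packages are present. This is a minimal sketch using only the standard library. It assumes the installed distribution names match the `pip install` names used above (`contextpilot` and `sglang`), and the helper names are illustrative.

```python
from __future__ import annotations

import sys
from importlib import metadata


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_ok(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Check the interpreter against the stated requirement (Python >= 3.10)."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    # Distribution names are assumed to match the pip package names.
    for name in ("contextpilot", "sglang"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Querying package metadata avoids importing the packages themselves, so the check works even when an import would fail for unrelated reasons such as a missing GPU driver.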
Documentation
Check out the ContextPilot documentation for comprehensive guides.
Examples
Go hands-on with our examples, which demonstrate how to address different use cases with ContextPilot.
Contributing
We welcome and value all contributions! Please feel free to submit issues and pull requests.
Contact
Citation
If you use the code or data of ContextPilot, please cite it as follows:
@misc{jiang2025contextpilot,
title={ContextPilot: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse},
author={Yinsicheng Jiang and Yeqi Huang and Liang Cheng and Cheng Deng and Xuan Sun and Luo Mai},
year={2025},
eprint={2511.03475},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2511.03475},
}
File details
Details for the file contextpilot-0.2.0.tar.gz.
File metadata
- Download URL: contextpilot-0.2.0.tar.gz
- Upload date:
- Size: 102.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67cbdd4ef98611bb656880c430a86a8ebbc1b0436a9df18ad3d102e345f05d14 |
| MD5 | a1d7615c760fe8d172e56fee49abbc15 |
| BLAKE2b-256 | 28788a49be208285ccc744ffb218d6f11403d6119b253aae5d4c0c56b4c357ab |
File details
Details for the file contextpilot-0.2.0-py3-none-any.whl.
File metadata
- Download URL: contextpilot-0.2.0-py3-none-any.whl
- Upload date:
- Size: 88.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09628e0e2e20901623f59a9d1bfbc857ba75225ee59f60a92afc15c912c13789 |
| MD5 | 3cfeb18298247637f979697cd02f90f8 |
| BLAKE2b-256 | ff1aecac30cbf7d8a12c75f83a8b1d5c2ab38645d8400a05533035162aa9c11e |
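The SHA256 digests listed for the two release files can be checked against a local download. The sketch below assumes the files have already been downloaded and keep their published names. The function names are illustrative, and the digests are copied from the file details above.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 0.2.0 release files.
EXPECTED_SHA256 = {
    "contextpilot-0.2.0.tar.gz":
        "67cbdd4ef98611bb656880c430a86a8ebbc1b0436a9df18ad3d102e345f05d14",
    "contextpilot-0.2.0-py3-none-any.whl":
        "09628e0e2e20901623f59a9d1bfbc857ba75225ee59f60a92afc15c912c13789",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    return expected is not None and sha256_of(p) == expected
```

For example, `verify("contextpilot-0.2.0.tar.gz")` returns `True` only when the local file is byte-for-byte identical to the published archive. A file with an unrecognized name is rejected without being trusted.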