
Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse


About

ContextPilot is a fast optimization system for Retrieval-Augmented Generation (RAG) workloads:

  1. High Throughput: Boosts prefill throughput through context reuse.
  2. Accuracy Preserved: Reasoning accuracy is preserved and, in some settings, improved.
  3. Strong Compatibility: Works with existing RAG libraries (HippoRAG), KV cache optimization engines (LMCache), and inference engines (vLLM and SGLang), in both single-node and multi-node deployments.
  4. Widely Tested: Tested with a wide range of RAG and agentic AI applications.
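As a rough illustration of why context reuse matters, here is a hypothetical sketch that is not ContextPilot's algorithm. Prefix caches such as SGLang's RadixCache can only reuse a prompt prefix that exactly matches an earlier prompt, so two requests that retrieve the same passages in different orders share nothing. The sketch counts reusable passages with and without putting frequently shared passages first. The passage IDs, the frequency-based ordering, and the helper names are assumptions for illustration, and naive reordering like this can change model output, which the sketch ignores.

```python
from collections import Counter

# Hypothetical retrieval results: passage IDs per request, in retrieval-score order.
requests = [
    ["d3", "d1", "d7"],
    ["d1", "d3", "d9"],
    ["d7", "d3", "d1"],
]


def shared_prefix(a: list[str], b: list[str]) -> int:
    """Number of leading passages two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reorder_by_frequency(reqs: list[list[str]]) -> list[list[str]]:
    """Hypothetical ordering: passages shared by more requests go first,
    ties broken by ID, so overlapping requests start with the same passages."""
    freq = Counter(p for r in reqs for p in r)
    return [sorted(r, key=lambda p: (-freq[p], p)) for r in reqs]


def reused_passages(reqs: list[list[str]]) -> int:
    """Passages a prefix cache could reuse if requests run in order:
    each request reuses the longest prefix it shares with any earlier request.
    Counted in passages, not tokens, to keep the example small."""
    total = 0
    for i, r in enumerate(reqs):
        total += max((shared_prefix(r, prev) for prev in reqs[:i]), default=0)
    return total


print(reused_passages(requests))                        # 0
print(reused_passages(reorder_by_frequency(requests)))  # 5
```

The same passages appear in every request, yet in retrieval order no prefix matches; a shared ordering turns that overlap into cache hits.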

Benchmark and Performance

System Performance

Figure: Benchmark results, tested on Qwen3-4B-Instruct-2507 with 1x H100.

Accuracy on the MT-RAG benchmark

| Method | Qwen3-4B | Llama3.1-8B | Qwen3-30B-A3B |
|---|---|---|---|
| LMCache | 62.56 | 68.46 | 75.12 |
| CacheBlend | 50.33 | 56.52 | X |
| RadixCache | 62.56 | 68.46 | 75.12 |
| ContextPilot | 64.27 | 68.12 | 75.81 |

ContextPilot delivers 4–13× higher cache hit rates and 1.5–3.5× lower prefill latency on large-batch RAG workloads while maintaining accuracy, and in some settings improving it.
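To see how a hit-rate gain relates to a prefill-latency gain, here is a deliberately idealized model. It is an assumption for illustration, not the benchmark methodology: treat the hit rate h as the fraction of prompt tokens served from the KV cache, and assume prefill time is proportional to the remaining cache-miss tokens. Under that model the speedup is (1 - h_before) / (1 - h_after). The function name and the numbers are hypothetical.

```python
def prefill_speedup(hit_rate_before: float, hit_rate_after: float) -> float:
    """Idealized prefill speedup when only cache-miss prompt tokens are computed.

    Assumes prefill time is proportional to the number of cache-miss tokens,
    ignoring attention over cached tokens and fixed per-request overheads.
    """
    for h in (hit_rate_before, hit_rate_after):
        if not 0.0 <= h < 1.0:
            raise ValueError("hit rates must be in [0, 1)")
    return (1.0 - hit_rate_before) / (1.0 - hit_rate_after)


# Hypothetical numbers, not measured results: a 10x hit-rate gain from 5% to 50%.
print(round(prefill_speedup(0.05, 0.50), 2))  # 1.9
```

In this model a tenfold hit-rate increase from a low baseline yields well under a tenfold latency reduction, because half of the prompt tokens still miss the cache.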

In tests with GPT-5.2, ContextPilot also reduced input token costs by about 36%.
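One way input-cost savings can arise with hosted models is discounted pricing for cached input tokens. The sketch below models only that effect, with hypothetical prices and cache fractions; it is not how the 36% figure was measured, and the function name is illustrative.

```python
def relative_input_cost(cached_fraction: float, cached_price_ratio: float) -> float:
    """Input cost relative to paying the full price for every prompt token.

    cached_fraction: share of input tokens billed at the cached rate (hypothetical).
    cached_price_ratio: cached-token price divided by the regular input price (hypothetical).
    """
    return (1.0 - cached_fraction) + cached_fraction * cached_price_ratio


# Hypothetical: 40% of input tokens hit the cache and are billed at 25% of the normal price.
cost = relative_input_cost(cached_fraction=0.4, cached_price_ratio=0.25)
print(f"saving: {1 - cost:.0%}")  # saving: 30%
```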

See the Benchmarks page in the documentation for a GPU vs. CPU performance analysis and the detailed benchmark methodology.

Getting Started

Installation

Requirements: Python >= 3.10

```bash
pip install contextpilot
```

Or install from source:

```bash
git clone https://github.com/SecretSettler/ContextPilot.git
cd ContextPilot
pip install -e .
```

Install an inference engine (SGLang recommended):

```bash
pip install --upgrade pip
pip install uv
uv pip install "sglang" --prerelease=allow
```

More detailed installation instructions are available in the docs, including Docker setup and FAISS configuration.
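After installing, a quick sanity check can confirm the Python requirement and show which packages are present. This is a small sketch using only the standard library (`importlib.metadata`); the distribution names come from the pip commands above, and the script and its helper name are not part of ContextPilot.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The README lists Python >= 3.10 as a requirement.
    if sys.version_info < (3, 10):
        sys.exit(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    for dist in ("contextpilot", "sglang"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```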

Documentation

Check out the ContextPilot documentation for comprehensive guides.

Examples

Go hands-on with our examples, which show how to apply ContextPilot to different use cases.

Contributing

We welcome and value all contributions! Please feel free to submit issues and pull requests.


Citation

If you use ContextPilot's code or data, please cite it as follows:

```bibtex
@misc{jiang2025contextpilot,
      title={ContextPilot: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse},
      author={Yinsicheng Jiang and Yeqi Huang and Liang Cheng and Cheng Deng and Xuan Sun and Luo Mai},
      year={2025},
      eprint={2511.03475},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.03475},
}
```

