LMCache: prefill your long contexts only once

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Environment
- GPU
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3

Project description

💡 What is LMCache?

TL;DR - Redis for LLMs.

LMCache is an LLM serving engine extension that reduces time to first token (TTFT) and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable texts across several locations (GPU, CPU DRAM, and local disk), LMCache reuses the KV cache of any reused text (not necessarily a prefix) in any serving engine instance. This saves GPU cycles and reduces response delay for users.
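To make the tiered-storage idea concrete, here is a minimal toy sketch in plain Python. It is not LMCache's API or implementation: the names `chunk_key`, `Tier`, and `TieredKVStore`, the LRU policy, and the spill-to-next-tier behavior are all assumptions chosen for illustration, and the "KV cache" is just a byte string keyed by a hash of the token chunk.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


def chunk_key(token_ids: list[int]) -> str:
    """Content hash of a token chunk, so identical text maps to the same key."""
    raw = ",".join(map(str, token_ids)).encode()
    return hashlib.sha256(raw).hexdigest()


class Tier:
    """One storage location (hypothetical) with fixed capacity and LRU eviction."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.entries: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: str, value: bytes) -> Optional[tuple[str, bytes]]:
        """Store an entry and return the evicted (key, value), if any."""
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)  # least recently used
        return None


class TieredKVStore:
    """Checks tiers fastest-first; evicted entries spill to the next tier."""

    def __init__(self, tiers: list[Tier]) -> None:
        self.tiers = tiers

    def put(self, key: str, value: bytes, level: int = 0) -> None:
        if level >= len(self.tiers):
            return  # fell off the slowest tier, so the entry is dropped
        evicted = self.tiers[level].put(key, value)
        if evicted is not None:
            self.put(evicted[0], evicted[1], level + 1)

    def get(self, key: str) -> tuple[Optional[bytes], Optional[str]]:
        """Return (value, tier name) on a hit, or (None, None) on a miss."""
        for tier in self.tiers:
            value = tier.get(key)
            if value is not None:
                return value, tier.name
        return None, None


store = TieredKVStore([Tier("gpu", 1), Tier("cpu", 1), Tier("disk", 2)])
doc_a = chunk_key([1, 2, 3])
doc_b = chunk_key([4, 5, 6])
store.put(doc_a, b"kv-a")
store.put(doc_b, b"kv-b")  # gpu tier is full, so doc_a spills to cpu
print(store.get(doc_a))    # (b'kv-a', 'cpu')
```

Keying by content hash rather than by position is what lets a chunk be found again even when it is not a prefix of the new request; how a real engine then combines such non-prefix caches is beyond this sketch.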

Combined with vLLM, LMCache achieves 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.

Try LMCache with the pre-built vLLM Docker images.

🚀 Performance snapshot

💻 Installation and Quickstart

Please refer to our detailed documentation for LMCache V1 and LMCache V0.

Interested in Connecting?

Fill out the interest form and our team will reach out to you! https://forms.gle/mQfQDUXbKfp2St1z7

🛣️ News and Milestones

LMCache V1 with vLLM integration is live 🔥, with the following features:
- High-performance CPU KVCache offloading
- Disaggregated prefill
- P2P KVCache sharing
LMCache is supported in the vLLM production stack ecosystem
User and developer documentation
Stable support for non-prefix KV caches
Installation through pip install and integration with the latest vLLM
First release of LMCache

📖 Blogs and documentation

Our latest blog posts and the documentation pages are available online.

Community meeting

The community meeting for LMCache is hosted weekly. Meeting Details:

Tuesdays at 9:00 AM PT
Tuesdays at 6:30 PM PT

Meetings alternate weekly between the two times. All are welcome to join!

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@article{yao2024cacheblend,
  title={CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion},
  author={Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  journal={arXiv preprint arXiv:2405.16444},
  year={2024}
}

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Release history

This version

1.0.5

May 1, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lmcache_test_test-1.0.5.tar.gz (151.8 kB)

Uploaded May 1, 2025. Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

lmcache_test_test-1.0.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB)

Uploaded May 1, 2025. CPython 3.13, manylinux: glibc 2.17+ x86-64

lmcache_test_test-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB)

Uploaded May 1, 2025. CPython 3.12, manylinux: glibc 2.17+ x86-64

lmcache_test_test-1.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB)

Uploaded May 1, 2025. CPython 3.11, manylinux: glibc 2.17+ x86-64

lmcache_test_test-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB)

Uploaded May 1, 2025. CPython 3.10, manylinux: glibc 2.17+ x86-64
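For readers unsure how to read these file names, the sketch below splits a wheel file name into its fields, following the standard wheel naming convention (`distribution-version[-build]-python-abi-platform.whl`). The helper `split_wheel_filename` and the `WheelName` type are hypothetical names written for this illustration using only the standard library; they are not part of pip or any packaging tool.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]       # optional build tag, absent in the files above
    python_tags: list[str]     # e.g. ["cp313"]
    abi_tags: list[str]        # e.g. ["cp313"]
    platform_tags: list[str]   # "."-separated set of compatible platforms


def split_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    build = parts[2] if len(parts) == 6 else None
    python, abi, platform = parts[-3:]
    return WheelName(
        distribution=parts[0],
        version=parts[1],
        build=build,
        python_tags=python.split("."),
        abi_tags=abi.split("."),
        platform_tags=platform.split("."),
    )


info = split_wheel_filename(
    "lmcache_test_test-1.0.5-cp313-cp313-"
    "manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info.python_tags, info.platform_tags)
# ['cp313'] ['manylinux_2_17_x86_64', 'manylinux2014_x86_64']
```

The two platform tags name the same compatibility level: `manylinux2014` is the legacy alias for `manylinux_2_17`, which matches the "glibc 2.17+" label shown in the listing.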

File details

Details for the file lmcache_test_test-1.0.5.tar.gz.

File metadata

Download URL: lmcache_test_test-1.0.5.tar.gz
Upload date: May 1, 2025
Size: 151.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for lmcache_test_test-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`454450a56d8b855787d914dab3c4cfa98308c518372bc8488635e574a5dfaccd`
MD5	`55c446fe460b5f4393781937b8bc468b`
BLAKE2b-256	`473a580e4527c239d5611a822505212f7be3788f5caa35fabef69f0f5577002b`

See more details on using hashes here.
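As an illustration of how the published digests can be used, the following sketch computes the same three digests for a local file with Python's standard `hashlib` module. The helper names `file_digests` and `matches_published` and the local file path are assumptions made for this example; BLAKE2b-256 is taken here to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as fh:
        # Read in chunks so large wheels do not need to fit in memory.
        while chunk := fh.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path: Path, algorithm: str, expected: str) -> bool:
    """Compare a locally computed digest with a published one."""
    return file_digests(path)[algorithm] == expected.strip().lower()


# Usage, assuming the sdist was downloaded to the current directory:
# matches_published(
#     Path("lmcache_test_test-1.0.5.tar.gz"),
#     "SHA256",
#     "454450a56d8b855787d914dab3c4cfa98308c518372bc8488635e574a5dfaccd",
# )
```

SHA256 is the digest to rely on for integrity checks; MD5 is listed for completeness but is not collision resistant.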

File details

Details for the file lmcache_test_test-1.0.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: lmcache_test_test-1.0.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 1, 2025
Size: 3.7 MB
Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for lmcache_test_test-1.0.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`1bdf4c181ecc973ced4ae8f52cfa07e31317763a67d7924495d23487abb2595b`
MD5	`1a23033f78fcf93e3b2a87f6f0d57c7c`
BLAKE2b-256	`7238bb221d89726959b810933d362a45c6e3fad18f2ddcf87be34f1f1d226df9`

File details

Details for the file lmcache_test_test-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: lmcache_test_test-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 1, 2025
Size: 3.7 MB
Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for lmcache_test_test-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`8fc9024caec39be750e888f8738a497018747472886c09c1563b20ef82b144da`
MD5	`b4006105f843f276adae2bc03e1305d6`
BLAKE2b-256	`1c109b5855af68091e92d23b04e5c5c8309c4281cd37911fd8cb2e1cae5a9323`

File details

Details for the file lmcache_test_test-1.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: lmcache_test_test-1.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 1, 2025
Size: 3.7 MB
Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for lmcache_test_test-1.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`1eb648c49e7ba1273d642d42cdaf743acc746fa210d3fe3c3bfc6946500bbad6`
MD5	`14dcacc32f11b48ac04ed3622d20d343`
BLAKE2b-256	`18276822a54a87e4c66f4309c8b4ff5d300081b896ab91ddf78d1ccb716896ba`

File details

Details for the file lmcache_test_test-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: lmcache_test_test-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 1, 2025
Size: 3.7 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for lmcache_test_test-1.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`e666792e26851512dfb03425a00af39497e3cadf2f0796af0b62405a5c00fecd`
MD5	`0422d93454efa65fbec776d9c1205697`
BLAKE2b-256	`ceb279df921162ddbd0cfa6f52a5c63f94878f8179deef9c2b12924f01d07c17`

lmcache-test-test 1.0.5
