LMCache: prefill your long contexts only once

| Blog | Documentation | Join Slack | Interest Form | Official Email |

💡 What is LMCache?

TL;DR - Redis for LLMs.

LMCache is an LLM serving engine extension that reduces TTFT (time to first token) and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable texts across multiple locations (GPU, CPU DRAM, and local disk), LMCache reuses the KV caches of any reused text (not necessarily a prefix) in any serving engine instance. LMCache thus saves precious GPU cycles and reduces response delay for users.

Combined with vLLM, LMCache achieves 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
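To make the idea of tiered, content-addressed KV cache storage concrete, here is a minimal sketch. It is not LMCache's implementation or API: the names `Tier`, `TieredKVStore`, `chunk_key`, `CHUNK_TOKENS`, the chunk size, and the LRU demotion policy are all assumptions made for illustration, and placeholder bytes stand in for real KV tensors. It also ignores the positional and cross-attention corrections a real system needs before a non-prefix chunk can be reused safely.

```python
import hashlib
from collections import OrderedDict

CHUNK_TOKENS = 4  # hypothetical chunk size, kept tiny for illustration


def chunk_key(tokens):
    """Hash a chunk by its own tokens only, so the key ignores whatever precedes it."""
    return hashlib.sha256(repr(list(tokens)).encode("utf-8")).hexdigest()


def split_chunks(tokens):
    """Split a token list into fixed-size chunks, dropping a trailing partial chunk."""
    full = len(tokens) - len(tokens) % CHUNK_TOKENS
    return [tokens[i:i + CHUNK_TOKENS] for i in range(0, full, CHUNK_TOKENS)]


class Tier:
    """A fixed-capacity LRU store standing in for GPU, CPU DRAM, or local disk."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = OrderedDict()  # key -> placeholder KV bytes, oldest first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert an entry; return the evicted (key, value) pair if over capacity."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # evict least recently used
        return None


class TieredKVStore:
    """Tiers ordered fastest first; evicted entries are demoted to the next tier."""

    def __init__(self, tiers):
        self.tiers = tiers

    def put(self, key, value):
        entry = (key, value)
        for tier in self.tiers:
            entry = tier.put(*entry)
            if entry is None:
                return
        # An entry evicted from the last tier is simply dropped.

    def get(self, key):
        """Return (value, tier_name) from the fastest tier holding the key, or None."""
        for tier in self.tiers:
            value = tier.get(key)
            if value is not None:
                return value, tier.name
        return None


def store_request(store, tokens):
    """Cache every full chunk of a request (assumed hypothetical helper)."""
    for chunk in split_chunks(tokens):
        store.put(chunk_key(chunk), b"kv:" + repr(chunk).encode("utf-8"))


def lookup_request(store, tokens):
    """Return, per chunk, the name of the tier holding its KV cache, or None on a miss."""
    results = []
    for chunk in split_chunks(tokens):
        hit = store.get(chunk_key(chunk))
        results.append(hit[1] if hit else None)
    return results


if __name__ == "__main__":
    store = TieredKVStore([Tier("gpu", 1), Tier("cpu", 2), Tier("disk", 4)])
    store_request(store, [1, 2, 3, 4, 5, 6, 7, 8])
    # The second request reuses the stored second chunk after a different opening chunk.
    print(lookup_request(store, [9, 9, 9, 9, 5, 6, 7, 8]))
```

Because each key depends only on the chunk's own tokens, a chunk can be found again even when the text before it has changed, which is the sense of "not necessarily a prefix" above. The sketch only matches chunks that fall on the same chunk boundaries; handling arbitrary offsets is out of scope here.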

Try LMCache with pre-built vLLM Docker images here.

🚀 Performance snapshot

(Figure: LMCache performance snapshot)

💻 Installation and Quickstart

Please refer to our detailed documentation for LMCache V1 and LMCache V0.

Interested in Connecting?

Fill out the interest form and our team will reach out to you! https://forms.gle/mQfQDUXbKfp2St1z7

🛣️ News and Milestones

  • LMCache V1 with vLLM integration is live 🔥, with the following features:
    • High performance CPU KVCache offloading
    • Disaggregated prefill
    • P2P KVCache sharing
  • LMCache is supported in the vLLM production stack ecosystem
  • User and developer documentation
  • Stable support for non-prefix KV caches
  • Support for installation through pip install and integration with the latest vLLM
  • First release of LMCache

📖 Blogs and documentation

Our latest blog posts and the documentation pages are available online.

Community meeting

The community meeting for LMCache is hosted weekly. Meeting Details:

Meetings alternate weekly between the two times. All are welcome to join!

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={CacheGen: KV cache compression and streaming for fast large language model serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@article{yao2024cacheblend,
  title={CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion},
  author={Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  journal={arXiv preprint arXiv:2405.16444},
  year={2024}
}

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

