
An LLM serving engine extension that reduces TTFT and increases throughput, especially in long-context scenarios.


| Blog | Documentation | Join Slack | Interest Form | Roadmap

🔥 NEW: For enterprise-scale deployment of LMCache and vLLM, check out the vLLM Production Stack. LMCache is also officially supported in llm-d and KServe!

Summary

LMCache is an LLM serving engine extension that reduces time to first token (TTFT) and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable texts across several locations (GPU, CPU DRAM, and local disk), LMCache reuses the KV caches of any reused text (not necessarily a prefix) in any serving engine instance. This saves GPU cycles and reduces user response delay.

By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
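The storage idea in the summary can be pictured with a toy model. The sketch below is an illustration only, not LMCache's implementation: `chunk_keys`, `TieredStore`, the chunk size, and the tier capacities are all hypothetical names and values. It hashes fixed-size token chunks by content, so a chunk can match even when it is not at the start of a prompt, and it looks entries up from the fastest tier to the slowest. It ignores the fact that real non-prefix reuse also has to account for how a chunk's KV values depend on the preceding text.

```python
import hashlib
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

CHUNK_TOKENS = 4  # hypothetical chunk size, kept tiny for illustration


def chunk_keys(token_ids: List[int], chunk: int = CHUNK_TOKENS) -> List[str]:
    """Hash every full fixed-size chunk of tokens by its own content."""
    keys = []
    for start in range(0, len(token_ids) - chunk + 1, chunk):
        piece = token_ids[start:start + chunk]
        keys.append(hashlib.sha256(repr(piece).encode()).hexdigest())
    return keys


class TieredStore:
    """Toy KV store with tiers ordered from fastest to slowest."""

    def __init__(self, capacities: Dict[str, int]) -> None:
        # capacities maps tier name -> max entries, e.g. {"gpu": 1, "cpu": 2}.
        self.capacities = capacities
        self.tiers: Dict[str, OrderedDict] = {
            name: OrderedDict() for name in capacities
        }

    def put(self, key: str, kv: bytes) -> None:
        """Insert into the fastest tier; demote LRU entries to slower tiers."""
        for tier in self.tiers.values():
            tier.pop(key, None)  # avoid duplicates across tiers
        item: Optional[Tuple[str, bytes]] = (key, kv)
        for name, tier in self.tiers.items():
            if item is None:
                break
            tier[item[0]] = item[1]
            if len(tier) > self.capacities[name]:
                item = tier.popitem(last=False)  # evict least recently used
            else:
                item = None
        # An entry evicted from the slowest tier is simply dropped.

    def get(self, key: str) -> Optional[Tuple[str, bytes]]:
        """Return (tier_name, kv) from the fastest tier that holds the key."""
        for name, tier in self.tiers.items():
            if key in tier:
                tier.move_to_end(key)  # mark as recently used
                return name, tier[key]
        return None


store = TieredStore({"gpu": 1, "cpu": 1, "disk": 2})
for key in chunk_keys(list(range(12))):
    store.put(key, b"kv-bytes-placeholder")
print({key[:8]: store.get(key)[0] for key in chunk_keys(list(range(12)))})
```

A hit in a slower tier still avoids recomputing the prefill for that chunk, which is the source of the savings the summary describes.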


Features

  • 🔥 Integration with vLLM v1 with the following features:
    • High-performance CPU KVCache offloading
    • Disaggregated prefill
    • P2P KVCache sharing
  • LMCache is supported in the vLLM production stack, llm-d, and KServe
  • Stable support for non-prefix KV caches
  • Storage support as follows:
  • Installation support through pip and latest vLLM
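As a rough sketch of what the vLLM v1 integration with CPU offloading can look like, the helper below only assembles a launch command and environment and does not start anything. The connector name `LMCacheConnectorV1`, the `kv_role` value, the `--kv-transfer-config` flag, and the `LMCACHE_*` variable names are assumptions based on the quickstart material in the LMCache and vLLM docs, so verify them against the versions you install. `build_vllm_launch` and the model name are hypothetical.

```python
import json
import shlex
from typing import Dict, List, Tuple


def build_vllm_launch(
    model: str, cpu_cache_gb: float = 5.0, chunk_size: int = 256
) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment variables, argv) for a vLLM server with LMCache."""
    # Assumed LMCache settings: chunk size in tokens and a CPU DRAM budget in GB.
    env = {
        "LMCACHE_CHUNK_SIZE": str(chunk_size),
        "LMCACHE_LOCAL_CPU": "True",
        "LMCACHE_MAX_LOCAL_CPU_SIZE": str(cpu_cache_gb),
    }
    # Assumed connector settings passed to vLLM as a JSON string.
    kv_transfer = {"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}
    argv = ["vllm", "serve", model, "--kv-transfer-config", json.dumps(kv_transfer)]
    return env, argv


env, argv = build_vllm_launch("your-org/your-model")  # placeholder model name
print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
```

Printing the command rather than running it keeps the sketch safe to execute on a machine without a GPU.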

Installation

To use LMCache, simply install lmcache from your package manager, e.g. pip:

pip install lmcache

LMCache works on Linux platforms with NVIDIA GPUs.

More detailed installation instructions are available in the docs, particularly if you are not using the latest stable version of vLLM or are using another serving engine with different dependencies. The documentation also explains how to resolve "undefined symbol" errors and torch version mismatches.
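When chasing a version mismatch, it helps to first see which versions are actually installed. The following minimal sketch uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is an assumption about which distributions matter in your environment.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("lmcache", "vllm", "torch"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions with the compatibility notes in the docs before reinstalling anything.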

Getting started

The best way to get started is to check out the Quickstart Examples in the docs.

Documentation

Check out the LMCache documentation, which is available online.

We also post regularly on the LMCache blog.

Examples

Go hands-on with our examples, demonstrating how to address different use cases with LMCache.

Interested in Connecting?

Fill out the interest form, sign up for our newsletter, join the LMCache Slack, check out the LMCache website, or drop us an email, and our team will reach out to you!

Community meeting

The LMCache community meeting is held bi-weekly on Tuesdays at 9:00 AM PT. All are welcome to join!

Add to Calendar

We keep notes from each meeting in this document, with summaries of standups, discussions, and action items.

Recordings of meetings are available on the YouTube LMCache channel.

Contributing

We welcome and value all contributions and collaborations. Please check out the Contributing Guide to learn how to contribute.

We continually update "[Onboarding] Welcoming contributors with good first issues!"

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@inproceedings{10.1145/3689031.3696098,
  author = {Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  title = {CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
  year = {2025},
  url = {https://doi.org/10.1145/3689031.3696098},
  doi = {10.1145/3689031.3696098},
  booktitle = {Proceedings of the Twentieth European Conference on Computer Systems},
  pages = {94--109},
}

Socials

LinkedIn | Twitter | YouTube

License

The LMCache codebase is licensed under the Apache License 2.0. See the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • lmcache-0.3.5.tar.gz (1.0 MB): source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • lmcache-0.3.5-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB): CPython 3.13, manylinux (glibc 2.24+ and glibc 2.28+), x86-64
  • lmcache-0.3.5-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB): CPython 3.12, manylinux (glibc 2.24+ and glibc 2.28+), x86-64
  • lmcache-0.3.5-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB): CPython 3.11, manylinux (glibc 2.24+ and glibc 2.28+), x86-64
  • lmcache-0.3.5-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB): CPython 3.10, manylinux (glibc 2.24+ and glibc 2.28+), x86-64
  • lmcache-0.3.5-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB): CPython 3.9, manylinux (glibc 2.24+ and glibc 2.28+), x86-64
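If you download a file manually, you can check it against the digest published for it. The sketch below is a minimal example using the standard library's `hashlib`; the SHA256 value is the one the PyPI page lists for lmcache-0.3.5.tar.gz, while the helper names and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed on the PyPI page for lmcache-0.3.5.tar.gz.
EXPECTED_SHA256 = "7edd52701bd31908db4ca1bc4ba1066609572aec0d373d9193f5af5fe8cd9443"


def sha256_of(path, block_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("lmcache-0.3.5.tar.gz")  # assumed local download location
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
```

For routine installs, pip's hash-checking mode performs the same kind of comparison automatically when hashes are pinned in a requirements file.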

