Skip to main content

A LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

Project description

lmcache logo

Docs PyPI PyPI - Python Version Unit Tests Code Quality Integration Tests


OpenSSF Best Practices OpenSSF Scorecard Ask DeepWiki GitHub commit activity PyPI - Downloads YouTube Channel Views


| Blog | Documentation | Join Slack | Interest Form | Roadmap

🔥 NEW: For enterprise-scale deployment of LMCache and vLLM, please check out vLLM Production Stack. LMCache is also officially supported in llm-d and KServe!

Summary

LMCache is an LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios. By storing the KV caches of reusable texts across various locations, including (GPU, CPU DRAM, Local Disk), LMCache reuses the KV caches of any reused text (not necessarily prefix) in any serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.

By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.

performance

Features

  • 🔥 Integration with vLLM v1 with the following features:
    • High performance CPU KVCache offloading
    • Disaggregated prefill
    • P2P KVCache sharing
  • LMCache is supported in the vLLM production stack, llm-d, and KServe
  • Stable support for non-prefix KV caches
  • Storage support as follows:
  • Installation support through pip and latest vLLM

Installation

To use LMCache, simply install lmcache from your package manager, e.g. pip:

pip install lmcache

Works on Linux NVIDIA GPU platform.

More detailed installation instructions are available in the docs, particularly if you are not using the latest stable version of vllm or using another serving engine with different dependencies. Any "undefined symbol" or torch mismatch versions can be resolved in the documentation.

Getting started

The best way to get started is to checkout the Quickstart Examples in the docs.

Documentation

Check out the LMCache documentation which is available online.

We also post regularly in LMCache blogs.

Examples

Go hands-on with our examples, demonstrating how to address different use cases with LMCache.

Interested in Connecting?

Fill out the interest form, sign up for our newsletter, join LMCache slack, check out LMCache website, or drop an email, and our team will reach out to you!

Community meeting

The community meeting Zoom Link for LMCache is hosted bi-weekly. All are welcome to join!

Meetings are held bi-weekly on: Tuesdays at 9:00 AM PT – Add to Google Calendar

We keep notes from each meeting on this document for summaries of standups, discussion, and action items.

Recordings of meetings are available on the YouTube LMCache channel.

Contributing

We welcome and value all contributions and collaborations. Please check out Contributing Guide on how to contribute.

We continually update [Onboarding] Welcoming contributors with good first issues!

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={Cachegen: Kv cache compression and streaming for fast large language model serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@inproceedings{10.1145/3689031.3696098,
  author = {Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  title = {CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
  year = {2025},
  url = {https://doi.org/10.1145/3689031.3696098},
  doi = {10.1145/3689031.3696098},
  booktitle = {Proceedings of the Twentieth European Conference on Computer Systems},
  pages = {94–109},
}

Socials

Linkedin | Twitter | Youtube

License

The LMCache codebase is licensed under Apache License 2.0. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lmcache-0.3.7.tar.gz (1.1 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

lmcache-0.3.7-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.24+ x86-64manylinux: glibc 2.28+ x86-64

lmcache-0.3.7-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.24+ x86-64manylinux: glibc 2.28+ x86-64

lmcache-0.3.7-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.24+ x86-64manylinux: glibc 2.28+ x86-64

lmcache-0.3.7-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.24+ x86-64manylinux: glibc 2.28+ x86-64

File details

Details for the file lmcache-0.3.7.tar.gz.

File metadata

  • Download URL: lmcache-0.3.7.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lmcache-0.3.7.tar.gz
Algorithm Hash digest
SHA256 979280d64ab1bc0f06ad0ad1e2cba24cfa6c9bd997758e31a2244da9e6a45e10
MD5 771dda031da64730861f4ba600d559c2
BLAKE2b-256 d7c8c8033a5f359727112d2e10e30ab453a9b93d1962111d945119370a2b5c10

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.3.7.tar.gz:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.3.7-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.3.7-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 077dbbda6d63aa2a2de78f01ffc86eb6db98f09675d70016bf08fe7fcc8f0537
MD5 fcf08efb1f569a0fa05086a9e6ea68f3
BLAKE2b-256 280e222e31c904098a1ceac87a1c37696a2c7ed6ba96331932474600666f6e0a

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.3.7-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.3.7-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.3.7-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ee2e69637a4045fd24c59d87bc2bb45bd8f21679099d0a2293075600ab4fb913
MD5 0dd8596615acc683cd65b1ab3f86fd60
BLAKE2b-256 72f96dcf38af12f1066df1935b7a81c40261910d1931a578d7719a3988e271c3

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.3.7-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.3.7-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.3.7-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 0f4ab40aaee9bb59f74583ef791dbcee2df8779904dbf140929f37eb455a5a11
MD5 8af3f2e0aa8a35be83c03e7bb0686b7c
BLAKE2b-256 fe442a65d410795e9cef37ddfc4cce4808b4e8eefa7467a6435a524313411e79

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.3.7-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lmcache-0.3.7-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for lmcache-0.3.7-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4b8a8bb4813b9b0e47398700496ad030f8b037f00654cef75822ee6935e4f7a0
MD5 900104b03e333ca678b7910b554fa721
BLAKE2b-256 2be26983499e88b8ec6ac6305f2937ce5c078045943902017d1efdc26f035355

See more details on using hashes here.

Provenance

The following attestation bundles were made for lmcache-0.3.7-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page