lmcache

An LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios.

Maintainers

apostaB, deng451e, LMCache

Project description

🔥 NEW: For enterprise-scale deployment of LMCache and vLLM, please check out vLLM Production Stack. LMCache is also officially supported in llm-d and KServe!

Summary

LMCache is an LLM serving engine extension that reduces TTFT (time to first token) and increases throughput, especially in long-context scenarios. It stores the KV caches of reusable text across several locations (GPU, CPU DRAM, and local disk) and reuses the KV cache of any repeated text, not necessarily a prefix, in any serving engine instance. This saves GPU cycles and reduces user response delay.

By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
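To make the idea of reusing KV caches across storage locations more concrete, here is a purely conceptual sketch. It is not LMCache's API or internal design: the class `TieredKVStore`, the helper `plan_prefill`, the tier names, and the chunk size are all hypothetical, and the "KV cache" is just a placeholder byte string. The sketch keys each chunk of tokens by a content hash and checks the tiers from fastest to slowest.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

CHUNK_TOKENS = 4  # hypothetical chunk size, kept tiny for the demo


def chunk_key(tokens: List[int]) -> str:
    """Content hash of one token chunk, independent of its position."""
    data = ",".join(map(str, tokens)).encode()
    return hashlib.sha256(data).hexdigest()


class TieredKVStore:
    """Toy store that looks up chunks from the fastest tier to the slowest."""

    def __init__(self) -> None:
        # Dict insertion order doubles as the lookup order.
        self.tiers: Dict[str, Dict[str, bytes]] = {"gpu": {}, "cpu": {}, "disk": {}}

    def put(self, tier: str, tokens: List[int], kv: bytes) -> None:
        self.tiers[tier][chunk_key(tokens)] = kv

    def get(self, tokens: List[int]) -> Optional[Tuple[str, bytes]]:
        key = chunk_key(tokens)
        for name, tier in self.tiers.items():
            if key in tier:
                return name, tier[key]
        return None


def plan_prefill(store: TieredKVStore, prompt: List[int]) -> List[Tuple[int, str]]:
    """Split a prompt into chunks and report where each one would come from."""
    plan = []
    for start in range(0, len(prompt), CHUNK_TOKENS):
        hit = store.get(prompt[start:start + CHUNK_TOKENS])
        plan.append((start, hit[0] if hit else "recompute"))
    return plan


store = TieredKVStore()
store.put("cpu", [5, 6, 7, 8], b"placeholder-kv")  # a previously seen chunk
plan = plan_prefill(store, [1, 2, 3, 4, 5, 6, 7, 8])
print(plan)  # [(0, 'recompute'), (4, 'cpu')]
```

Note that the cached chunk is found even though it is not at the start of the prompt. A real system has to do more than this toy does: KV values depend on the preceding context, so reusing a non-prefix chunk requires additional correction, which the sketch ignores.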

[Figure: performance]

Features

- 🔥 Integration with vLLM v1 with the following features:
  - High-performance CPU KVCache offloading
  - Disaggregated prefill
  - P2P KVCache sharing
- LMCache is supported in the vLLM production stack, llm-d, and KServe
- Stable support for non-prefix KV caches
- Storage support as follows:
  - CPU
  - Disk
  - NIXL
- Installation support through pip and latest vLLM

Installation

To use LMCache, simply install lmcache from your package manager, e.g. pip:

pip install lmcache

LMCache works on Linux with NVIDIA GPUs.

More detailed installation instructions are available in the docs, particularly if you are not using the latest stable version of vLLM or are using another serving engine with different dependencies. The documentation also explains how to resolve "undefined symbol" errors and torch version mismatches.
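When chasing a version mismatch, it can help to first see which versions are actually installed. The following is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the package list (`lmcache`, `vllm`, `torch`) is an assumption about what is worth checking; the script only reports versions and does not know which combinations are compatible.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["lmcache", "vllm", "torch"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Compare the reported versions against the compatibility notes in the documentation before reinstalling anything.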

Getting started

The best way to get started is to check out the Quickstart Examples in the docs.

Documentation

Check out the LMCache documentation which is available online.

We also post regularly on the LMCache blog.

Examples

Go hands-on with our examples, demonstrating how to address different use cases with LMCache.

Interested in Connecting?

Fill out the interest form, sign up for our newsletter, join LMCache slack, check out LMCache website, or drop an email, and our team will reach out to you!

Community meeting

The LMCache community meeting is held bi-weekly on Tuesdays at 9:00 AM PT. All are welcome to join!

We keep notes from each meeting on this document for summaries of standups, discussion, and action items.

Recordings of meetings are available on the YouTube LMCache channel.

Contributing

We welcome and value all contributions and collaborations. Please check out the Contributing Guide to learn how to contribute.

We continually update "[Onboarding] Welcoming contributors with good first issues!"

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={Cachegen: Kv cache compression and streaming for fast large language model serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@inproceedings{10.1145/3689031.3696098,
  author = {Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  title = {CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
  year = {2025},
  url = {https://doi.org/10.1145/3689031.3696098},
  doi = {10.1145/3689031.3696098},
  booktitle = {Proceedings of the Twentieth European Conference on Computer Systems},
  pages = {94--109}
}

Socials

LinkedIn | Twitter | YouTube

License

The LMCache codebase is licensed under Apache License 2.0. See the LICENSE file for details.

Release history

Version	Release date
0.4.5	May 15, 2026
0.4.4	Apr 23, 2026
0.4.3	Apr 7, 2026
0.4.2	Mar 18, 2026
0.4.1	Mar 12, 2026
0.3.15	Mar 2, 2026
0.3.14	Feb 17, 2026
0.3.13	Jan 29, 2026
0.3.12	Jan 5, 2026
0.3.11	Dec 15, 2025
0.3.10.post2	Dec 8, 2025
0.3.10.post1	Dec 5, 2025
0.3.10	Nov 28, 2025
0.3.9.post2	Nov 11, 2025
0.3.9.post1	Nov 6, 2025
0.3.9	Oct 29, 2025
0.3.7	Sep 29, 2025
0.3.6	Sep 15, 2025
0.3.5	Aug 29, 2025
0.3.4 (this version)	Aug 25, 2025
0.3.3	Aug 3, 2025
0.3.2	Jul 15, 2025
0.3.1.post1	Jun 26, 2025
0.3.1	Jun 25, 2025
0.3.0	May 28, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lmcache-0.3.4.tar.gz (1.0 MB)

Uploaded Aug 25, 2025. Tags: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB)

Uploaded Aug 25, 2025. Tags: CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

lmcache-0.3.4-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB)

Uploaded Aug 25, 2025. Tags: CPython 3.11, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

lmcache-0.3.4-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.8 MB)

Uploaded Aug 25, 2025. Tags: CPython 3.10, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
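The wheel file names above follow the standard pattern `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, where each of the last three tags may hold several dot-separated values. As a small illustration, the sketch below splits a file name into those parts with plain string handling. The names `WheelName` and `parse_wheel_name` are hypothetical helpers written for this example; they do not validate the individual tags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WheelName:
    distribution: str
    version: str
    build: Optional[str]
    python_tags: List[str]
    abi_tags: List[str]
    platform_tags: List[str]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename!r}")
    # With six components, the third one is the optional build tag.
    build = parts[2] if len(parts) == 6 else None
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelName(
        distribution=parts[0],
        version=parts[1],
        build=build,
        python_tags=python_tag.split("."),
        abi_tags=abi_tag.split("."),
        platform_tags=platform_tag.split("."),
    )


wheel = parse_wheel_name(
    "lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl"
)
print(wheel.python_tags, wheel.abi_tags, wheel.platform_tags)
```

For this file the platform component carries two tags, `manylinux_2_24_x86_64` and `manylinux_2_28_x86_64`, which is why the listing shows both glibc 2.24+ and glibc 2.28+.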

File details

Details for the file lmcache-0.3.4.tar.gz.

File metadata

Download URL: lmcache-0.3.4.tar.gz
Upload date: Aug 25, 2025
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lmcache-0.3.4.tar.gz
Algorithm	Hash digest
SHA256	`bd2da4fc3d7be32cf8ed025c4208db4dd1feefefa7619542705cdc776cb9928f`
MD5	`473399efd1680c8623f180f5b04ada52`
BLAKE2b-256	`d2d9cc113f2ee1a3976d3f39bfd0d6453fcd3897626cc0d14c2d654edc9844c9`

See more details on using hashes here.
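One way to use the published digest is to recompute it locally after downloading the file and compare the two values. The sketch below does this with the standard library's `hashlib`. The function names `sha256_of` and `matches_digest` are hypothetical, and the script assumes the archive was saved to the current directory under its original name; the expected value is the SHA256 digest listed above for `lmcache-0.3.4.tar.gz`.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for lmcache-0.3.4.tar.gz
EXPECTED_SHA256 = "bd2da4fc3d7be32cf8ed025c4208db4dd1feefefa7619542705cdc776cb9928f"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected one, ignoring hex case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


archive = Path("lmcache-0.3.4.tar.gz")  # assumed download location
if archive.exists():
    print("digest matches" if matches_digest(archive, EXPECTED_SHA256) else "DIGEST MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The same digest appears as the subject digest in the provenance statement for this file, so a local match ties the download to that attestation's subject.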

Provenance

The following attestation bundles were made for lmcache-0.3.4.tar.gz:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmcache-0.3.4.tar.gz
- Subject digest: bd2da4fc3d7be32cf8ed025c4208db4dd1feefefa7619542705cdc776cb9928f
- Sigstore transparency entry: 430407721
- Sigstore integration time: Aug 25, 2025
Source repository:
- Permalink: LMCache/LMCache@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Branch / Tag: refs/tags/v0.3.4
- Owner: https://github.com/LMCache
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Trigger Event: release

File details

Details for the file lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Upload date: Aug 25, 2025
Size: 3.8 MB
Tags: CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`899a833c02ff544d2fa50cb04f73714441b578b8e11593c2d7f14fce2efcc827`
MD5	`8df4059a283f5180db573c50899eb734`
BLAKE2b-256	`089c91f6eb57e7ab5a7493f26753339ce2108cb2c7e2213c9cedc98225960231`

Provenance

The following attestation bundles were made for lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmcache-0.3.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
- Subject digest: 899a833c02ff544d2fa50cb04f73714441b578b8e11593c2d7f14fce2efcc827
- Sigstore transparency entry: 430407745
- Sigstore integration time: Aug 25, 2025
Source repository:
- Permalink: LMCache/LMCache@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Branch / Tag: refs/tags/v0.3.4
- Owner: https://github.com/LMCache
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Trigger Event: release

File details

Details for the file lmcache-0.3.4-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: lmcache-0.3.4-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Upload date: Aug 25, 2025
Size: 3.8 MB
Tags: CPython 3.11, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lmcache-0.3.4-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`00d3fc1565396eef7a4835a98f0ba4f49c99ea6d13c551e5daee48413b537b91`
MD5	`036ecb0becd8e5405c2cea6704f3ce1b`
BLAKE2b-256	`a39957b8a133bdef11966ca3466359b43af8c4afdce2ee1393d1c3cd34e685df`

Provenance

The following attestation bundles were made for lmcache-0.3.4-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmcache-0.3.4-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
- Subject digest: 00d3fc1565396eef7a4835a98f0ba4f49c99ea6d13c551e5daee48413b537b91
- Sigstore transparency entry: 430407766
- Sigstore integration time: Aug 25, 2025
Source repository:
- Permalink: LMCache/LMCache@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Branch / Tag: refs/tags/v0.3.4
- Owner: https://github.com/LMCache
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Trigger Event: release

File details

Details for the file lmcache-0.3.4-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: lmcache-0.3.4-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Upload date: Aug 25, 2025
Size: 3.8 MB
Tags: CPython 3.10, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lmcache-0.3.4-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`61f13a0f80280821c38a4af3eaf057444afb3e623be6a93d3cfc0009ddc4ea55`
MD5	`9331910510482203c4a9087d091ccbc7`
BLAKE2b-256	`8d9243ceb08be8d57f10ee44440bdf1aa921226e696331d5a60fcf035d99178c`

Provenance

The following attestation bundles were made for lmcache-0.3.4-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on LMCache/LMCache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmcache-0.3.4-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
- Subject digest: 61f13a0f80280821c38a4af3eaf057444afb3e623be6a93d3cfc0009ddc4ea55
- Sigstore transparency entry: 430407799
- Sigstore integration time: Aug 25, 2025
Source repository:
- Permalink: LMCache/LMCache@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Branch / Tag: refs/tags/v0.3.4
- Owner: https://github.com/LMCache
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5bdb77746e41383ae2e0b6985560a823576dbbfa
- Trigger Event: release
