
Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism and Quantization for Diffusion Transformers.



🤗Why Cache-DiT❓❓

Cache-DiT is built on top of the 🤗Diffusers library and now supports nearly all DiTs from Diffusers. It provides hybrid cache acceleration (DBCache, TaylorSeer, SCM, etc.) and comprehensive parallelism optimizations, including Context Parallelism, Tensor Parallelism, and hybrid 2D or 3D parallelism, plus dedicated parallelism support for the Text Encoder, VAE, and ControlNet.
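
DBCache, TaylorSeer and SCM each have their own definitions (see cache-dit.io). The common idea behind cache acceleration is that consecutive denoising steps produce similar intermediate features, so some transformer blocks can be skipped and their previous contribution reused. The toy sketch below illustrates one simple variant, first-block residual caching, in plain Python. It is not Cache-DiT's implementation: `FirstBlockCache`, `rel_l1` and the lambda "blocks" are hypothetical stand-ins for real DiT blocks, and DBCache is more configurable than this.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance ||a - b||_1 / ||b||_1."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1e-12
    return num / den


class FirstBlockCache:
    """Toy first-block residual cache (illustrative only, not DBCache).

    Every step runs block 0. If block 0's residual moved by less than
    `threshold` (relative L1) since the last fully computed step, the
    remaining blocks are skipped and their cached residual is reused.
    """

    def __init__(self, blocks: List[Block], threshold: float = 0.05) -> None:
        self.blocks = blocks
        self.threshold = threshold
        self.prev_first_residual: Optional[Vector] = None
        self.cached_rest_residual: Optional[Vector] = None
        self.skipped = 0

    def __call__(self, x: Vector) -> Vector:
        h = self.blocks[0](x)
        first_residual = [a - b for a, b in zip(h, x)]
        if (
            self.prev_first_residual is not None
            and self.cached_rest_residual is not None
            and rel_l1(first_residual, self.prev_first_residual) < self.threshold
        ):
            self.skipped += 1  # cache hit: approximate the remaining blocks
            return [a + r for a, r in zip(h, self.cached_rest_residual)]
        out = h
        for block in self.blocks[1:]:  # cache miss: run the full stack
            out = block(out)
        self.cached_rest_residual = [a - b for a, b in zip(out, h)]
        self.prev_first_residual = first_residual
        return out
```

A larger `threshold` skips more steps (faster) at the cost of larger approximation error; the reference residual is only refreshed on full computations, so errors cannot silently accumulate across many consecutive hits.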

Cache-DiT is compatible with compilation, CPU offloading, and quantization; it is fully integrated with SGLang Diffusion, vLLM-Omni, TensorRT-LLM, and ComfyUI, and runs natively on NVIDIA GPUs, Ascend NPUs, and AMD GPUs. Cache-DiT is fast, easy to use, and flexible across various DiTs (online docs at 📘cache-dit.io).

⚡️9x speedup by Cache-DiT with Cache, Context Parallelism and Compilation

🚀Quick Start: Cache, Parallelism and Quantization

First, install cache-dit from PyPI or from source:

uv pip install -U cache-dit # PyPI, stable release.
uv pip install git+https://github.com/vipshop/cache-dit.git # Source, latest.
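
To confirm which build ended up in the environment, the standard library's `importlib.metadata` can report an installed distribution's version. A small sketch; the distribution names `cache-dit` and `cache-dit-cu13` are taken from the install commands on this page, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("cache-dit", "cache-dit-cu13"):
    print(name, installed_version(name))
```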

Then, accelerate your DiTs with just ♥️one line♥️ of code:

>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained(...).to("cuda")
>>> cache_dit.enable_cache(pipe) # Cache Acceleration with One-line code.
>>> from cache_dit import DBCacheConfig, ParallelismConfig
>>> cache_dit.enable_cache( # Or, Hybrid Cache Acceleration + Parallelism.
...   pipe, cache_config=DBCacheConfig(), # w/ default
...   parallelism_config=ParallelismConfig(ulysses_size=2))
>>> from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig
>>> cache_dit.enable_cache( # Or, Hybrid Cache + Parallelism + Quantization.
...   pipe, cache_config=DBCacheConfig(), # w/ default
...   parallelism_config=ParallelismConfig(ulysses_size=2),
...   quantize_config=QuantizeConfig(quant_type=...))
>>> output = pipe(...) # Then, just call the pipe as normal.
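
The `ulysses_size` field in `ParallelismConfig` presumably refers to Ulysses-style context (sequence) parallelism, as popularized by DeepSpeed-Ulysses: each rank holds a slice of the token sequence, and an all-to-all exchange before attention regroups the data so that each rank holds the full sequence for a subset of attention heads (an inverse exchange follows attention). The sketch below simulates only that layout swap in a single process, with strings standing in for per-token, per-head tensors; `ulysses_all_to_all` is a hypothetical name and no communication library is involved.

```python
from typing import List

Shard = List[List[str]]  # one rank's data, indexed [token][head]


def ulysses_all_to_all(shards: List[Shard], num_heads: int) -> List[Shard]:
    """Simulate the pre-attention all-to-all of Ulysses-style context parallelism.

    Input: shards[r] holds rank r's contiguous slice of tokens, with all heads.
    Output: out[r] holds all tokens, but only rank r's contiguous group of heads.
    """
    world = len(shards)  # plays the role of ulysses_size
    if num_heads % world != 0:
        raise ValueError("num_heads must be divisible by ulysses_size")
    heads_per_rank = num_heads // world
    out: List[Shard] = []
    for dst in range(world):
        heads = range(dst * heads_per_rank, (dst + 1) * heads_per_rank)
        # Each source rank sends dst the dst-owned heads of its local tokens;
        # concatenating in source-rank order restores the full token order.
        out.append([[tok[h] for h in heads] for src in shards for tok in src])
    return out
```

After the swap, each rank can run ordinary attention over the whole sequence for its heads. In this simple scheme the head count must be divisible by `ulysses_size`, which is why the sketch rejects other values.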

🚀Quick Start: SVDQuant (W4A4) PTQ/DQ workflow

First, install Cache-DiT with SVDQuant support (Experimental):

uv pip install -U cache-dit-cu13 # CUDA 13.0+, PyTorch 2.11+.
# Or, just build Cache-DiT with SVDQuant support from source.
CACHE_DIT_BUILD_SVDQUANT=1 uv pip install -e ".[quantization]"

Then, quantize your model with just ♥️a few lines♥️ of code:

>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> from cache_dit import QuantizeConfig
>>> pipe = DiffusionPipeline.from_pretrained(...).to("cuda")
>>> # Apply quantization with the `cache_dit.quantize(...)` API.
>>> pipe.transformer = cache_dit.quantize(
...   pipe.transformer, quant_config=QuantizeConfig(
...   quant_type="svdq_{int4|nvfp4}_r128_dq", # pick int4 or nvfp4; _r{rank}, e.g., r16, r32, r64, r128, etc.
...   svdq_kwargs={"smooth_strategy": "few_shot"}))
>>> output = pipe(...) # Then, just call the pipe as normal.
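
The `svdq_` quant types refer to SVDQuant (implemented in the Nunchaku project acknowledged below), which keeps a low-rank branch of each weight in 16-bit precision and quantizes only the remaining residual to 4 bits; the `_r{rank}` suffix presumably sets that rank. The sketch below is a weights-only toy in pure Python: it computes a rank-r approximation by power iteration with deflation (instead of a full SVD) and applies symmetric per-row int4 quantization to the residual. It omits activation quantization (the A4 in W4A4), the smoothing step, and the NVFP4 format, and every function name in it is hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def top_singular_triplet(a: Matrix, iters: int = 200, seed: int = 0):
    """Estimate the leading (u, sigma, v) of `a` by power iteration on A^T A."""
    rng = random.Random(seed)
    at = transpose(a)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(a[0]))]
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return u, sigma, v


def low_rank(w: Matrix, rank: int) -> Matrix:
    """Rank-`rank` approximation via repeated power iteration plus deflation."""
    rows, cols = len(w), len(w[0])
    residual = [row[:] for row in w]
    approx = [[0.0] * cols for _ in range(rows)]
    for k in range(rank):
        u, s, v = top_singular_triplet(residual, seed=k)
        for i in range(rows):
            for j in range(cols):
                approx[i][j] += s * u[i] * v[j]
                residual[i][j] -= s * u[i] * v[j]
    return approx


def quantize_int4_rows(m: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row int4: q = round(x / scale) in [-8, 7], scale = max|x| / 7."""
    qs, scales = [], []
    for row in m:
        amax = max(abs(x) for x in row)
        scale = amax / 7.0 if amax > 0.0 else 1.0
        qs.append([max(-8, min(7, round(x / scale))) for x in row])
        scales.append(scale)
    return qs, scales


def svdq_like(w: Matrix, rank: int) -> Matrix:
    """W ~= L (kept in full precision) + dequant(int4(W - L))."""
    l = low_rank(w, rank)
    r = [[wij - lij for wij, lij in zip(wr, lr)] for wr, lr in zip(w, l)]
    qs, scales = quantize_int4_rows(r)
    return [[lij + q * s for lij, q in zip(lr, qr)]
            for lr, qr, s in zip(l, qs, scales)]
```

In the real method, the low-rank branch absorbs outliers, which shrinks the residual's range and therefore the 4-bit step size. The toy only makes the mechanics visible: each residual entry is reproduced to within half of its row's quantization scale.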

For more advanced features, please refer to our online documentation at 📘cache-dit.io.

🌐Community Integration

©️Acknowledgements

Special thanks to vipshop's Computer Vision AI Team for supporting the testing and deployment of this project. We learned from and reused code from Diffusers, SGLang, vLLM-Omni, Nunchaku, xDiT, and TaylorSeer.

©️Citations

@misc{cache-dit@2025,
  title={Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism and Quantization for Diffusion Transformers.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com, etc.},
  year={2025}
}

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

cache_dit_cu13-1.3.10-cp314-cp314-manylinux_2_34_x86_64.whl (30.9 MB): CPython 3.14, manylinux glibc 2.34+, x86-64

cache_dit_cu13-1.3.10-cp313-cp313-manylinux_2_34_x86_64.whl (30.9 MB): CPython 3.13, manylinux glibc 2.34+, x86-64

cache_dit_cu13-1.3.10-cp312-cp312-manylinux_2_34_x86_64.whl (30.9 MB): CPython 3.12, manylinux glibc 2.34+, x86-64

cache_dit_cu13-1.3.10-cp311-cp311-manylinux_2_34_x86_64.whl (30.9 MB): CPython 3.11, manylinux glibc 2.34+, x86-64

cache_dit_cu13-1.3.10-cp310-cp310-manylinux_2_34_x86_64.whl (30.9 MB): CPython 3.10, manylinux glibc 2.34+, x86-64

File hashes

cache_dit_cu13-1.3.10-cp314-cp314-manylinux_2_34_x86_64.whl
SHA256 ea364eba8e65ab7b4894467e3aab58447e5a80489e2a09b735fed5fc9491ed36
MD5 587c03204ddd0cc31634e9873933c45a
BLAKE2b-256 ace81db5a7423347765df44712946f0a68e35a83374cf6646a52476fb1af05c9

cache_dit_cu13-1.3.10-cp313-cp313-manylinux_2_34_x86_64.whl
SHA256 0a6868a19c06e69b39635dae7838707c47d82aa7aeb622c19a78c461a1158141
MD5 ed41c95d12b2d51b82d4d4526a989c9d
BLAKE2b-256 38f9a9f94503a2108489ea1e13459129d8a40c299a0420d9a252d2dd70eddc4e

cache_dit_cu13-1.3.10-cp312-cp312-manylinux_2_34_x86_64.whl
SHA256 61685f62a8bc63f3f87df811bb2780df8f08a395879c7a06684b2a22ecbfa5ec
MD5 1b4b34536356b8fb6cabc00df3d2eb58
BLAKE2b-256 000dbe4cddf6f859deb2adee7e39af85ae92c630a4b1c133e00a5edb55f7e5ec

cache_dit_cu13-1.3.10-cp311-cp311-manylinux_2_34_x86_64.whl
SHA256 6c81f88fd0576a7dd26353391ede02612d21a7255c26783fe5ef449201ab91f5
MD5 f3f7001eb10dab81c97acdf036274555
BLAKE2b-256 365d66827dfe307dba3c50000d064d67af74a1f8baa860d8e0dcf94ea392d7a9

cache_dit_cu13-1.3.10-cp310-cp310-manylinux_2_34_x86_64.whl
SHA256 f3afe54475f2e3097866f8012a8b663fdb2716f37c070ec0bdf00c2c6ea5aae2
MD5 cbb8f550d6b6e7f728c7c9322a20ad12
BLAKE2b-256 46ee8d68300785fae5d2e6de6795919b3df42c00dc69134b7c47c48aecd39639
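
To check a downloaded wheel against the SHA256 digests listed above, the standard library is enough. A minimal sketch; `sha256_of` and `verify_wheel` are hypothetical helper names, not part of Cache-DiT.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, streamed in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Union[str, Path], expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel("cache_dit_cu13-1.3.10-cp312-cp312-manylinux_2_34_x86_64.whl", "61685f62a8bc63f3f87df811bb2780df8f08a395879c7a06684b2a22ecbfa5ec")` should return `True` for an intact download. pip can enforce the same check at install time in hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file).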
