Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism and Quantization for Diffusion Transformers.

🤗Why Cache-DiT❓❓

Cache-DiT is built on top of the 🤗Diffusers library and now supports nearly ALL DiTs from Diffusers. It provides hybrid cache acceleration (DBCache, TaylorSeer, SCM, etc.) and comprehensive parallelism optimizations, including Context Parallelism, Tensor Parallelism, hybrid 2D or 3D parallelism, and dedicated extra parallelism support for Text Encoder, VAE, and ControlNet.

Cache-DiT is compatible with compilation, CPU offloading, and quantization; integrates fully with SGLang Diffusion, vLLM-Omni, TensorRT-LLM, and ComfyUI; and runs natively on NVIDIA GPUs, Ascend NPUs, and AMD GPUs. It is fast, easy to use, and flexible across DiTs (online docs at 📘cache-dit.io).

⚡️9x speedup by Cache-DiT with Cache, Context Parallelism and Compilation

🚀Quick Start: Cache, Parallelism and Quantization

First, install cache-dit from PyPI or from source:

uv pip install -U cache-dit # PyPI, stable release.
uv pip install git+https://github.com/vipshop/cache-dit.git # latest

Then, try to accelerate your DiTs with just ♥️one line♥️ of code ~

>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained(...).to("cuda")
>>> cache_dit.enable_cache(pipe) # Cache Acceleration with One-line code.
>>> from cache_dit import DBCacheConfig, ParallelismConfig
>>> cache_dit.enable_cache( # Or, Hybrid Cache Acceleration + Parallelism.
...   pipe, cache_config=DBCacheConfig(), # w/ default
...   parallelism_config=ParallelismConfig(ulysses_size=2))
>>> from cache_dit import DBCacheConfig, ParallelismConfig, QuantizeConfig
>>> cache_dit.enable_cache( # Or, Hybrid Cache + Parallelism + Quantization.
...   pipe, cache_config=DBCacheConfig(), # w/ default
...   parallelism_config=ParallelismConfig(ulysses_size=2),
...   quantize_config=QuantizeConfig(quant_type=...))
>>> output = pipe(...) # Then, just call the pipe as normal.
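
To see what speedup you actually get on your own pipeline and hardware, you can time the same call before and after enabling Cache-DiT. The sketch below is a generic timing helper, not part of the cache-dit API: `mean_latency` and `speedup` are hypothetical names introduced here for illustration, and the optional `sync` argument is an assumption that you pass something like `torch.cuda.synchronize` so queued GPU work is included in the measurement.

```python
import time


def mean_latency(fn, warmup=1, iters=3, sync=None):
    """Return the mean wall-clock seconds per call of `fn`.

    `sync` is an optional zero-argument callable (for example
    `torch.cuda.synchronize`) that waits for queued GPU work to finish.
    """
    for _ in range(warmup):  # warm-up calls are executed but not timed
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters


def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline to accelerated latency; > 1.0 means faster."""
    return baseline_seconds / accelerated_seconds
```

Measure `pipe(...)` once on the unmodified pipeline and once after `cache_dit.enable_cache(pipe)`, with identical arguments and seed, then pass both results to `speedup`. Compare the generated outputs as well, since caching trades some precision for speed.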

🚀Quick Start: SVDQuant (W4A4) PTQ/DQ workflow

First, install Cache-DiT with SVDQuant support (Experimental):

# Required: CUDA 13.0+, PyTorch 2.11+, Ubuntu 22.04+ (GLIBC 2.32+).
uv pip install -U cache-dit-cu13 # PyPI, stable release with SVDQ.
CACHE_DIT_BUILD_SVDQUANT=1 uv pip install -e ".[quantization]" # latest, run from a source checkout
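
Before installing the SVDQuant build, it can help to check the requirements listed above (CUDA 13.0+, PyTorch 2.11+, GLIBC 2.32+) from Python. This is a minimal sketch under stated assumptions: `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical helpers, not cache-dit functions, and the CUDA check reads the CUDA version PyTorch was built against (`torch.version.cuda`), which may differ from the driver or toolkit installed on the machine.

```python
import platform
import re


def version_tuple(text):
    """Parse leading dotted integers, e.g. '2.11.0+cu130' -> (2, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text or "")
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(found, minimum):
    """True if version string `found` is at least the tuple `minimum`."""
    parsed = version_tuple(found)
    if not parsed:
        return False  # missing or unparsable version
    # Pad with zeros so that '13' compares equal to (13, 0).
    padded = parsed + (0,) * max(0, len(minimum) - len(parsed))
    return padded >= tuple(minimum)


def check_environment():
    """Report which of the documented minimums this interpreter satisfies."""
    libc_name, libc_version = platform.libc_ver()
    report = {
        "glibc>=2.32": libc_name == "glibc" and meets_minimum(libc_version, (2, 32)),
        "torch>=2.11": False,
        "cuda>=13.0": False,
    }
    try:
        import torch  # optional: only needed for the last two checks
    except ImportError:
        return report
    report["torch>=2.11"] = meets_minimum(torch.__version__, (2, 11))
    # torch.version.cuda is None on CPU-only builds of PyTorch.
    report["cuda>=13.0"] = meets_minimum(torch.version.cuda, (13, 0))
    return report
```

Calling `check_environment()` returns a dict of booleans; any `False` entry points at a requirement to fix before trying the `cache-dit-cu13` wheels.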

Then, try to quantize your model with just ♥️a few lines♥️ of code ~

>>> import cache_dit
>>> from cache_dit import QuantizeConfig
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained(...).to("cuda")
>>> # Apply quantization with `cache_dit.quantize(...)` API.
>>> pipe.transformer = cache_dit.quantize(
...   pipe.transformer, quant_config=QuantizeConfig(
...   quant_type="svdq_{int4|nvfp4}_r128_dq", # pick int4 or nvfp4; _r{rank}, e.g., r16, r32, r64, r128, etc.
...   svdq_kwargs={"smooth_strategy": "few_shot"}))
>>> output = pipe(...) # Then, just call the pipe as normal.
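
The `quant_type` value above is a pattern rather than a literal string: choose one of `int4` or `nvfp4` and a rank. A small helper can make the naming explicit. This is only a sketch of the string format shown in the example; `svdq_quant_type` is a hypothetical helper, not a cache-dit function, it covers only the `_dq` form shown above, and which precision and rank combinations a given release and GPU accept is not confirmed here.

```python
VALID_PRECISIONS = ("int4", "nvfp4")


def svdq_quant_type(precision, rank=128):
    """Build a string of the form 'svdq_<precision>_r<rank>_dq'."""
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {VALID_PRECISIONS}, got {precision!r}")
    if not isinstance(rank, int) or rank <= 0:
        raise ValueError(f"rank must be a positive integer, got {rank!r}")
    return f"svdq_{precision}_r{rank}_dq"
```

For example, `svdq_quant_type("int4")` yields `"svdq_int4_r128_dq"`, which can then be passed as `quant_type=` to `QuantizeConfig`.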

For more advanced features, please refer to our online documentation at 📘cache-dit.io.

©️Acknowledgements

Special thanks to vipshop's Computer Vision AI Team for supporting testing and deployment of this project. We learned from and reused code from Diffusers, SGLang, vLLM-Omni, Nunchaku, xDiT, and TaylorSeer.

©️Citations

@misc{cache-dit@2025,
  title={Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism and Quantization for Diffusion Transformers.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com, etc.},
  year={2025}
}

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

cache_dit_cu13-1.3.11-cp314-cp314-manylinux_2_38_x86_64.whl (30.9 MB): CPython 3.14, manylinux glibc 2.38+, x86-64
cache_dit_cu13-1.3.11-cp314-cp314-manylinux_2_32_x86_64.whl (30.9 MB): CPython 3.14, manylinux glibc 2.32+, x86-64
cache_dit_cu13-1.3.11-cp313-cp313-manylinux_2_38_x86_64.whl (30.9 MB): CPython 3.13, manylinux glibc 2.38+, x86-64
cache_dit_cu13-1.3.11-cp313-cp313-manylinux_2_32_x86_64.whl (30.9 MB): CPython 3.13, manylinux glibc 2.32+, x86-64
cache_dit_cu13-1.3.11-cp312-cp312-manylinux_2_38_x86_64.whl (30.9 MB): CPython 3.12, manylinux glibc 2.38+, x86-64
cache_dit_cu13-1.3.11-cp312-cp312-manylinux_2_32_x86_64.whl (30.9 MB): CPython 3.12, manylinux glibc 2.32+, x86-64
cache_dit_cu13-1.3.11-cp311-cp311-manylinux_2_38_x86_64.whl (30.9 MB): CPython 3.11, manylinux glibc 2.38+, x86-64
cache_dit_cu13-1.3.11-cp311-cp311-manylinux_2_32_x86_64.whl (30.9 MB): CPython 3.11, manylinux glibc 2.32+, x86-64
cache_dit_cu13-1.3.11-cp310-cp310-manylinux_2_38_x86_64.whl (30.9 MB): CPython 3.10, manylinux glibc 2.38+, x86-64
cache_dit_cu13-1.3.11-cp310-cp310-manylinux_2_32_x86_64.whl (30.9 MB): CPython 3.10, manylinux glibc 2.32+, x86-64
