A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.
🤗Why Cache-DiT❓
Cache-DiT is built on top of the Diffusers library and now supports nearly 🔥ALL DiTs from Diffusers (online docs at 📘readthedocs.io). The optimizations Cache-DiT provides include:
- 🎉Hybrid Cache Acceleration (DBCache, DBPrune, TaylorSeer, SCM, Cache CFG and more)
- 🎉Context Parallelism (CP w/ Ulysses, Ring, USP, Ulysses Anything, Async CP, FP8 Comm)
- 🎉Tensor Parallelism (TP w/ PyTorch native DTensor and Tensor Parallelism APIs, avoid OOM)
- 🎉Hybrid 2D and 3D Parallelism (w/ 💥USP + TP, scale up the performance of Large DiTs)
- 🎉Text Encoder Parallelism (TE-P w/ PyTorch native DTensor and Tensor Parallelism APIs)
- 🎉Auto Encoder Parallelism (VAE-P w/ Data or Tile Parallelism, slightly faster, avoid OOM)
- 🎉ControlNet Parallelism (CN-P w/ Context Parallelism for some ControlNet modules)
- 🎉Compatible with Compile, CPU Offloading, Quantization (TorchAo, nunchaku), ...
- 🎉Fully integrated into vLLM-Omni, SGLang Diffusion, SD.Next, ComfyUI, ...
- 🎉Natively supports NVIDIA GPUs, Ascend NPUs (>= 1.2.0), ...
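The cache-acceleration entries above share one core idea: when consecutive denoising steps change little, reuse a cached residual instead of recomputing the expensive transformer blocks. A minimal, self-contained sketch of that idea in plain Python (illustrative only; `expensive_blocks`, the L1 threshold, and the cache class are stand-ins, not cache-dit's actual DBCache implementation):

```python
# Toy sketch of residual caching for DiT denoising steps.
# Illustrative only -- not cache-dit's implementation.

def expensive_blocks(h):
    """Stand-in for the transformer blocks; returns a residual."""
    return [0.1 * x for x in h]

def l1_rel_diff(a, b):
    """Relative L1 difference between two hidden states."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1.0
    return num / den

class ResidualCache:
    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.prev_input = None
        self.cached_residual = None
        self.hits = 0

    def step(self, h):
        if (self.prev_input is not None
                and l1_rel_diff(h, self.prev_input) < self.threshold):
            self.hits += 1                    # cheap path: reuse residual
            residual = self.cached_residual
        else:                                 # expensive path: recompute
            residual = expensive_blocks(h)
            self.cached_residual = residual
        self.prev_input = list(h)
        return [x + r for x, r in zip(h, residual)]

cache = ResidualCache()
out1 = cache.step([1.0, 2.0, 3.0])        # first step always computes
out2 = cache.step([1.001, 2.0, 3.0])      # near-identical input: cache hit
```

The real engine applies this at the granularity of transformer blocks and combines it with the other techniques listed above (TaylorSeer extrapolation, CFG caching, etc.).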
🔥Latest News
- [2026/02] 🎉v1.2.1 release is ready. Major updates include: Ring Attention w/ batched P2P, USP (Hybrid Ring and Ulysses), Hybrid 2D and 3D Parallelism (💥USP + TP), and reduced VAE-P communication overhead.
- [2026/01] 🎉v1.2.0 stable release is ready: new model support (Z-Image, FLUX.2, LTX-2, etc.), request-level Cache Context, HTTP serving, Ulysses Anything, TE-P, VAE-P, CN-P, and Ascend NPU support.
🚀Quick Start
You can install cache-dit from PyPI or from source:
pip3 install -U cache-dit # or, pip3 install git+https://github.com/vipshop/cache-dit.git
Then accelerate your DiTs with just ♥️one line♥️ of code:
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> # The pipe can be any diffusion pipeline.
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
>>> # Cache Acceleration with One-line code.
>>> cache_dit.enable_cache(pipe)
>>> # Or, Hybrid Cache Acceleration + 1D Parallelism.
>>> from cache_dit import DBCacheConfig, ParallelismConfig
>>> cache_dit.enable_cache(
... pipe, cache_config=DBCacheConfig(), # w/ default
... parallelism_config=ParallelismConfig(ulysses_size=2))
>>> # Or, Use Distributed Inference without Cache Acceleration.
>>> cache_dit.enable_cache(
... pipe, parallelism_config=ParallelismConfig(ulysses_size=2))
>>> # Or, Hybrid Cache Acceleration + 2D Parallelism.
>>> cache_dit.enable_cache(
... pipe, cache_config=DBCacheConfig(), # w/ default
... parallelism_config=ParallelismConfig(ulysses_size=2, tp_size=2))
>>> from cache_dit import load_configs
>>> # Or, Load Acceleration config from a custom yaml file.
>>> cache_dit.enable_cache(pipe, **load_configs("config.yaml"))
>>> # Optional, set attention backend for better performance.
>>> cache_dit.set_attn_backend(pipe, attention_backend=...)
>>> output = pipe(...) # Just call the pipe as normal.
For more advanced features, please refer to our online documentation at 📘readthedocs.io.
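The `ulysses_size` option above refers to Ulysses-style sequence parallelism, which hinges on an all-to-all exchange: each rank starts with a shard of the sequence holding all attention heads, and after the exchange holds the full sequence but only its own slice of heads, so attention can run locally over the whole sequence. A rank-local toy of that exchange (plain Python lists, no distributed runtime; sizes and names are illustrative):

```python
# Toy illustration of the all-to-all behind Ulysses-style sequence
# parallelism (no torch.distributed; "ranks" are list indices).

WORLD = 2           # number of simulated ranks
SEQ, HEADS = 4, 4   # sequence length and head count, both divisible by WORLD

def make_shards():
    """Before the exchange: rank r owns a sequence shard with ALL heads.
    Each element is tagged (token_index, head_index)."""
    shard = SEQ // WORLD
    return [
        [[(r * shard + s, h) for h in range(HEADS)] for s in range(shard)]
        for r in range(WORLD)
    ]

def all_to_all(shards):
    """After the exchange: rank r holds EVERY token, but only heads
    [r*HEADS//WORLD, (r+1)*HEADS//WORLD)."""
    hs = HEADS // WORLD
    out = []
    for r in range(WORLD):
        rows = []
        for src in range(WORLD):          # gather each source's shard,
            for row in shards[src]:       # keeping only rank r's heads
                rows.append(row[r * hs:(r + 1) * hs])
        out.append(rows)
    return out

before = make_shards()
after = all_to_all(before)
# after[0] now spans all SEQ tokens with heads 0..HEADS//WORLD-1
```

In the real engine this exchange is a `torch.distributed` collective over GPU tensors; the reverse exchange restores sequence sharding after attention.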
🌐Community Integration
- 🎉ComfyUI x Cache-DiT
- 🎉Ascend NPU x Cache-DiT
- 🎉Diffusers x Cache-DiT
- 🎉SGLang Diffusion x Cache-DiT
- 🎉vLLM-Omni x Cache-DiT
- 🎉Nunchaku x Cache-DiT
- 🎉SD.Next x Cache-DiT
- 🎉stable-diffusion.cpp x Cache-DiT
- 🎉jetson-containers x Cache-DiT
©️Acknowledgements
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and deployment of this project. We learned from the design of, and reused code from, the following projects: Diffusers, SGLang, vLLM, vLLM-Omni, ParaAttention, xDiT, and TaylorSeer.
©️Citations
@misc{cache-dit@2025,
title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com},
year={2025}
}
File details
Details for the file cache_dit-1.2.3-py3-none-any.whl.
File metadata
- Download URL: cache_dit-1.2.3-py3-none-any.whl
- Upload date:
- Size: 347.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1e80844728d55d56d2506e1f7b53315a195595b3bcb13f0d0a0e4c415e0879a |
| MD5 | b9a9937047de09c88705a5ff8df8ad80 |
| BLAKE2b-256 | cccc5467334e62bd41a40fb08a21e33fe1f00721a1c438bc56d6b4c57e822e53 |