A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗Diffusers.
Highlight
We are excited to announce that the first API-stable version (v1.0.0) of cache-dit has been released! cache-dit is a Unified and Flexible Inference Engine for 🤗Diffusers, enabling acceleration with just ♥️one line♥️ of code. Key features: Unified Cache APIs, Forward Pattern Matching, Automatic Block Adapter, DBCache, DBPrune, Hybrid TaylorSeer Calibrator, Hybrid Cache CFG, Context Parallelism, Tensor Parallelism, Torch Compile compatibility, and SOTA performance.
pip3 install -U cache-dit # pip3 install git+https://github.com/vipshop/cache-dit.git
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub. Then try ♥️ Cache Acceleration with just one line of code ~ ♥️
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
>>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
>>> output = pipe(...) # Just call the pipe as normal.
>>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
>>> cache_dit.disable_cache(pipe) # Disable cache and run original pipe.
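To check what speedup the cache gives on your own hardware, a small timing helper along these lines may help. This is only a sketch: `mean_latency` and `speedup` are hypothetical helper names, not part of cache-dit, and the prompt in the usage comments is a placeholder.

```python
import time
from typing import Callable


def mean_latency(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> float:
    """Return the mean wall-clock seconds of fn() over `repeats` timed runs."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def speedup(baseline_s: float, cached_s: float) -> float:
    """Speedup factor: 2.0 means the cached run takes half the baseline time."""
    return baseline_s / cached_s


# Usage with the pipeline from the snippet above (the prompt is a placeholder):
#   baseline = mean_latency(lambda: pipe("a cat"))
#   cache_dit.enable_cache(pipe)
#   cached = mean_latency(lambda: pipe("a cat"))
#   print(f"speedup: {speedup(baseline, cached):.2f}x")
```

Measure the baseline before calling `cache_dit.enable_cache(pipe)`, or after `cache_dit.disable_cache(pipe)`, so that both runs use the same pipeline object and settings.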
Core Features
- Full 🤗Diffusers Support: cache-dit now supports nearly all of Diffusers' DiT-based pipelines, including 30+ series and around 100 pipelines, such as FLUX.1, Qwen-Image, Qwen-Image-Lightning, Wan 2.1/2.2, HunyuanImage-2.1, HunyuanVideo, HiDream, AuraFlow, CogView3Plus, CogView4, CogVideoX, LTXVideo, ConsisID, SkyReelsV2, VisualCloze, PixArt, Chroma, Mochi, SD 3.5, DiT-XL, etc.
- Extremely Easy to Use: In most cases, you only need one line of code: cache_dit.enable_cache(...). After calling this API, just use the pipeline as normal.
- Easy New Model Integration: Features like Unified Cache APIs, Forward Pattern Matching, Automatic Block Adapter, Hybrid Forward Pattern, and Patch Functor make it highly functional and flexible. For example, we achieved Day 1 support for HunyuanImage-2.1 with a 1.7x speedup w/o precision loss, even before it was available in the Diffusers library.
- State-of-the-Art Performance: Compared with algorithms including Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer and FoCa, cache-dit achieved SOTA performance w/ a 7.4x↑ speedup on ClipScore!
- Support for 4/8-Steps Distilled Models: Surprisingly, cache-dit's DBCache works for extremely few-step distilled models, something many other methods fail to do.
- Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, Quantization (torchao, nunchaku), CPU or Sequential Offloading, Context Parallelism, Tensor Parallelism, etc.
- Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes (e.g., DBCache or DBPrune + TaylorSeerCalibrator). DBCache or DBPrune acts as the Indicator to decide when to cache, while the Calibrator decides how to cache. More mainstream cache acceleration algorithms (e.g., FoCa) will be supported in the future, along with additional benchmarks. Stay tuned for updates!
- 🤗Diffusers Ecosystem Integration: cache-dit has joined the Diffusers community ecosystem as the first DiT-specific cache acceleration framework! See the Diffusers documentation for details.
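The Indicator/Calibrator split described under Hybrid Cache Acceleration can be illustrated with a toy sketch. This is not cache-dit's implementation: the function names, the "probe" (a cheap signal such as the output of the first few blocks), and the threshold are all hypothetical, and the calibrator is a plain first-order extrapolation that assumes evenly spaced steps.

```python
from typing import List, Optional


def relative_l1(prev: List[float], curr: List[float]) -> float:
    """Total absolute change divided by the total magnitude of prev."""
    num = sum(abs(c - p) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev) or 1e-8  # avoid division by zero
    return num / den


def should_reuse(prev_probe: Optional[List[float]], probe: List[float],
                 threshold: float) -> bool:
    """Indicator (WHEN to cache): reuse if the probe barely changed."""
    return prev_probe is not None and relative_l1(prev_probe, probe) < threshold


def taylor_forecast(history: List[List[float]]) -> List[float]:
    """Calibrator (HOW to cache): extrapolate from the last two real outputs."""
    if len(history) < 2:
        return list(history[-1])  # not enough history: plain reuse
    older, newer = history[-2], history[-1]
    return [n + (n - o) for o, n in zip(older, newer)]
```

On a step where `should_reuse` returns True, the expensive blocks would be skipped and `taylor_forecast` would stand in for their output; otherwise the blocks run normally and their output is appended to `history`.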
Supported DiTs
[!Tip] One model series may contain many pipelines. cache-dit applies optimizations at the Transformer level, so any pipeline that includes a supported transformer is already supported by cache-dit. ✅: known to work and officially supported now; ✖️: not officially supported yet, but may be supported in the future; 4-bits: w/ nunchaku + svdq int4.
| Model | Cache | CP | TP | Model | Cache | CP | TP |
|---|---|---|---|---|---|---|---|
| FLUX.1 | ✅ | ✅ | ✅ | FLUX.1 4-bits | ✅ | ✅ | ✖️ |
| Qwen-Image | ✅ | ✅ | ✅ | Qwen-Image 4-bits | ✅ | ✅ | ✖️ |
| Qwen...Lightning | ✅ | ✅ | ✅ | Qwen...Lightning 4-bits | ✅ | ✅ | ✖️ |
| CogVideoX | ✅ | ✅ | ✖️ | OmniGen | ✅ | ✖️ | ✖️ |
| Wan 2.1 | ✅ | ✅ | ✅ | PixArt Sigma | ✅ | ✅ | ✖️ |
| Wan 2.1 VACE | ✅ | ✅ | ✅ | PixArt Alpha | ✅ | ✅ | ✖️ |
| Wan 2.2 | ✅ | ✅ | ✅ | CogVideoX 1.5 | ✅ | ✅ | ✖️ |
| HunyuanVideo | ✅ | ✅ | ✅ | Sana | ✅ | ✖️ | ✖️ |
| LTXVideo | ✅ | ✅ | ✖️ | VisualCloze | ✅ | ✅ | ✅ |
| Allegro | ✅ | ✖️ | ✖️ | AuraFlow | ✅ | ✖️ | ✖️ |
| CogView4 | ✅ | ✅ | ✖️ | ShapE | ✅ | ✖️ | ✖️ |
| CogView3Plus | ✅ | ✅ | ✖️ | Chroma | ✅ | ✅ | ✅ |
| Cosmos | ✅ | ✖️ | ✖️ | HiDream | ✅ | ✖️ | ✖️ |
| EasyAnimate | ✅ | ✖️ | ✖️ | HunyuanDiT | ✅ | ✖️ | ✅ |
| SkyReelsV2 | ✅ | ✖️ | ✖️ | HunyuanDiTPAG | ✅ | ✖️ | ✖️ |
| StableDiffusion3 | ✅ | ✖️ | ✖️ | Kandinsky5 | ✅ | ✖️ | ✅ |
| ConsisID | ✅ | ✅ | ✖️ | PRX | ✅ | ✖️ | ✖️ |
| DiT | ✅ | ✅ | ✖️ | HunyuanImage | ✅ | ✅ | ✅ |
| Amused | ✅ | ✖️ | ✖️ | LongCatVideo | ✅ | ✖️ | ✖️ |
| StableAudio | ✅ | ✖️ | ✖️ | Bria | ✅ | ✖️ | ✖️ |
| Mochi | ✅ | ✖️ | ✅ | Lumina | ✅ | ✖️ | ✖️ |
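If you want to consult the support matrix from a script, for example to decide whether to request Context Parallelism (CP) or Tensor Parallelism (TP) for a given model, a lookup like the following is one option. It is a sketch only: `Support`, `SUPPORT`, and `supported_features` are hypothetical names, and the dictionary holds just a few rows transcribed from the table above rather than anything exported by cache-dit.

```python
from typing import Dict, List, NamedTuple


class Support(NamedTuple):
    cache: bool
    cp: bool  # Context Parallelism
    tp: bool  # Tensor Parallelism


# A few rows copied by hand from the table above; extend as needed.
SUPPORT: Dict[str, Support] = {
    "FLUX.1": Support(cache=True, cp=True, tp=True),
    "FLUX.1 4-bits": Support(cache=True, cp=True, tp=False),
    "CogVideoX": Support(cache=True, cp=True, tp=False),
    "Sana": Support(cache=True, cp=False, tp=False),
    "Mochi": Support(cache=True, cp=False, tp=True),
}


def supported_features(model: str) -> List[str]:
    """Return the names of the features marked as supported for `model`."""
    row = SUPPORT[model]  # raises KeyError for models not transcribed here
    return [name for name, ok in row._asdict().items() if ok]
```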
Image/Video cases (gallery captions)
cache-dit now covers almost all of Diffusers' DiT pipelines:
Qwen-Image | Qwen-Image-Edit | Qwen-Image-Edit-Plus
FLUX.1 | Qwen-Image-Lightning 4/8 Steps | Wan 2.1 | Wan 2.2
HunyuanImage-2.1 | HunyuanVideo | HunyuanDiT | HiDream | AuraFlow
CogView3Plus | CogView4 | LTXVideo | CogVideoX | CogVideoX 1.5 | ConsisID
Cosmos | SkyReelsV2 | VisualCloze | OmniGen 1/2 | Lumina 1/2 | PixArt
Chroma | Sana | Allegro | Mochi | SD 3/3.5 | Amused | ... | DiT-XL
Wan2.2 MoE | +cache-dit: 2.0x↑ | HunyuanVideo | +cache-dit: 2.1x↑
Qwen-Image | +cache-dit: 1.8x↑ | FLUX.1-dev | +cache-dit: 2.1x↑
Qwen...Lightning | +cache-dit: 1.14x↑ | HunyuanImage | +cache-dit: 1.7x↑
Qwen-Image-Edit | Input w/o Edit | Baseline | +cache-dit: 1.6x↑ | 1.9x↑
FLUX-Kontext-dev | Baseline | +cache-dit: 1.3x↑ | 1.7x↑ | 2.0x↑
HiDream-I1 | +cache-dit: 1.9x↑ | CogView4 | +cache-dit: 1.4x↑ | 1.7x↑
CogView3 | +cache-dit: 1.5x↑ | 2.0x↑ | Chroma1-HD | +cache-dit: 1.9x↑
Mochi-1-preview | +cache-dit: 1.8x↑ | SkyReelsV2 | +cache-dit: 1.6x↑
VisualCloze-512 | Model | Cloth | Baseline | +cache-dit: 1.4x↑ | 1.7x↑
LTX-Video-0.9.7 | +cache-dit: 1.7x↑ | CogVideoX1.5 | +cache-dit: 2.0x↑
OmniGen-v1 | +cache-dit: 1.5x↑ | 3.3x↑ | Lumina2 | +cache-dit: 1.9x↑
Allegro | +cache-dit: 1.36x↑ | AuraFlow-v0.3 | +cache-dit: 2.27x↑
Sana | +cache-dit: 1.3x↑ | 1.6x↑ | PixArt-Sigma | +cache-dit: 2.3x↑
PixArt-Alpha | +cache-dit: 1.6x↑ | 1.8x↑ | SD 3.5 | +cache-dit: 2.5x↑
Amused | +cache-dit: 1.1x↑ | 1.2x↑ | DiT-XL-256 | +cache-dit: 1.8x↑
♥️ Please consider leaving a ⭐️ star to support us ~ ♥️
Table of Contents
For more advanced features such as Unified Cache APIs, Forward Pattern Matching, Automatic Block Adapter, Hybrid Forward Pattern, Patch Functor, DBCache, DBPrune, TaylorSeer Calibrator, Hybrid Cache CFG, Context Parallelism and Tensor Parallelism, please refer to User_Guide.md for details.
- Installation
- Supported DiTs
- Benchmarks
- Unified Cache APIs
- DBCache: Dual Block Cache
- DBPrune: Dynamic Block Prune
- Hybrid Cache CFG
- Hybrid TaylorSeer Calibrator
- Hybrid Context Parallelism
- Hybrid Tensor Parallelism
- Low-bits Quantization
- Metrics Command Line
- Torch Compile
- API Documents
Contribute
How to contribute? Star ⭐️ this repo to support us or check CONTRIBUTE.md.
Projects Using CacheDiT
Here is a curated list of open-source projects integrating CacheDiT, including popular repositories like jetson-containers, flux-fast, and sdnext. CacheDiT has been recommended by: Wan 2.2, Qwen-Image-Lightning, Qwen-Image, LongCat-Video, Kandinsky-5, 🤗diffusers and HelloGitHub, among others.
Acknowledgements
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project.
Citations
@misc{cache-dit@2025,
title={cache-dit: A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for Diffusers.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com},
year={2025}
}