A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs
| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|---|---|---|---|---|---|---|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 🎉7.1s | 🎉4.5s |

Scheme: DBCache + SCM (steps_computation_mask) + TS (TaylorSeer) + FP8*, measured with FLUX.1-Dev on a single NVIDIA L20 (L20x1). S*: static cache; D*: dynamic cache; S: Slow; F: Fast; U: Ultra Fast; TS: TaylorSeer; FP8*: FP8 DQ + Sage.
U*: Ulysses Attention; UAA: Ulysses Anything Attention; UAA*: UAA + Gloo (extra all-gather w/ Gloo); Device: NVIDIA L20. FLUX.1-Dev w/o CPU offload, 28 steps; Qwen-Image w/ CPU offload, 50 steps.

| CP2 U* | CP2 UAA* | L20x1 | CP2 UAA* | CP2 U* | L20x1 | CP2 UAA* |
|---|---|---|---|---|---|---|
| FLUX, 13.87s | 🎉13.88s | 23.25s | 🎉13.75s | Qwen, 132s | 181s | 🎉133s |
| 1024x1024 | 1024x1024 | 1008x1008 | 1008x1008 | 1312x1312 | 1328x1328 | 1328x1328 |
| ✔️U* ✔️UAA | ✔️U* ✔️UAA | NO CP | ❌U* ✔️UAA | ✔️U* ✔️UAA | NO CP | ❌U* ✔️UAA |
🔥Highlight
We are excited to announce that the 🎉v1.1.0 version of cache-dit has finally been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a PyTorch-native and flexible inference engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, SCM, Context Parallelism (w/ UAA), Tensor Parallelism, and 🎉SOTA performance.
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub. Then try ♥️ cache acceleration with just one line of code ~ ♥️

pip3 install -U cache-dit # Also, pip3 install git+https://github.com/huggingface/diffusers.git (latest)
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
>>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
>>> output = pipe(...) # Just call the pipe as normal.
>>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
>>> cache_dit.disable_cache(pipe) # Disable cache and run original pipe.
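The same `enable_cache` entry point also carries the new parallelism options. Below is a minimal multi-GPU sketch for Context Parallelism; the `ParallelismConfig` name and its `ulysses_size` field are assumptions modeled on the User Guide's terminology (Ulysses Attention), so check the Hybrid Context Parallelism docs for the exact API:

```python
# parallel_infer.py - a hedged sketch of Context Parallelism with cache-dit.
# `ParallelismConfig` and `ulysses_size` are assumed names; the actual
# options live in the Hybrid Context Parallelism section of the User Guide.
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

cache_dit.enable_cache(
    pipe,
    parallelism_config=cache_dit.ParallelismConfig(ulysses_size=2),  # assumed API
)

image = pipe("a cat wearing sunglasses").images[0]
```

Launched with `torchrun --nproc_per_node=2 parallel_infer.py`, each rank then holds one of the two Ulysses sequence shards.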
🎉Core Features
- 🎉Full 🤗Diffusers Support: Notably, cache-dit now supports nearly all of Diffusers' DiTs, including 60+ models and 100+ pipelines: 🔥FLUX, 🔥Qwen-Image, 🔥Z-Image, 🔥LongCat-Image, 🔥Wan, etc.
- 🎉Extremely Easy to Use: In most cases, you only need one line of code: cache_dit.enable_cache(...). After calling this API, just use the pipeline as normal.
- 🎉State-of-the-Art Performance: Compared with other algorithms, cache-dit achieved the SOTA w/ 7.4x↑🎉 speedup on ClipScore! Surprisingly, its DBCache also works for extremely few-step distilled models.
- 🎉Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, Quantization, CPU or Sequential Offloading, Context Parallelism, Tensor Parallelism, etc.
- 🎉Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes. DBCache acts as the Indicator that decides when to cache, while the Calibrator decides how to cache; see the sketch after this list.
- 🎉Ecosystem Integration: Joined the Diffusers community as the first cache-acceleration framework for DiTs, integrated with 🤗diffusers, 🔥SGLang Diffusion, 🔥vLLM-Omni, 🔥stable-diffusion.cpp, 🔥nunchaku, and 🔥sdnext.
- 🎉HTTP Serving Support: Built-in HTTP serving capabilities for production deployment with a simple REST API. Supports text-to-image, image editing, text/image-to-video, and LoRA.
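To make the indicator/calibrator split concrete, here is a hedged sketch of a hybrid DBCache + TaylorSeer setup. The `BasicCacheConfig` and `TaylorSeerCalibratorConfig` names and their fields follow the vocabulary of the DBCache and TaylorSeer docs (Fn/Bn compute blocks, residual-diff threshold), but treat them as assumptions and consult the User Guide for the exact signatures:

```python
# A hedged sketch of hybrid Block-wise Cache + Calibrator acceleration.
# Config class names and fields below are assumptions drawn from the docs'
# terminology; verify them against the installed cache-dit version.
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

cache_dit.enable_cache(
    pipe,
    # DBCache: the Indicator that decides *when* to cache (assumed fields).
    cache_config=cache_dit.BasicCacheConfig(
        Fn_compute_blocks=8,           # always compute the first N blocks
        Bn_compute_blocks=0,           # always compute the last N blocks
        residual_diff_threshold=0.12,  # reuse cache while residuals stay small
    ),
    # TaylorSeer: the Calibrator that decides *how* to cache (assumed fields).
    calibrator_config=cache_dit.TaylorSeerCalibratorConfig(taylorseer_order=1),
)

# Caching composes with torch.compile, as noted above.
pipe.transformer = torch.compile(pipe.transformer)
image = pipe("an astronaut riding a horse on mars").images[0]
```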
🔥Supported DiTs
> [!TIP]
> One model series may contain many pipelines. cache-dit applies optimizations at the Transformer level; thus, any pipeline that includes a supported transformer is already supported by cache-dit. ✅: supported now; ❌: not supported now; 🤖Q: nunchaku w/ SVDQ W4A4; C-P: Context Parallelism; T-P: Tensor Parallelism; TE-P: Text Encoder Parallelism; CN-P: ControlNet Parallelism; VAE-P: VAE Parallelism (TODO).
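Because matching is keyed on the transformer class rather than the pipeline class, you can also query support programmatically. A tiny sketch, assuming a `supported_pipelines()` helper whose exact name and return shape may differ in your installed version:

```python
import cache_dit

# Assumed helper: returns the number of matched pipeline families and their
# name patterns; see the Supported DiTs docs if this API differs.
count, patterns = cache_dit.supported_pipelines()
print(f"{count} supported pipeline families")
for pattern in patterns:
    print(" -", pattern)
```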
| 🎉Supported DiTs: 🤗65+ | Cache | C-P | T-P | TE-P | CN-P | VAE-P |
|---|---|---|---|---|---|---|
| Z-Image-Turbo 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Layered | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2511-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2511 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LongCat-Image | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LongCat-Image-Edit | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Z-Image-Turbo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Z-Image-Turbo-Fun-ControlNet-2.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Z-Image-Turbo-Fun-ControlNet-2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Ovis-Image | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.2-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-Fill-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-Kontext-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-ControlNet | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-ControlNet-Inpainting | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.2-T2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.2-I2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.2-VACE-Fun | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-T2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-I2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-FLF2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-VACE | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| HunyuanImage-2.1 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| HunyuanVideo-1.5 | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| HunyuanVideo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-dev 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| FLUX.1-Fill-dev 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| FLUX.1-Kontext-dev 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Lightning 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-Lightning 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509-Lightning 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| SkyReels-V2-T2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LongCat-Video | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| ChronoEdit-14B | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Kandinsky-5.0-T2V-Lite | ✅ | ⚠️ | ⚠️ | ✅ | ❌ | ❌ |
| PRX-512-t2i-sft | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| LTX-Video-v0.9.8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LTX-Video-v0.9.7 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogVideoX | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogVideoX-1.5 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogView-4 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogView-3-Plus | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Chroma1-HD | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| PixArt-Sigma-XL-2-1024-MS | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| PixArt-XL-2-1024-MS | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| VisualCloze-512 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| ConsisID-preview | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| mochi-1-preview | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Lumina-Image-2.0 | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| HiDream-I1-Full | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| HunyuanDiT | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Sana-1600M-1024px | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| DiT-XL-2-256 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Allegro-T2V | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| OmniGen-2 | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| stable-diffusion-3.5-large | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Amused-512 | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| AuraFlow | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
🔥Click here to show many Image/Video cases🔥
🎉Now, cache-dit covers almost All Diffusers' DiT Pipelines🎉
🔥Qwen-Image | Qwen-Image-Edit | Qwen-Image-Edit-Plus🔥
🔥FLUX.1 | Qwen-Image-Lightning 4/8 Steps | Wan 2.1 | Wan 2.2🔥
🔥HunyuanImage-2.1 | HunyuanVideo | HunyuanDiT | HiDream | AuraFlow🔥
🔥CogView3Plus | CogView4 | LTXVideo | CogVideoX | CogVideoX 1.5 | ConsisID🔥
🔥Cosmos | SkyReelsV2 | VisualCloze | OmniGen 1/2 | Lumina 1/2 | PixArt🔥
🔥Chroma | Sana | Allegro | Mochi | SD 3/3.5 | Amused | ... | DiT-XL🔥
🔥Wan2.2 MoE | +cache-dit:2.0x↑🎉 | HunyuanVideo | +cache-dit:2.1x↑🎉
🔥Qwen-Image | +cache-dit:1.8x↑🎉 | FLUX.1-dev | +cache-dit:2.1x↑🎉
🔥Qwen...Lightning | +cache-dit:1.14x↑🎉 | HunyuanImage | +cache-dit:1.7x↑🎉
🔥Qwen-Image-Edit | Input w/o Edit | Baseline | +cache-dit:1.6x↑🎉 | 1.9x↑🎉
🔥FLUX-Kontext-dev | Baseline | +cache-dit:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑🎉
🔥HiDream-I1 | +cache-dit:1.9x↑🎉 | CogView4 | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥CogView3 | +cache-dit:1.5x↑🎉 | 2.0x↑🎉 | Chroma1-HD | +cache-dit:1.9x↑🎉
🔥Mochi-1-preview | +cache-dit:1.8x↑🎉 | SkyReelsV2 | +cache-dit:1.6x↑🎉
🔥VisualCloze-512 | Model | Cloth | Baseline | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥LTX-Video-0.9.7 | +cache-dit:1.7x↑🎉 | CogVideoX1.5 | +cache-dit:2.0x↑🎉
🔥OmniGen-v1 | +cache-dit:1.5x↑🎉 | 3.3x↑🎉 | Lumina2 | +cache-dit:1.9x↑🎉
🔥Allegro | +cache-dit:1.36x↑🎉 | AuraFlow-v0.3 | +cache-dit:2.27x↑🎉
🔥Sana | +cache-dit:1.3x↑🎉 | 1.6x↑🎉 | PixArt-Sigma | +cache-dit:2.3x↑🎉
🔥PixArt-Alpha | +cache-dit:1.6x↑🎉 | 1.8x↑🎉 | SD 3.5 | +cache-dit:2.5x↑🎉
🔥Amused | +cache-dit:1.1x↑🎉 | 1.2x↑🎉 | DiT-XL-256 | +cache-dit:1.8x↑🎉
♥️ Please consider leaving a ⭐️ Star to support us ~ ♥️
📖Table of Contents
📖Quick Links
- 📚Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
- 📚HTTP Serving - Deploy cache-dit models behind an HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation.
- 📚User Guide - For more advanced features, please refer to 📚User_Guide.md for details.
- ❓FAQ - Frequently asked questions covering attention backend configuration, troubleshooting, and optimization tips.
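For orientation, here is a minimal client sketch against a locally running cache-dit server. The route, port, and JSON fields are illustrative placeholders rather than the documented API; the HTTP Serving guide has the real CLI flags, endpoints, and schemas:

```python
# A hedged client sketch for the HTTP serving feature. The endpoint path,
# port, and request/response fields are hypothetical placeholders; consult
# the HTTP Serving documentation for the actual REST API.
import base64
import requests

resp = requests.post(
    "http://localhost:8000/generate",  # hypothetical route
    json={
        "prompt": "a watercolor painting of a lighthouse",
        "num_inference_steps": 28,     # hypothetical field names
        "width": 1024,
        "height": 1024,
    },
    timeout=300,
)
resp.raise_for_status()
with open("output.png", "wb") as f:
    # hypothetical response schema: base64-encoded image under "image"
    f.write(base64.b64decode(resp.json()["image"]))
```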
📚Documentation
- ⚙️Installation
- 🔥Supported DiTs
- 🔥Benchmarks
- 📖Unified Cache APIs
- ⚡️DBCache: Dual Block Cache
- ⚡️DBPrune: Dynamic Block Prune
- ⚡️Hybrid Cache CFG
- 🔥Hybrid TaylorSeer Calibrator
- 🤖SCM: Steps Computation Masking
- ⚡️Hybrid Context Parallelism
- 🤖UAA: Ulysses Anything Attention
- 🤖Async Ulysses QKV Projection
- 🤖Async FP8 Ulysses Attention
- ⚡️Hybrid Tensor Parallelism
- 🤖Parallelize Text Encoder
- 🤖Low-bits Quantization
- 🤖How to use FP8 Attention
- 📊Metrics Command Line
- ⚙️Torch Compile
- 📖Torch Profiler Usage
- 📚API Documents
👋Contribute
How to contribute? Star ⭐️ this repo to support us, or check CONTRIBUTE.md.
🎉Projects Using CacheDiT
Here is a curated list of open-source projects integrating CacheDiT, including popular repositories such as jetson-containers, flux-fast, 🔥sdnext, 🔥stable-diffusion.cpp, 🔥nunchaku, 🔥vLLM-Omni, and 🔥SGLang Diffusion. 🎉CacheDiT has also been recommended by many well-known open-source projects: 🔥Z-Image, 🔥Wan 2.2, 🔥Qwen-Image, 🔥LongCat-Video, Qwen-Image-Lightning, Kandinsky-5, LeMiCa, 🤗diffusers, HelloGitHub, and GiantPandaLLM.
©️Acknowledgements
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project. We learned from the design of, and reused code from, the following projects: 🤗diffusers, SGLang, ParaAttention, xDiT, TaylorSeer, and LeMiCa.
©️Citations
@misc{cache-dit@2025,
title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com},
year={2025}
}