A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.
| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|---|---|---|---|---|---|---|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 7.1s | 4.5s |
Inference time of FLUX.1-Dev on L20x1 (one NVIDIA L20 GPU). Scheme: DBCache + SCM (steps_computation_mask) + TS (TaylorSeer) + FP8*. S*: static cache; D*: dynamic cache; S: Slow; F: Fast; U: Ultra Fast; TS: TaylorSeer; FP8*: FP8 DQ + Sage.
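The table reports absolute times; converting them to speedups over the baseline makes each stage easier to compare. A small sketch using plain arithmetic on the numbers above (nothing here is cache-dit specific; the dictionary keys simply mirror the column headers):

```python
# Latencies in seconds, copied from the FLUX.1-Dev table above.
latencies = {
    "Baseline": 24.85,
    "SCM S S*": 15.4,
    "SCM F D*": 11.4,
    "SCM U D*": 8.2,
    "+TS": 8.2,
    "+compile": 7.1,
    "+FP8*": 4.5,
}


def speedups(table, baseline_key="Baseline"):
    """Speedup of every column relative to the baseline: t_baseline / t_column."""
    base = table[baseline_key]
    return {name: base / t for name, t in table.items()}


if __name__ == "__main__":
    for name, s in speedups(latencies).items():
        print(f"{name:>10}: {s:.2f}x")
```

With these numbers the full scheme (+FP8*) ends at roughly 5.5x over the baseline, while +TS shows no additional gain over SCM U D* in this particular run.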
🔥Highlight
We are excited to announce that v1.1.0 of cache-dit has been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a PyTorch-native and Flexible Inference Engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, SCM, Context Parallelism (w/ UAA), Tensor Parallelism, and SOTA performance.
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub. Installing the latest diffusers from GitHub is also recommended:

```bash
pip3 install -U cache-dit
pip3 install git+https://github.com/huggingface/diffusers.git  # latest diffusers
```

Then try ♥️ Cache Acceleration with just one line of code ~ ♥️
```python
import torch
import cache_dit
from diffusers import DiffusionPipeline

# Can be any diffusion pipeline; a CUDA GPU is assumed here.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")
cache_dit.enable_cache(pipe)  # One line of code with the default cache options.
output = pipe(prompt="A cup of coffee on a wooden table")  # Call the pipeline as usual.
stats = cache_dit.summary(pipe)  # Get a summary of the cache acceleration stats.
cache_dit.disable_cache(pipe)  # Disable the cache and run the original pipeline.
```
Core Features
- Full 🤗Diffusers Support: cache-dit now supports nearly all of Diffusers' DiT-based pipelines, covering 30+ model series and nearly 100 pipelines: 🔥FLUX, 🔥Qwen-Image, 🔥Z-Image, 🔥Wan, etc.
- Extremely Easy to Use: In most cases, you only need one line of code: `cache_dit.enable_cache(...)`. After calling this API, just use the pipeline as normal.
- State-of-the-Art Performance: Compared with other algorithms, cache-dit achieves SOTA results, with a 7.4x↑ speedup as evaluated on ClipScore. Surprisingly, its DBCache also works for extremely few-step distilled models.
- Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, quantization, CPU or sequential offloading, 🔥Context Parallelism, 🔥Tensor Parallelism, etc.
- Hybrid Cache Acceleration: Supports hybrid Block-wise Cache + Calibrator schemes. DBCache acts as the Indicator that decides when to cache, while the Calibrator decides how to cache.
- HTTP Serving Support: Built-in HTTP serving for production deployment through a simple REST API. Supports text-to-image, image editing, text-to-video, and image-to-video generation.
- 🤗Diffusers Ecosystem Integration: 🔥cache-dit has joined the Diffusers community ecosystem as the first DiT-specific cache acceleration framework for 🤗diffusers, 🔥SGLang Diffusion, and 🔥vLLM-Omni.
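The "Indicator decides when, Calibrator decides how" split can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not cache-dit's implementation or API: the class `ToyHybridCache`, the relative-L1 test on a cheap "probe" signal, the threshold value, and the first-order (linear) extrapolation standing in for a TaylorSeer-style calibrator are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance ||a - b||_1 / ||b||_1, guarded against division by zero."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b)
    return num / max(den, 1e-12)


class ToyHybridCache:
    """Hypothetical Indicator + Calibrator loop (illustration only, not cache-dit's API).

    Indicator: compare a cheap probe with the probe saved at the last fully
    computed step; a small relative change means the expensive result may be
    approximated instead of recomputed ("when to cache").
    Calibrator: approximate the expensive result by first-order extrapolation
    from the last two fully computed steps ("how to cache").
    """

    def __init__(self, threshold: float = 0.05) -> None:
        self.threshold = threshold
        self.ref_probe: Optional[Vector] = None
        self.history: List[Tuple[int, Vector]] = []  # up to two (step, output) pairs

    def _extrapolate(self, t: int) -> Vector:
        (t0, y0), (t1, y1) = self.history
        # Distance from the last computed step, in units of the last step gap.
        scale = (t - t1) / (t1 - t0)
        return [b + (b - a) * scale for a, b in zip(y0, y1)]

    def step(self, t: int, probe: Vector, compute: Callable[[], Vector]) -> Tuple[Vector, bool]:
        """Return (output, computed) for denoising step t."""
        reuse = (
            self.ref_probe is not None
            and len(self.history) == 2
            and rel_l1(probe, self.ref_probe) < self.threshold
        )
        if reuse:
            return self._extrapolate(t), False
        out = compute()  # the expensive path, e.g. the remaining transformer blocks
        self.ref_probe = probe
        self.history = (self.history + [(t, out)])[-2:]
        return out, True
```

Driving it for 10 steps with a probe that drifts by 1% per step and an output that is linear in `t`, only steps 0, 1 and 7 run the expensive path: once the drift accumulated since the last full step crosses the threshold, a full computation runs again and refreshes the calibrator's history. The extrapolated outputs are exact here only because the toy output is linear. See the User_Guide.md for the real DBCache and TaylorSeer options.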
🔥Supported DiTs
[!Tip] One model series may contain many pipelines. cache-dit applies its optimizations at the Transformer level, so any pipeline that includes a supported transformer is already supported by cache-dit. ✅: known to work and officially supported now; ✖️: not officially supported now, but may be supported in the future;
Q: 4-bit models with nunchaku W4A4; C*: Hybrid Cache Acceleration; CP: Context Parallelism; TP: Tensor Parallelism; TE: Text Encoder Parallelism.
| Model | C* | CP | TP | TE | Model | C* | CP | TP | TE |
|---|---|---|---|---|---|---|---|---|---|
| 🔥Z-Image | ✅ | ✅ | ✅ | ✅ | 🔥Z-Image-Control | ✖️ | ✖️ | ✅ | ✅ |
| 🔥Ovis-Image | ✅ | ✅ | ✅ | ✅ | 🔥HunyuanVideo 1.5 | ✅ | ✖️ | ✖️ | ✅ |
| 🔥FLUX.2 | ✅ | ✅ | ✅ | ✅ | FLUX.1 Q | ✅ | ✅ | ✖️ | ✅ |
| FLUX.1 | ✅ | ✅ | ✅ | ✅ | Qwen-Image Q | ✅ | ✅ | ✖️ | ✅ |
| Qwen-Image | ✅ | ✅ | ✅ | ✅ | Qwen...Edit Q | ✅ | ✅ | ✖️ | ✅ |
| Qwen...Edit | ✅ | ✅ | ✅ | ✅ | Qwen.E.Plus Q | ✅ | ✅ | ✖️ | ✅ |
| Qwen..Light | ✅ | ✅ | ✅ | ✅ | Qwen...Light Q | ✅ | ✅ | ✖️ | ✅ |
| Wan 2.2 T2V/ITV | ✅ | ✅ | ✅ | ✅ | Qwen.E.Light Q | ✅ | ✅ | ✖️ | ✅ |
| Wan 2.2 VACE | ✅ | ✅ | ✅ | ✅ | Mochi | ✅ | ✖️ | ✅ | ✅ |
| Wan 2.1 T2V/ITV | ✅ | ✅ | ✅ | ✅ | HiDream | ✅ | ✖️ | ✖️ | ✅ |
| Wan 2.1 VACE | ✅ | ✅ | ✅ | ✅ | HunyuanDiT | ✅ | ✖️ | ✅ | ✅ |
| HunyuanVideo | ✅ | ✅ | ✅ | ✅ | Sana | ✅ | ✖️ | ✖️ | ✅ |
| ChronoEdit | ✅ | ✅ | ✅ | ✅ | Bria | ✅ | ✖️ | ✖️ | ✅ |
| CogVideoX | ✅ | ✅ | ✅ | ✅ | SkyReelsV2 | ✅ | ✅ | ✅ | ✅ |
| CogVideoX 1.5 | ✅ | ✅ | ✅ | ✅ | Lumina 1/2 | ✅ | ✖️ | ✅ | ✅ |
| CogView4 | ✅ | ✅ | ✅ | ✅ | DiT-XL | ✅ | ✅ | ✖️ | ✅ |
| CogView3Plus | ✅ | ✅ | ✅ | ✅ | Allegro | ✅ | ✖️ | ✖️ | ✅ |
| PixArt Sigma | ✅ | ✅ | ✅ | ✅ | Cosmos | ✅ | ✖️ | ✖️ | ✅ |
| PixArt Alpha | ✅ | ✅ | ✅ | ✅ | OmniGen | ✅ | ✖️ | ✖️ | ✅ |
| Chroma-HD | ✅ | ✅ | ✅ | ✅ | EasyAnimate | ✅ | ✖️ | ✖️ | ✅ |
| VisualCloze | ✅ | ✅ | ✅ | ✅ | StableDiffusion3 | ✅ | ✖️ | ✖️ | ✅ |
| HunyuanImage | ✅ | ✅ | ✅ | ✅ | PRX T2I | ✅ | ✖️ | ✖️ | ✅ |
| Kandinsky5 | ✅ | ⚠️ | ⚠️ | ✅ | Amused | ✅ | ✖️ | ✖️ | ✅ |
| LTXVideo | ✅ | ✅ | ✅ | ✅ | AuraFlow | ✅ | ✖️ | ✖️ | ✅ |
| ConsisID | ✅ | ✅ | ✅ | ✅ | LongCatVideo | ✅ | ✖️ | ✖️ | ✅ |
Now, cache-dit covers almost all of Diffusers' DiT pipelines, including Qwen-Image, Qwen-Image-Edit, Qwen-Image-Edit-Plus, FLUX.1, Qwen-Image-Lightning 4/8 Steps, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, HunyuanDiT, HiDream, AuraFlow, CogView3Plus, CogView4, LTXVideo, CogVideoX, CogVideoX 1.5, ConsisID, Cosmos, SkyReelsV2, VisualCloze, OmniGen 1/2, Lumina 1/2, PixArt, Chroma, Sana, Allegro, Mochi, SD 3/3.5, Amused, ..., and DiT-XL. Example speedups with cache-dit:

| Model | +cache-dit speedup |
|---|---|
| Wan2.2 MoE | 2.0x↑ |
| HunyuanVideo | 2.1x↑ |
| Qwen-Image | 1.8x↑ |
| FLUX.1-dev | 2.1x↑ |
| Qwen...Lightning | 1.14x↑ |
| HunyuanImage | 1.7x↑ |
| Qwen-Image-Edit | 1.6x↑ / 1.9x↑ |
| FLUX-Kontext-dev | 1.3x↑ / 1.7x↑ / 2.0x↑ |
| HiDream-I1 | 1.9x↑ |
| CogView4 | 1.4x↑ / 1.7x↑ |
| CogView3 | 1.5x↑ / 2.0x↑ |
| Chroma1-HD | 1.9x↑ |
| Mochi-1-preview | 1.8x↑ |
| SkyReelsV2 | 1.6x↑ |
| VisualCloze-512 | 1.4x↑ / 1.7x↑ |
| LTX-Video-0.9.7 | 1.7x↑ |
| CogVideoX1.5 | 2.0x↑ |
| OmniGen-v1 | 1.5x↑ / 3.3x↑ |
| Lumina2 | 1.9x↑ |
| Allegro | 1.36x↑ |
| AuraFlow-v0.3 | 2.27x↑ |
| Sana | 1.3x↑ / 1.6x↑ |
| PixArt-Sigma | 2.3x↑ |
| PixArt-Alpha | 1.6x↑ / 1.8x↑ |
| SD 3.5 | 2.5x↑ |
| Amused | 1.1x↑ / 1.2x↑ |
| DiT-XL-256 | 1.8x↑ |
♥️ Please consider leaving a ⭐️ Star to support us ~ ♥️
Table of Contents
For more advanced features such as Unified Cache APIs, Forward Pattern Matching, Automatic Block Adapter, Hybrid Forward Pattern, Patch Functor, DBCache, DBPrune, TaylorSeer Calibrator, SCM, Hybrid Cache CFG, Context Parallelism (w/ UAA), and Tensor Parallelism, please refer to the User_Guide.md for details.
Quick Links
- Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
- HTTP Serving - Deploy cache-dit models with an HTTP API for text-to-image, image editing, multi-image editing, and text-to-video generation.
- FAQ - Frequently asked questions, including attention backend configuration, troubleshooting, and optimization tips.
Documentation
- Installation
- Supported DiTs
- Benchmarks
- Unified Cache APIs
- DBCache: Dual Block Cache
- DBPrune: Dynamic Block Prune
- Hybrid Cache CFG
- Hybrid TaylorSeer Calibrator
- SCM: Steps Computation Masking
- Hybrid Context Parallelism
- UAA: Ulysses Anything Attention
- Async Ulysses QKV Projection
- Async FP8 Ulysses Attention
- Hybrid Tensor Parallelism
- Parallelize Text Encoder
- Low-bits Quantization
- How to use FP8 Attention
- Metrics Command Line
- Torch Compile
- Torch Profiler Usage
- API Documents
Contribute
How to contribute? Star ⭐️ this repo to support us, or check CONTRIBUTE.md.
Projects Using CacheDiT
Here is a curated list of open-source projects integrating CacheDiT, including popular repositories such as jetson-containers, flux-fast, sdnext, 🔥vLLM-Omni, and 🔥SGLang Diffusion. CacheDiT has also been recommended by many well-known open-source projects: 🔥Z-Image, 🔥Wan 2.2, 🔥Qwen-Image, 🔥LongCat-Video, Qwen-Image-Lightning, Kandinsky-5, LeMiCa, 🤗diffusers, HelloGitHub, and GiantPandaCV.
©️Acknowledgements
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project. We learned from the design of, and reused code from, the following projects: 🤗diffusers, SGLang, ParaAttention, xDiT, TaylorSeer, and LeMiCa.
©️Citations

```bibtex
@misc{cache-dit@2025,
  title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com},
  year={2025}
}
```
Download files
File details
Details for the file cache_dit-1.1.9-py3-none-any.whl.
File metadata
- Download URL: cache_dit-1.1.9-py3-none-any.whl
- Size: 258.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
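The wheel's tags (`py3-none-any`) follow the wheel filename convention from PEP 427: any Python 3 interpreter, no ABI dependency, any platform, i.e. a pure-Python wheel. A small standard-library sketch that splits such a filename into its parts; it does not normalize names or handle compressed tag sets, for which the `packaging` library's `packaging.utils.parse_wheel_filename` is the more complete option.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components.

    Format: {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(dist, version, build, py, abi, plat)
```

For example, `parse_wheel_filename("cache_dit-1.1.9-py3-none-any.whl")` yields distribution `cache_dit`, version `1.1.9`, no build tag, and the tags `py3`, `none`, `any`.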
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65aab5aaed9acbaaaefca3ce7d19d9796cccdd19f3943fff6ece5309ebe2636a |
| MD5 | 9c9f6aecc439d26a5fa0af2dc7daf3e5 |
| BLAKE2b-256 | e9a182765fde11ce63c89def01e3418245cc84eceb06ce6a2ceacfa10ae37a21 |
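To check a downloaded wheel against the digests listed for it, a short sketch using only Python's standard `hashlib` (BLAKE2b-256 corresponds to `hashlib.blake2b(digest_size=32)`). The local filename is an assumption; pip's own `pip hash` command also prints a SHA256 digest for a file.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the file hashes for cache_dit-1.1.9-py3-none-any.whl.
EXPECTED = {
    "sha256": "65aab5aaed9acbaaaefca3ce7d19d9796cccdd19f3943fff6ece5309ebe2636a",
    "md5": "9c9f6aecc439d26a5fa0af2dc7daf3e5",
    "blake2b_256": "e9a182765fde11ce63c89def01e3418245cc84eceb06ce6a2ceacfa10ae37a21",
}


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256, MD5 and BLAKE2b-256 of a file, reading it in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_wheel(path, expected=EXPECTED):
    """Return True only if every expected digest matches the file on disk."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())


if __name__ == "__main__":
    wheel = Path("cache_dit-1.1.9-py3-none-any.whl")  # assumed to be in the current directory
    if wheel.exists():
        print("OK" if verify_wheel(wheel) else "MISMATCH")
    else:
        print(f"{wheel} not found; download it first.")
```

SHA256 alone is what pip uses for `--require-hashes` installs; checking all three digests here is only for completeness.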