A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs
| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|---|---|---|---|---|---|---|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 🎉7.1s | 🎉4.5s |

Scheme: DBCache + SCM (steps_computation_mask) + TS (TaylorSeer) + FP8*, measured with FLUX.1-Dev on a single NVIDIA L20 (L20x1). S*: static cache; D*: dynamic cache; S: Slow; F: Fast; U: Ultra Fast; TS: TaylorSeer; FP8*: FP8 DQ + Sage.
U*: Ulysses Attention; UAA: Ulysses Anything Attention; UAA*: UAA + Gloo (extra all-gather w/ Gloo); Device: NVIDIA L20. FLUX.1-Dev w/o CPU offload, 28 steps; Qwen-Image w/ CPU offload, 50 steps.

| CP2 U* | CP2 UAA* | L20x1 | CP2 UAA* | CP2 U* | L20x1 | CP2 UAA* |
|---|---|---|---|---|---|---|
| FLUX, 13.87s | 🎉13.88s | 23.25s | 🎉13.75s | Qwen, 132s | 181s | 🎉133s |
| 1024x1024 | 1024x1024 | 1008x1008 | 1008x1008 | 1312x1312 | 1328x1328 | 1328x1328 |
| ✔️U* ✔️UAA | ✔️U* ✔️UAA | NO CP | ❌U* ✔️UAA | ✔️U* ✔️UAA | NO CP | ❌U* ✔️UAA |
🔥Highlight
We are excited to announce that the 🎉v1.1.0 version of cache-dit has finally been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a PyTorch-native and flexible inference engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, SCM, Context Parallelism (w/ UAA), Tensor Parallelism, and 🎉SOTA performance.
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub. Then try ♥️ cache acceleration with just one line of code ~ ♥️

pip3 install -U cache-dit # Also, pip3 install git+https://github.com/huggingface/diffusers.git (latest)
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
>>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
>>> output = pipe(...) # Just call the pipe as normal.
>>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
>>> cache_dit.disable_cache(pipe) # Disable cache and run original pipe.
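The same `enable_cache` entry point also carries the new parallelism options. Below is a minimal multi-GPU sketch for Context Parallelism; the `ParallelismConfig` name and its `ulysses_size` field are assumptions modeled on the User Guide's terminology (Ulysses Attention), so check the Hybrid Context Parallelism docs for the exact API:

```python
# parallel_infer.py - a hedged sketch of Context Parallelism with cache-dit.
# `ParallelismConfig` and `ulysses_size` are assumed names; the actual
# options live in the Hybrid Context Parallelism section of the User Guide.
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

cache_dit.enable_cache(
    pipe,
    parallelism_config=cache_dit.ParallelismConfig(ulysses_size=2),  # assumed API
)

image = pipe("a cat wearing sunglasses").images[0]
```

Launched with `torchrun --nproc_per_node=2 parallel_infer.py`, each rank then holds one of the two Ulysses sequence shards.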
🎉Core Features
- 🎉Full 🤗Diffusers Support: Notably, cache-dit now supports nearly all of Diffusers' DiTs, including 60+ models and 100+ pipelines: 🔥FLUX, 🔥Qwen-Image, 🔥Z-Image, 🔥LongCat-Image, 🔥Wan, etc.
- 🎉Extremely Easy to Use: In most cases, you only need one line of code: cache_dit.enable_cache(...). After calling this API, just use the pipeline as normal.
- 🎉State-of-the-Art Performance: Compared with other algorithms, cache-dit achieved the SOTA w/ 7.4x↑🎉 speedup on ClipScore! Surprisingly, its DBCache also works for extremely few-step distilled models.
- 🎉Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, Quantization, CPU or Sequential Offloading, Context Parallelism, Tensor Parallelism, etc.
- 🎉Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes. DBCache acts as the Indicator that decides when to cache, while the Calibrator decides how to cache; see the sketch after this list.
- 🎉Ecosystem Integration: Joined the Diffusers community as the first cache-acceleration framework for DiTs, integrated with 🤗diffusers, 🔥SGLang Diffusion, 🔥vLLM-Omni, 🔥stable-diffusion.cpp, 🔥nunchaku, and 🔥sdnext.
- 🎉HTTP Serving Support: Built-in HTTP serving capabilities for production deployment with a simple REST API. Supports text-to-image, image editing, text/image-to-video, and LoRA.
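To make the indicator/calibrator split concrete, here is a hedged sketch of a hybrid DBCache + TaylorSeer setup. The `BasicCacheConfig` and `TaylorSeerCalibratorConfig` names and their fields follow the vocabulary of the DBCache and TaylorSeer docs (Fn/Bn compute blocks, residual-diff threshold), but treat them as assumptions and consult the User Guide for the exact signatures:

```python
# A hedged sketch of hybrid Block-wise Cache + Calibrator acceleration.
# Config class names and fields below are assumptions drawn from the docs'
# terminology; verify them against the installed cache-dit version.
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

cache_dit.enable_cache(
    pipe,
    # DBCache: the Indicator that decides *when* to cache (assumed fields).
    cache_config=cache_dit.BasicCacheConfig(
        Fn_compute_blocks=8,           # always compute the first N blocks
        Bn_compute_blocks=0,           # always compute the last N blocks
        residual_diff_threshold=0.12,  # reuse cache while residuals stay small
    ),
    # TaylorSeer: the Calibrator that decides *how* to cache (assumed fields).
    calibrator_config=cache_dit.TaylorSeerCalibratorConfig(taylorseer_order=1),
)

# Caching composes with torch.compile, as noted above.
pipe.transformer = torch.compile(pipe.transformer)
image = pipe("an astronaut riding a horse on mars").images[0]
```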
🔥Supported DiTs
> [!TIP]
> One model series may contain many pipelines. cache-dit applies optimizations at the Transformer level; thus, any pipeline that includes a supported transformer is already supported by cache-dit. ✅: supported now; ❌: not supported now; 🤖Q: nunchaku w/ SVDQ W4A4; C-P: Context Parallelism; T-P: Tensor Parallelism; TE-P: Text Encoder Parallelism; CN-P: ControlNet Parallelism; VAE-P: VAE Parallelism (TODO).
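Because matching is keyed on the transformer class rather than the pipeline class, you can also query support programmatically. A tiny sketch, assuming a `supported_pipelines()` helper whose exact name and return shape may differ in your installed version:

```python
import cache_dit

# Assumed helper: returns the number of matched pipeline families and their
# name patterns; see the Supported DiTs docs if this API differs.
count, patterns = cache_dit.supported_pipelines()
print(f"{count} supported pipeline families")
for pattern in patterns:
    print(" -", pattern)
```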
| 🎉Supported DiTs: 🤗65+ | Cache | C-P | T-P | TE-P | CN-P | VAE-P |
|---|---|---|---|---|---|---|
| Z-Image-Turbo 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Layered | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2511-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2511 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LongCat-Image | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LongCat-Image-Edit | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Z-Image-Turbo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Z-Image-Turbo-Fun-ControlNet-2.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Z-Image-Turbo-Fun-ControlNet-2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Ovis-Image | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.2-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-Fill-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-Kontext-dev | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-ControlNet | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-ControlNet-Inpainting | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509-Lightning | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.2-T2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.2-I2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.2-VACE-Fun | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-T2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-I2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-FLF2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Wan-2.1-VACE | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| HunyuanImage-2.1 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| HunyuanVideo-1.5 | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| HunyuanVideo | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| FLUX.1-dev 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| FLUX.1-Fill-dev 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| FLUX.1-Kontext-dev 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Lightning 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-Lightning 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Qwen-Image-Edit-2509-Lightning 🤖Q | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| SkyReels-V2-T2V | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LongCat-Video | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| ChronoEdit-14B | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Kandinsky-5.0-T2V-Lite | ✅ | ⚠️ | ⚠️ | ✅ | ❌ | ❌ |
| PRX-512-t2i-sft | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| LTX-Video-v0.9.8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| LTX-Video-v0.9.7 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogVideoX | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogVideoX-1.5 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogView-4 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CogView-3-Plus | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Chroma1-HD | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| PixArt-Sigma-XL-2-1024-MS | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| PixArt-XL-2-1024-MS | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| VisualCloze-512 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| ConsisID-preview | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| mochi-1-preview | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Lumina-Image-2.0 | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| HiDream-I1-Full | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| HunyuanDiT | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Sana-1600M-1024px | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| DiT-XL-2-256 | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| Allegro-T2V | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| OmniGen-2 | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| stable-diffusion-3.5-large | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Amused-512 | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| AuraFlow | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
🔥Click here to show many Image/Video cases🔥
🎉Now, cache-dit covers almost All Diffusers' DiT Pipelines🎉
🔥Qwen-Image | Qwen-Image-Edit | Qwen-Image-Edit-Plus🔥
🔥FLUX.1 | Qwen-Image-Lightning 4/8 Steps | Wan 2.1 | Wan 2.2🔥
🔥HunyuanImage-2.1 | HunyuanVideo | HunyuanDiT | HiDream | AuraFlow🔥
🔥CogView3Plus | CogView4 | LTXVideo | CogVideoX | CogVideoX 1.5 | ConsisID🔥
🔥Cosmos | SkyReelsV2 | VisualCloze | OmniGen 1/2 | Lumina 1/2 | PixArt🔥
🔥Chroma | Sana | Allegro | Mochi | SD 3/3.5 | Amused | ... | DiT-XL🔥
🔥Wan2.2 MoE | +cache-dit:2.0x↑🎉 | HunyuanVideo | +cache-dit:2.1x↑🎉
🔥Qwen-Image | +cache-dit:1.8x↑🎉 | FLUX.1-dev | +cache-dit:2.1x↑🎉
🔥Qwen...Lightning | +cache-dit:1.14x↑🎉 | HunyuanImage | +cache-dit:1.7x↑🎉
🔥Qwen-Image-Edit | Input w/o Edit | Baseline | +cache-dit:1.6x↑🎉 | 1.9x↑🎉
🔥FLUX-Kontext-dev | Baseline | +cache-dit:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑🎉
🔥HiDream-I1 | +cache-dit:1.9x↑🎉 | CogView4 | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥CogView3 | +cache-dit:1.5x↑🎉 | 2.0x↑🎉 | Chroma1-HD | +cache-dit:1.9x↑🎉
🔥Mochi-1-preview | +cache-dit:1.8x↑🎉 | SkyReelsV2 | +cache-dit:1.6x↑🎉
🔥VisualCloze-512 | Model | Cloth | Baseline | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥LTX-Video-0.9.7 | +cache-dit:1.7x↑🎉 | CogVideoX1.5 | +cache-dit:2.0x↑🎉
🔥OmniGen-v1 | +cache-dit:1.5x↑🎉 | 3.3x↑🎉 | Lumina2 | +cache-dit:1.9x↑🎉
🔥Allegro | +cache-dit:1.36x↑🎉 | AuraFlow-v0.3 | +cache-dit:2.27x↑🎉
🔥Sana | +cache-dit:1.3x↑🎉 | 1.6x↑🎉 | PixArt-Sigma | +cache-dit:2.3x↑🎉
🔥PixArt-Alpha | +cache-dit:1.6x↑🎉 | 1.8x↑🎉 | SD 3.5 | +cache-dit:2.5x↑🎉
🔥Amused | +cache-dit:1.1x↑🎉 | 1.2x↑🎉 | DiT-XL-256 | +cache-dit:1.8x↑🎉
♥️ Please consider leaving a ⭐️ Star to support us ~ ♥️
📖Table of Contents
📖Quick Links
- 📚Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
- 📚HTTP Serving - Deploy cache-dit models behind an HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation.
- 📚User Guide - For more advanced features, please refer to 📚User_Guide.md for details.
- ❓FAQ - Frequently asked questions covering attention backend configuration, troubleshooting, and optimization tips.
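For orientation, here is a minimal client sketch against a locally running cache-dit server. The route, port, and JSON fields are illustrative placeholders rather than the documented API; the HTTP Serving guide has the real CLI flags, endpoints, and schemas:

```python
# A hedged client sketch for the HTTP serving feature. The endpoint path,
# port, and request/response fields are hypothetical placeholders; consult
# the HTTP Serving documentation for the actual REST API.
import base64
import requests

resp = requests.post(
    "http://localhost:8000/generate",  # hypothetical route
    json={
        "prompt": "a watercolor painting of a lighthouse",
        "num_inference_steps": 28,     # hypothetical field names
        "width": 1024,
        "height": 1024,
    },
    timeout=300,
)
resp.raise_for_status()
with open("output.png", "wb") as f:
    # hypothetical response schema: base64-encoded image under "image"
    f.write(base64.b64decode(resp.json()["image"]))
```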
📚Documentation
- ⚙️Installation
- 🔥Supported DiTs
- 🔥Benchmarks
- 📖Unified Cache APIs
- ⚡️DBCache: Dual Block Cache
- ⚡️DBPrune: Dynamic Block Prune
- ⚡️Hybrid Cache CFG
- 🔥Hybrid TaylorSeer Calibrator
- 🤖SCM: Steps Computation Masking
- ⚡️Hybrid Context Parallelism
- 🤖UAA: Ulysses Anything Attention
- 🤖Async Ulysses QKV Projection
- 🤖Async FP8 Ulysses Attention
- ⚡️Hybrid Tensor Parallelism
- 🤖Parallelize Text Encoder
- 🤖Low-bits Quantization
- 🤖How to use FP8 Attention
- 📊Metrics Command Line
- ⚙️Torch Compile
- 📖Torch Profiler Usage
- 📚API Documents
👋Contribute
How to contribute? Star ⭐️ this repo to support us, or check CONTRIBUTE.md.
🎉Projects Using CacheDiT
Here is a curated list of open-source projects integrating CacheDiT, including popular repositories such as jetson-containers, flux-fast, 🔥sdnext, 🔥stable-diffusion.cpp, 🔥nunchaku, 🔥vLLM-Omni, and 🔥SGLang Diffusion. 🎉CacheDiT has also been recommended by many well-known open-source projects: 🔥Z-Image, 🔥Wan 2.2, 🔥Qwen-Image, 🔥LongCat-Video, Qwen-Image-Lightning, Kandinsky-5, LeMiCa, 🤗diffusers, HelloGitHub, and GiantPandaLLM.
©️Acknowledgements
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project. We learned from the design of, and reused code from, the following projects: 🤗diffusers, SGLang, ParaAttention, xDiT, TaylorSeer, and LeMiCa.
©️Citations
@misc{cache-dit@2025,
title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com},
year={2025}
}