
A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.

Project description


End-to-end latency with successive cache-dit optimizations (lower is better):

| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|----------|----------|----------|----------|-----|----------|-------|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 🎉7.1s | 🎉4.5s |

🤗Why Cache-DiT❓

Cache-DiT is built on top of the Diffusers library and now supports nearly 🔥ALL DiTs from Diffusers (more than 🤗70 models). Please refer to our online documentation at readthedocs.io for more details. The optimizations provided by Cache-DiT include (UAA: Ulysses Anything Attention):

  • 🎉Hybrid Cache Acceleration (DBCache, DBPrune, TaylorSeer, SCM and more)
  • 🎉Context Parallelism (w/ Extended Diffusers' CP APIs, UAA, Async Ulysses, FP8 comm)
  • 🎉Tensor Parallelism (w/ PyTorch native DTensor and Tensor Parallelism APIs)
  • 🎉Text Encoder Parallelism (w/ PyTorch native DTensor and Tensor Parallelism APIs)
  • 🎉Auto Encoder (VAE) Parallelism (w/ Data or Tile Parallelism, avoid OOM)
  • 🎉ControlNet Parallelism (w/ Context Parallelism for ControlNet module)
  • 🎉Built-in HTTP serving deployment support with simple REST APIs
  • 🎉Natively compatible with Compile, Offloading, Quantization, ... (a combined sketch follows this list)
  • 🎉Integration into vLLM-Omni, SGLang Diffusion, SD.Next, ...
  • 🎉Natively supports NVIDIA GPUs, Ascend NPUs (>= 1.2.0), ...
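
These optimizations are designed to compose. As a minimal sketch (the exact combination and model are illustrative, not a recommended recipe; see the 🎉User Guide), caching, Ulysses context parallelism, and torch.compile can be stacked on a single pipeline:

import torch
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig
from diffusers import DiffusionPipeline

# Any Diffusers pipeline works; Qwen-Image is used here as an example.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

# Hybrid cache acceleration + Ulysses context parallelism in one call.
# Launch with `torchrun --nproc_per_node=2` so two ranks are available.
cache_dit.enable_cache(
    pipe,
    cache_config=DBCacheConfig(),
    parallelism_config=ParallelismConfig(ulysses_size=2),
)

# cache-dit is natively compatible with torch.compile; compiling only the
# DiT transformer is the usual Diffusers pattern (an assumption here).
pipe.transformer = torch.compile(pipe.transformer)

image = pipe("a cabin in snowy mountains").images[0]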

🚀Quick Start

You can install cache-dit from PyPI or from source:

pip3 install -U cache-dit # or, pip3 install git+https://github.com/vipshop/cache-dit.git

Then try ♥️Cache Acceleration♥️ with just one line of code:

>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> # The pipe can be any diffusion pipeline.
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
>>> # Cache Acceleration with One-line code.
>>> cache_dit.enable_cache(pipe)
>>> # Or, Hybrid Cache Acceleration + Parallelism.
>>> from cache_dit import DBCacheConfig, ParallelismConfig
>>> cache_dit.enable_cache(
...   pipe, cache_config=DBCacheConfig(), 
...   parallelism_config=ParallelismConfig(ulysses_size=2)
... )
>>> # Or, load the acceleration config from a custom YAML file.
>>> from cache_dit import load_configs
>>> cache_dit.enable_cache(pipe, **load_configs("config.yaml"))
>>> output = pipe(...) # Just call the pipe as normal.
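>>> # Optionally, inspect the cache acceleration stats after a run.
>>> # (Assumes cache_dit.summary, the stats helper described in the docs;
>>> # its exact output format may vary across versions.)
>>> stats = cache_dit.summary(pipe)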

Please refer to our online documentation at readthedocs.io for more details.

🚀Quick Links

  • 📊Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
  • 🌐HTTP Serving - Deploy cache-dit models with HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation.
  • 🎉User Guide - Detailed coverage of more advanced features.
  • ❓FAQ - Frequently asked questions including attention backend configuration, troubleshooting, and optimization tips.
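
For HTTP serving, clients talk to the server over simple JSON REST calls. The sketch below is purely illustrative: the port, route, and payload fields are placeholders, not the documented API, so check the 🌐HTTP Serving guide for the real schema.

import requests

# Hypothetical endpoint and fields; consult the HTTP Serving docs for the
# actual route and payload schema before use.
resp = requests.post(
    "http://localhost:8000/v1/images/generations",
    json={"prompt": "a cat wearing sunglasses"},
)
resp.raise_for_status()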

🌐Community Integration

cache-dit has been integrated into vLLM-Omni, SGLang Diffusion, SD.Next, and other community projects.

©️Acknowledgements

Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and deployment of this project. We learned from the designs of, and reused code from, the following projects: Diffusers, SGLang, vLLM-Omni, ParaAttention, xDiT, TaylorSeer and LeMiCa.

©️Citations

@misc{cache-dit@2025,
  title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com},
  year={2025}
}
