Unified CUDA kernels for FastVideo

FastVideo Kernel

CUDA kernels for FastVideo video generation.

Installation

Standard Installation (Local Development)

The build script automatically detects your GPU architecture. If it finds an NVIDIA Hopper GPU (H100, sm_90a), it enables the ThunderKittens kernels. Otherwise it skips them, and the package uses Triton fallbacks at runtime.

git submodule update --init --recursive
cd fastvideo-kernel
./build.sh
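
To predict which code path a machine is likely to get before building, the check can be sketched in Python. This is a minimal sketch, not the logic of build.sh itself (which is not shown here): it assumes Hopper corresponds to CUDA compute capability 9.0, and the names is_hopper, detect_capability and expected_backend are hypothetical helpers, not part of fastvideo_kernel.

```python
from typing import Optional, Tuple

# Assumption: an H100 (sm_90a) reports compute capability (9, 0).
HOPPER_CAPABILITY = (9, 0)


def is_hopper(capability: Tuple[int, int]) -> bool:
    """Return True when a (major, minor) compute capability is Hopper."""
    return tuple(capability) == HOPPER_CAPABILITY


def detect_capability() -> Optional[Tuple[int, int]]:
    """Best-effort query of GPU 0 through PyTorch.

    Returns None when PyTorch is missing or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def expected_backend(capability: Optional[Tuple[int, int]]) -> str:
    """Hypothetical label for the path described above, not a library API."""
    if capability is not None and is_hopper(capability):
        return "thunderkittens"
    return "triton-fallback"
```

Calling expected_backend(detect_capability()) on the build machine gives a quick hint, but the build script's own output remains the authoritative answer.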

ROCm Build

In a ROCm environment without the CUDA compilation toolchain, build with the --rocm flag:

cd fastvideo-kernel
./build.sh --rocm

Usage

Sliding Tile Attention (STA) & Video Sparse Attention (VSA)

For detailed usage, please check the Attention Documentation.

from fastvideo_kernel import sliding_tile_attention, video_sparse_attn, moba_attn_varlen

# Example: Sliding Tile Attention
out = sliding_tile_attention(q, k, v, window_sizes, text_len)

# Example: Video Sparse Attention (with Triton fallback)
out = video_sparse_attn(q, k, v, block_sizes, topk=5)

# Example: VMoBA
out = moba_attn_varlen(q, k, v, cu_seqlens_q, cu_seqlens_k, ...)
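
The cu_seqlens_q and cu_seqlens_k arguments are not described here. In other variable-length attention APIs, arguments with these names hold cumulative sequence lengths with a leading zero, so that sequence i occupies rows cu[i]:cu[i + 1] of the packed batch. The sketch below builds such a list under the assumption that moba_attn_varlen follows the same convention; build_cu_seqlens is a hypothetical helper, not part of fastvideo_kernel.

```python
from itertools import accumulate
from typing import List, Sequence


def build_cu_seqlens(seq_lens: Sequence[int]) -> List[int]:
    """Cumulative sequence lengths with a leading 0.

    Assumed convention: for per-sequence lengths [3, 5, 2] the packed
    batch holds 10 tokens and sequence i spans rows cu[i]:cu[i + 1].
    """
    if any(n < 0 for n in seq_lens):
        raise ValueError("sequence lengths must be non-negative")
    return [0, *accumulate(seq_lens)]


cu_seqlens = build_cu_seqlens([3, 5, 2])
print(cu_seqlens)  # [0, 3, 8, 10]
```

Before passing the result to a kernel, it would typically be converted to an integer tensor on the GPU; check the Attention Documentation for the dtype and layout that moba_attn_varlen expects.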

TurboDiffusion Kernels

This package also includes kernels from TurboDiffusion, including INT8 GEMM, Quantization, RMSNorm and LayerNorm.
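
When validating a fused kernel, a slow reference implementation is useful for comparison. The sketch below implements the standard RMSNorm definition, y_i = x_i / sqrt(mean(x^2) + eps) * w_i, in plain Python. It is a reference under stated assumptions only: rms_norm_reference is a hypothetical name, and the eps default and argument order of the package's own RMSNorm kernel may differ.

```python
import math
from typing import List, Sequence


def rms_norm_reference(
    x: Sequence[float], weight: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Reference RMSNorm over one vector (not the package's kernel)."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    if len(x) == 0:
        raise ValueError("x must not be empty")
    # Mean of squares, then scale every element by the inverse RMS.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

Comparing the kernel output against this reference within a tolerance is a common sanity check, especially for low-precision inputs.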

Requirements

  • Runtime:
    • NVIDIA H100 (sm_90a) for C++ optimized kernels.
    • Any CUDA GPU for Triton-based fallbacks.
  • Build:
    • CUDA Toolkit 12.3+
    • C++20 compatible compiler (GCC 10+, Clang 11+)
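
The CUDA Toolkit requirement above can be checked before building. The following is a minimal sketch that parses the "release X.Y" string printed by nvcc --version; parse_nvcc_release and meets_minimum are hypothetical helpers, and the sketch assumes nvcc is the toolkit the build will actually use.

```python
import re
from typing import Optional, Tuple

# Minimum toolkit version taken from the Requirements list.
MIN_CUDA = (12, 3)


def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(
    version: Tuple[int, int], minimum: Tuple[int, int] = MIN_CUDA
) -> bool:
    """Tuple comparison handles 12.10 vs 12.3 correctly, unlike floats."""
    return version >= minimum


sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(parse_nvcc_release(sample))  # (12, 4)
```

In practice the text would come from subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout rather than a hard-coded sample.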

Acknowledgement

This package structure and build system are based on sgl-kernel from the SGLang project.

The TurboDiffusion kernels in this package are adapted from the TurboDiffusion project. If you use these kernels, please cite:

@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}

Download files

Source Distribution

  • fastvideo_kernel-0.2.2.tar.gz (2.8 MB)

Built Distributions

  • fastvideo_kernel-0.2.2-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (944.4 kB): CPython 3.12, manylinux (glibc 2.34+, glibc 2.35+), x86-64
  • fastvideo_kernel-0.2.2-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (944.0 kB): CPython 3.11, manylinux (glibc 2.34+, glibc 2.35+), x86-64
  • fastvideo_kernel-0.2.2-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (942.1 kB): CPython 3.10, manylinux (glibc 2.34+, glibc 2.35+), x86-64

