Unified CUDA kernels for FastVideo

FastVideo Kernel

CUDA kernels for FastVideo video generation.

Installation

Standard Installation (Local Development)

The build script automatically detects your GPU architecture. If an NVIDIA Hopper GPU (H100, sm_90a) is detected, the ThunderKittens kernels are enabled. Otherwise they are skipped, and the package uses Triton fallbacks at runtime.

git submodule update --init --recursive
cd fastvideo-kernel
./build.sh
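The detection rule described above can be sketched in Python. This is only an illustration, not the logic inside build.sh: the helper names `is_hopper`, `detect_capability`, and `describe_backend` are hypothetical, and the sketch assumes a CUDA build of PyTorch is what reports the device's compute capability (an H100 reports 9.0).

```python
from typing import Optional, Tuple


def is_hopper(capability: Tuple[int, int]) -> bool:
    """Return True for compute capability 9.0 (H100, compiled as sm_90a)."""
    return capability == (9, 0)


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query device 0 through PyTorch; return None if no NVIDIA GPU is usable."""
    try:
        import torch
    except ImportError:
        return None
    # ROCm builds of PyTorch also answer torch.cuda calls, so exclude them here.
    if torch.version.hip is not None or not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def describe_backend(capability: Optional[Tuple[int, int]]) -> str:
    """Map a compute capability to the kernel path the README describes."""
    if capability is None:
        return "no CUDA device detected"
    if is_hopper(capability):
        return "ThunderKittens kernels"
    return "Triton fallbacks"
```

Calling `describe_backend(detect_capability())` before building gives a quick indication of which path to expect on the current machine.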

ROCm Build

If you are in a ROCm environment without the CUDA compilation toolchain, pass the --rocm flag to the build script:

cd fastvideo-kernel
./build.sh --rocm

Usage

Sliding Tile Attention (STA) & Video Sparse Attention (VSA)

For detailed usage, please check the Attention Documentation.

from fastvideo_kernel import sliding_tile_attention, video_sparse_attn, moba_attn_varlen

# Example: Sliding Tile Attention
out = sliding_tile_attention(q, k, v, window_sizes, text_len)

# Example: Video Sparse Attention (with Triton fallback)
out = video_sparse_attn(q, k, v, block_sizes, block_sizes, topk=5)

# Example: VMoBA
out = moba_attn_varlen(q, k, v, cu_seqlens_q, cu_seqlens_k, ...)

TurboDiffusion Kernels

This package also includes kernels from TurboDiffusion, including INT8 GEMM, Quantization, RMSNorm and LayerNorm.

Requirements

  • Runtime:
    • NVIDIA H100 (sm_90a) for C++ optimized kernels.
    • Any CUDA GPU for Triton-based fallbacks.
  • Build:
    • CUDA Toolkit 12.3+
    • C++20 compatible compiler (GCC 10+, Clang 11+)
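The CUDA Toolkit requirement can be checked before building. The sketch below parses the "release X.Y" text that `nvcc --version` prints and compares it with the 12.3 minimum listed above. The function names are hypothetical and the check is not part of build.sh.

```python
import re
import subprocess
from typing import Optional, Tuple

MIN_CUDA = (12, 3)  # minimum from the Requirements list


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'release 12.4, V12.4.131'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_toolkit_ok(output: str, minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """True when the reported toolkit release is at least `minimum`."""
    release = parse_nvcc_release(output)
    return release is not None and release >= minimum


def installed_nvcc_output() -> str:
    """Run `nvcc --version`; return an empty string if nvcc is missing or fails."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout
```

`cuda_toolkit_ok(installed_nvcc_output())` returns False both when nvcc is absent and when the toolkit is older than 12.3.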

Acknowledgement

This package structure and build system are based on sgl-kernel from the SGLang project.

The implementation of the TurboDiffusion kernels is adapted from TurboDiffusion. If you use these kernels, please cite:

@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}

Download files

Source Distribution

fastvideo_kernel-0.2.4.tar.gz (39.8 MB)

Uploaded: Source

Built Distributions

fastvideo_kernel-0.2.4-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.8 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64

fastvideo_kernel-0.2.4-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.8 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64

fastvideo_kernel-0.2.4-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.8 MB)

Uploaded: CPython 3.10, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64
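To pick the right wheel, it can help to split a file name into its tags. The sketch below follows the standard wheel naming layout (name, version, optional build tag, Python tag, ABI tag, platform tag) using only the standard library. `WheelTags` and `parse_wheel_filename` are hypothetical names for illustration and are not part of fastvideo_kernel.

```python
from typing import NamedTuple, Tuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five fields normally; six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform = parts[-3:]
    # A platform field may list several tags joined with dots.
    return WheelTags(name, version, python_tag, abi_tag, tuple(platform.split(".")))


tags = parse_wheel_filename(
    "fastvideo_kernel-0.2.4-cp312-cp312-"
    "manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl"
)
print(tags.python_tag, tags.platform_tags)
```

For the wheels listed above, the Python tag (cp310, cp311, or cp312) must match the interpreter, and the platform tags indicate the minimum glibc version.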

File details

Details for the file fastvideo_kernel-0.2.4.tar.gz.

File metadata

  • Download URL: fastvideo_kernel-0.2.4.tar.gz
  • Upload date:
  • Size: 39.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastvideo_kernel-0.2.4.tar.gz
Algorithm Hash digest
SHA256 353c5eaddc443a7c85cadc09d58aadee7d5656b332fd327cf09fa8af04cb96ea
MD5 e232a285f3502c9dde423717ae90a882
BLAKE2b-256 33b2550e102306c180da80c484f923a74070de087c90631849b40e2f51a7857b
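A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`. The helper names `sha256_of` and `verify` are hypothetical, and the local path passed to `verify` is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for fastvideo_kernel-0.2.4.tar.gz
EXPECTED_SHA256 = "353c5eaddc443a7c85cadc09d58aadee7d5656b332fd327cf09fa8af04cb96ea"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large archive is never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("fastvideo_kernel-0.2.4.tar.gz")` should return True for an intact download saved under that name in the current directory.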

See more details on using hashes here.

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.4.tar.gz:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastvideo_kernel-0.2.4-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File hashes

Hashes for fastvideo_kernel-0.2.4-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 54397215ef7d0054230988157aa1e546006bf63824b297bf628d88c31dc48814
MD5 46d0ce00dcab10bd759f80d1d492e8a9
BLAKE2b-256 1f3f9ea76c2c9e70cb7334f686ba4d232c23ff32dffe0436716da3493ff29bfd

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.4-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo

File details

Details for the file fastvideo_kernel-0.2.4-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File hashes

Hashes for fastvideo_kernel-0.2.4-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 2b8c4adfc216de340b4d504d10f81a6f9de0e844ec9177ef94348ef93291024f
MD5 76cb65fc2904a18c16d2abd0ae6b2d03
BLAKE2b-256 6512916cf60c8f31b66d9f65de8bc2f978612c712f9d96f6ac8dcaccc075e00c

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.4-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo

File details

Details for the file fastvideo_kernel-0.2.4-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File hashes

Hashes for fastvideo_kernel-0.2.4-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 187b452020c6d52a1d16290edafa4fa29f1cc2ace846bcbd7df96440f604e41d
MD5 084ab1db5a8420193768abcc099431e3
BLAKE2b-256 6030425188524dc94984dd780401435702f38cb4caae0b01d4fedd330e07abee

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.4-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo
