Unified CUDA kernels for FastVideo

FastVideo Kernel

CUDA kernels for FastVideo video generation.

Installation

Standard Installation (Local Development)

The build script automatically detects your GPU architecture. If an NVIDIA Hopper GPU (H100, sm_90a) is detected, the ThunderKittens kernels are enabled. Otherwise they are skipped, and the package uses Triton fallbacks at runtime.

git submodule update --init --recursive
cd fastvideo-kernel
./build.sh
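
To see ahead of time which path a machine is likely to take, the compute capability can be queried from PyTorch. The following is a minimal sketch, not the logic used by build.sh: the helper names `is_hopper` and `detect_capability` are hypothetical, and it assumes that compute capability 9.0 is what the build treats as Hopper.

```python
from typing import Optional, Tuple


def is_hopper(capability: Tuple[int, int]) -> bool:
    """Return True for compute capability 9.0 (H100-class, sm_90/sm_90a)."""
    return tuple(capability) == (9, 0)


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of GPU 0, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA GPU visible to PyTorch")
    elif is_hopper(cap):
        print(f"sm_{cap[0]}{cap[1]}: Hopper GPU, ThunderKittens kernels may be enabled")
    else:
        print(f"sm_{cap[0]}{cap[1]}: non-Hopper GPU, expect Triton fallbacks")
```

The check is split from the query so that the capability test can be exercised on a machine without a GPU.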

ROCm Build

If you are in a ROCm environment without the CUDA compilation toolchain, build with the --rocm flag:

cd fastvideo-kernel
./build.sh --rocm

Usage

Sliding Tile Attention (STA) & Video Sparse Attention (VSA)

For detailed usage, please check the Attention Documentation.

from fastvideo_kernel import sliding_tile_attention, video_sparse_attn, moba_attn_varlen

# Example: Sliding Tile Attention
out = sliding_tile_attention(q, k, v, window_sizes, text_len)

# Example: Video Sparse Attention (with Triton fallback)
out = video_sparse_attn(q, k, v, block_sizes, block_sizes, topk=5)

# Example: VMoBA
out = moba_attn_varlen(q, k, v, cu_seqlens_q, cu_seqlens_k, ...)
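
The `cu_seqlens_q` and `cu_seqlens_k` arguments are not described here. A minimal sketch of how such arguments are commonly built, assuming `moba_attn_varlen` follows the FlashAttention-style variable-length convention (cumulative sequence lengths with a leading zero); this is an assumption, so check the Attention Documentation before relying on it. The helper `build_cu_seqlens` is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def build_cu_seqlens(seq_lens: Sequence[int]) -> List[int]:
    """Cumulative sequence lengths with a leading 0: [0, l0, l0 + l1, ...]."""
    if any(n < 0 for n in seq_lens):
        raise ValueError("sequence lengths must be non-negative")
    return [0] + list(accumulate(seq_lens))


# Three sequences of 4, 2, and 3 tokens packed into one 9-token batch.
cu_seqlens = build_cu_seqlens([4, 2, 3])

# Under this convention, sequence i occupies rows
# cu_seqlens[i]:cu_seqlens[i + 1] of the packed q/k/v tensors.
spans = list(zip(cu_seqlens[:-1], cu_seqlens[1:]))
```

In FlashAttention-style APIs the list is usually passed as an int32 tensor on the same device as `q`; whether this package expects the same dtype is not stated in the source.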

Benchmark

VSA (block-sparse) TFLOPs

After building/installing fastvideo-kernel, run:

cd fastvideo-kernel
python benchmarks/bench_vsa.py --batch_size 1 --num_heads 16 --head_dim 128 --q_seq_lens 49152 --topk 64
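
For reading the reported numbers, it helps to know roughly how many floating-point operations a block-sparse forward pass performs. The sketch below uses the usual attention FLOP count (two matrix multiplies, each costing 2 FLOPs per multiply-add) restricted to the selected key/value blocks. It is an illustration only: the accounting in bench_vsa.py may differ, and the block size of 64 tokens is an assumption that does not come from this page.

```python
def sparse_attention_flops(
    batch_size: int,
    num_heads: int,
    head_dim: int,
    q_seq_len: int,
    topk: int,
    block_size: int,
) -> int:
    """Forward FLOPs when each query attends to `topk` key/value blocks.

    QK^T and PV each cost 2 * q_seq_len * kv_tokens * head_dim FLOPs
    per head, hence the factor of 4.
    """
    kv_tokens_per_query = topk * block_size
    return 4 * batch_size * num_heads * q_seq_len * kv_tokens_per_query * head_dim


def tflops(flops: float, seconds: float) -> float:
    """Convert a FLOP count and a wall-clock time into TFLOP/s."""
    return flops / seconds / 1e12


# Same shape as the benchmark command; block_size=64 is an assumed value.
flops = sparse_attention_flops(
    batch_size=1, num_heads=16, head_dim=128,
    q_seq_len=49152, topk=64, block_size=64,
)
```

Dividing `flops` by a measured kernel time with `tflops` gives a figure comparable in spirit to the benchmark output, under the assumptions above.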

TurboDiffusion Kernels

This package also includes kernels from TurboDiffusion: INT8 GEMM, quantization, RMSNorm, and LayerNorm.

Requirements

  • Runtime:
    • NVIDIA H100 (sm_90a) for C++ optimized kernels.
    • Any CUDA GPU for Triton-based fallbacks.
  • Build:
    • CUDA Toolkit 12.3+
    • C++20 compatible compiler (GCC 10+, Clang 11+)
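
The CUDA Toolkit requirement can be checked before building. This is a minimal sketch that parses the output of `nvcc --version`; the helper names are hypothetical, and it assumes nvcc prints a line of the form "release 12.4, V12.4.131", as current toolkits do.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CUDA = (12, 3)  # minimum toolkit version from the Requirements list


def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on PATH and return its toolkit version."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def meets_minimum(version: Tuple[int, int], minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """Tuple comparison handles major and minor together."""
    return version >= minimum
```

Calling `installed_cuda_version()` returns None when nvcc is not on PATH, which is also the situation in which the ROCm build applies.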

Acknowledgement

This package structure and build system are based on sgl-kernel from the SGLang project.

The TurboDiffusion kernel implementations are adapted from TurboDiffusion. If you use these kernels, please cite:

@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}

Download files

Download the file for your platform.

Source Distribution

fastvideo_kernel-0.2.5.tar.gz (39.8 MB)

Uploaded: Source

Built Distributions

fastvideo_kernel-0.2.5-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.9 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64

fastvideo_kernel-0.2.5-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.8 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64

fastvideo_kernel-0.2.5-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.8 MB)

Uploaded: CPython 3.10, manylinux: glibc 2.34+ x86-64, manylinux: glibc 2.35+ x86-64

File details

Details for the file fastvideo_kernel-0.2.5.tar.gz.

File metadata

  • Download URL: fastvideo_kernel-0.2.5.tar.gz
  • Upload date:
  • Size: 39.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastvideo_kernel-0.2.5.tar.gz
Algorithm Hash digest
SHA256 e9a070e8b6f160bd7046354542a97c611049ef447b7d7f65a2c9b84446ef7ae9
MD5 8fd795b83f266c20bd6337049fbead4e
BLAKE2b-256 b92e4efbccad3432ea4c583e2e35a181e017ac8ae800e4f5bacd2a23e88f4cf7
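
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; the function names are hypothetical, and the file path passed to `verify` is wherever the archive was saved locally.

```python
import hashlib

# SHA256 digest listed above for fastvideo_kernel-0.2.5.tar.gz
EXPECTED_SHA256 = "e9a070e8b6f160bd7046354542a97c611049ef447b7d7f65a2c9b84446ef7ae9"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected one, ignoring hex case."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify("fastvideo_kernel-0.2.5.tar.gz")` returns True only if the local file matches the published digest. The same functions work for the wheels when given their own SHA256 values.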

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.5.tar.gz:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastvideo_kernel-0.2.5-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File hashes

Hashes for fastvideo_kernel-0.2.5-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 d0b10b21bbd6cabb9dde11fa4f153e5e71cf2ac61b791379660364bf2304e89d
MD5 e24ec4157e970f8291a87590e28d69ce
BLAKE2b-256 08475d55c832b651da58a18c90e63468c0b5572ecfa20eb6aa4470dbe4e7fa1d

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.5-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo

File details

Details for the file fastvideo_kernel-0.2.5-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File hashes

Hashes for fastvideo_kernel-0.2.5-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 ac4d208cb216656283d4af97fcda3c9366ad3abaeabc0424664749583507bd5e
MD5 875256fdf79c60e48942bec63f8561c3
BLAKE2b-256 c1d4c2628a0f7058534e41674b0be68dc55f2115a21ea967aa04544cfb129811

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.5-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo

File details

Details for the file fastvideo_kernel-0.2.5-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl.

File hashes

Hashes for fastvideo_kernel-0.2.5-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 df8339054f0d0ffa43fc3082d4ce027b7a20547f32aa0794b625cd82e41e42f1
MD5 9a7671e723ad213bb0c6bc2004693281
BLAKE2b-256 b96a8fe85b463da1a1f82ef2f498ce2a84e87411b9507fe61ea590160f55e8b1

Provenance

The following attestation bundles were made for fastvideo_kernel-0.2.5-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl:

Publisher: fastvideo-kernel-publish.yml on hao-ai-lab/FastVideo
