Unified CUDA kernels for FastVideo

FastVideo Kernel

CUDA kernels for FastVideo video generation.

Installation

Standard Installation (Local Development)

The build script automatically detects your GPU architecture. If an NVIDIA Hopper GPU (H100, sm_90a) is detected, the ThunderKittens kernels are enabled. Otherwise they are skipped, and the package uses Triton fallbacks at runtime.

git submodule update --init --recursive
cd fastvideo-kernel
./build.sh
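
The detection rule described above can be sketched in Python. This is only an illustration of the decision, not the logic inside build.sh: the helper names `select_backend` and `detect_capability` are hypothetical, and the sketch assumes that a Hopper GPU reports CUDA compute capability (9, 0) through PyTorch.

```python
from typing import Optional, Tuple


def select_backend(capability: Tuple[int, int]) -> str:
    """Hypothetical helper: map a CUDA compute capability to a kernel backend."""
    # Assumption: H100 (sm_90a) reports compute capability (9, 0).
    if capability == (9, 0):
        return "thunderkittens"
    # Every other architecture falls back to the Triton kernels.
    return "triton"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) for GPU 0, or None if PyTorch or a GPU is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)
```

Calling `select_backend(detect_capability())` on a machine with a visible GPU would report which path this sketch expects; the actual build may apply additional checks.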

ROCm Build

If you are in a ROCm environment without the CUDA compilation toolchain, build with the --rocm flag:

cd fastvideo-kernel
./build.sh --rocm

Usage

Sliding Tile Attention (STA) & Video Sparse Attention (VSA)

For detailed usage, please check the Attention Documentation.

from fastvideo_kernel import sliding_tile_attention, video_sparse_attn, moba_attn_varlen

# Example: Sliding Tile Attention
out = sliding_tile_attention(q, k, v, window_sizes, text_len)

# Example: Video Sparse Attention (with Triton fallback)
out = video_sparse_attn(q, k, v, block_sizes, block_sizes, topk=5)

# Example: VMoBA
out = moba_attn_varlen(q, k, v, cu_seqlens_q, cu_seqlens_k, ...)
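
The `cu_seqlens_q` and `cu_seqlens_k` arguments are not described here. In other variable-length attention APIs, such arguments hold cumulative sequence lengths: a list of batch_size + 1 offsets, starting at 0, that mark where each sequence begins in the packed token dimension. The sketch below assumes `moba_attn_varlen` follows that convention; the helper `build_cu_seqlens` is hypothetical and not part of this package.

```python
from itertools import accumulate
from typing import List


def build_cu_seqlens(seq_lens: List[int]) -> List[int]:
    """Hypothetical helper: turn per-sequence lengths into cumulative offsets.

    Sequence i occupies packed positions [result[i], result[i + 1]).
    """
    return [0] + list(accumulate(seq_lens))


# Three sequences of 3, 5 and 2 tokens packed back to back.
offsets = build_cu_seqlens([3, 5, 2])
print(offsets)
```

In practice the offsets would likely be converted to an integer tensor on the same device as q, k and v before being passed to the kernel; check the Attention Documentation for the exact dtype and layout.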

Benchmark

VSA (block-sparse) TFLOPs

After building/installing fastvideo-kernel, run:

cd fastvideo-kernel
python benchmarks/bench_vsa.py --batch_size 1 --num_heads 16 --head_dim 128 --q_seq_lens 49152 --topk 64
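
To put a reported TFLOPs figure in context, the forward-pass work of block-sparse attention can be estimated by hand. The sketch below is an assumption-laden estimate, not the formula used by bench_vsa.py: it assumes each query attends to topk key blocks of 64 tokens each (the `block_size` default is a guess), and it counts 2 FLOPs per multiply-add for both the QK^T and PV products.

```python
def sparse_attention_flops(batch: int, heads: int, q_len: int, head_dim: int,
                           topk: int, block_size: int = 64) -> int:
    """Estimate forward FLOPs when each query attends to topk key blocks."""
    kv_per_query = topk * block_size  # keys actually visited per query
    # Factor 4 = 2 matmuls (QK^T and PV) x 2 FLOPs per multiply-add.
    return 4 * batch * heads * q_len * kv_per_query * head_dim


def tflops_per_second(flops: int, seconds: float) -> float:
    """Convert a FLOP count and a measured runtime into TFLOP/s."""
    return flops / seconds / 1e12


# Same settings as the benchmark command above.
flops = sparse_attention_flops(batch=1, heads=16, q_len=49152,
                               head_dim=128, topk=64)
print(flops)
```

Dividing this estimate by a measured kernel time gives a throughput figure that can be compared against the benchmark output, provided the script uses the same FLOP accounting.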

TurboDiffusion Kernels

This package also bundles kernels from TurboDiffusion: INT8 GEMM, quantization, RMSNorm, and LayerNorm.
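
For readers unfamiliar with these operations, the following pure-Python reference shows what RMSNorm and symmetric per-tensor INT8 quantization compute. It is a sketch of the textbook math only: the function names are hypothetical, the per-tensor scaling scheme is an assumption, and none of this reflects the API or the exact numerics of the bundled kernels.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Scale x by the reciprocal of its root mean square, then by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] to [-127, 127]."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from INT8 codes."""
    return [v * scale for v in q]
```

An INT8 GEMM multiplies two such quantized operands with integer accumulation and rescales the result by the product of the two scales.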

Requirements

  • Runtime:
    • NVIDIA H100 (sm_90a) for C++ optimized kernels.
    • Any CUDA GPU for Triton-based fallbacks.
  • Build:
    • CUDA Toolkit 12.3+
    • C++20 compatible compiler (GCC 10+, Clang 11+)
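
Before building, the CUDA Toolkit requirement can be checked with a small script. The sketch below parses the release number from the text printed by `nvcc --version`; the helper names are hypothetical, and build.sh may perform its own checks.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of nvcc's version text."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(release: Tuple[int, int],
                  minimum: Tuple[int, int] = (12, 3)) -> bool:
    """Tuple comparison orders by major version first, then minor."""
    return release >= minimum


# Sample line in the format nvcc prints.
sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(parse_nvcc_release(sample))
```

To check a real installation, pass in the output of `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.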

Acknowledgement

This package structure and build system are based on sgl-kernel from the SGLang project.

The TurboDiffusion kernels are adapted from the TurboDiffusion project. If you use these kernels, please cite:

@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}

Download files

Source Distribution

  • fastvideo_kernel-0.2.6.tar.gz (39.8 MB)

Built Distributions

  • fastvideo_kernel-0.2.6-cp312-cp312-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.9 MB): CPython 3.12, manylinux glibc 2.34+ / 2.35+, x86-64
  • fastvideo_kernel-0.2.6-cp311-cp311-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.9 MB): CPython 3.11, manylinux glibc 2.34+ / 2.35+, x86-64
  • fastvideo_kernel-0.2.6-cp310-cp310-manylinux_2_34_x86_64.manylinux_2_35_x86_64.whl (13.8 MB): CPython 3.10, manylinux glibc 2.34+ / 2.35+, x86-64
