Unified CUDA kernels for FastVideo
FastVideo Kernel
CUDA kernels for FastVideo video generation.
Installation
Standard Installation (Local Development)
The build script automatically detects your GPU architecture. If an NVIDIA Hopper GPU (H100, sm_90a) is detected, the ThunderKittens kernels are enabled. Otherwise they are skipped, and the package uses Triton fallbacks at runtime.
git submodule update --init --recursive
cd fastvideo-kernel
./build.sh
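To anticipate which path the build is likely to take, you can check the GPU's compute capability beforehand. The following is a minimal sketch, not the logic used by build.sh (which is not shown here). The helper names `is_hopper` and `local_capability` are hypothetical, and the check assumes PyTorch is installed and that Hopper corresponds to compute capability (9, 0).

```python
def is_hopper(capability):
    """Return True when a (major, minor) compute capability is sm_90 (Hopper)."""
    return tuple(capability) == (9, 0)


def local_capability():
    """Return the first GPU's (major, minor) capability, or None if unavailable."""
    try:
        import torch  # optional: only needed to query a real device
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_capability()
    if cap is None:
        print("No CUDA device visible from PyTorch")
    elif is_hopper(cap):
        # Assumption: this mirrors the Hopper check described above.
        print(f"sm_{cap[0]}{cap[1]}: ThunderKittens kernels would be expected")
    else:
        print(f"sm_{cap[0]}{cap[1]}: expect Triton fallbacks at runtime")
```

If the script reports anything other than sm_90, the Triton fallbacks described above are the expected runtime path.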
ROCm Build
If you are in a ROCm environment without the CUDA compilation toolchain, build with the --rocm flag:
cd fastvideo-kernel
./build.sh --rocm
Usage
Sliding Tile Attention (STA) & Video Sparse Attention (VSA)
For detailed usage, please check the Attention Documentation.
from fastvideo_kernel import sliding_tile_attention, video_sparse_attn, moba_attn_varlen
# Example: Sliding Tile Attention
out = sliding_tile_attention(q, k, v, window_sizes, text_len)
# Example: Video Sparse Attention (with Triton fallback)
out = video_sparse_attn(q, k, v, block_sizes, block_sizes, topk=5)
# Example: VMoBA
out = moba_attn_varlen(q, k, v, cu_seqlens_q, cu_seqlens_k, ...)
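The `cu_seqlens_q` and `cu_seqlens_k` arguments are not documented in this README. The sketch below assumes they follow the cumulative-offset convention common to variable-length attention APIs, where sequences are packed back to back and entry `i` marks the start of sequence `i`. The helper `build_cu_seqlens` is hypothetical and not part of fastvideo_kernel; check the Attention Documentation for the exact dtype and device the kernel expects.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Turn per-sequence lengths into cumulative offsets with a leading 0.

    For lengths [3, 5, 2], the packed tensor holds 10 tokens and the
    sequences occupy the index ranges [0, 3), [3, 8), and [8, 10).
    """
    return [0] + list(accumulate(seq_lens))


offsets = build_cu_seqlens([3, 5, 2])
print(offsets)
```

In practice the resulting list would typically be converted to an integer tensor on the same device as `q`, `k`, and `v` before being passed to the kernel.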
Benchmark
VSA (block-sparse) TFLOPs
After building/installing fastvideo-kernel, run:
cd fastvideo-kernel
python benchmarks/bench_vsa.py --batch_size 1 --num_heads 16 --head_dim 128 --q_seq_lens 49152 --topk 64
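For orientation, a TFLOPs figure for block-sparse attention can be derived from the arguments above. This is a rough sketch under stated assumptions, not the formula used by bench_vsa.py: it counts only the forward pass, assumes each query attends to `topk` key/value blocks, and takes `block_size` (tokens per block) as a hypothetical parameter that the command line above does not specify. The `seconds` value in the example is a placeholder, not a measurement.

```python
def sparse_attention_tflops(batch_size, num_heads, q_seq_len, head_dim,
                            topk, block_size, seconds):
    """Estimate forward-pass TFLOPs for block-sparse attention.

    Each query attends to topk * block_size key/value tokens. The two
    matmuls (QK^T and PV) each cost 2 * head_dim FLOPs per query/key pair,
    giving the factor of 4.
    """
    kv_tokens_per_query = topk * block_size
    flops = (4 * batch_size * num_heads * q_seq_len
             * kv_tokens_per_query * head_dim)
    return flops / seconds / 1e12


# Placeholder timing of 1.0 s and an assumed block size of 64 tokens.
estimate = sparse_attention_tflops(
    batch_size=1, num_heads=16, q_seq_len=49152, head_dim=128,
    topk=64, block_size=64, seconds=1.0,
)
print(f"{estimate:.3f} TFLOPs")
```

Substituting a measured kernel time for `seconds` gives a number that can be compared across configurations, provided the same FLOP-counting convention is used throughout.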
TurboDiffusion Kernels
This package also includes kernels from TurboDiffusion, including INT8 GEMM, Quantization, RMSNorm and LayerNorm.
Requirements
- Runtime:
  - NVIDIA H100 (sm_90a) for C++ optimized kernels.
  - Any CUDA GPU for Triton-based fallbacks.
- Build:
  - CUDA Toolkit 12.3+
  - C++20 compatible compiler (GCC 10+, Clang 11+)
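The CUDA Toolkit requirement can be checked before building. The sketch below parses the release line that `nvcc --version` prints and compares it with the 12.3 minimum listed above. The helper names are hypothetical, and the sample string only illustrates the usual output shape, so adapt the pattern if your toolkit prints something different.

```python
import re


def parse_nvcc_release(text):
    """Extract (major, minor) from nvcc --version output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(12, 3)):
    """Return True when a parsed (major, minor) version is at least minimum."""
    return version is not None and version >= minimum


# Illustrative sample of the release line, not captured from a real machine.
sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(meets_minimum(parse_nvcc_release(sample)))
```

To check a real installation, pass the captured output of `nvcc --version` (for example via `subprocess.run`) to `parse_nvcc_release`.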
Acknowledgement
This package structure and build system are based on sgl-kernel from the SGLang project.
The implementation of the TurboDiffusion kernels is adapted from TurboDiffusion. If you use these kernels, please cite:
@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}