Flash Attention CUTE (CUDA Template Engine) implementation
Project description
FlashAttention-4 (CuTeDSL)
FlashAttention-4 is a CuTeDSL-based implementation of FlashAttention for Hopper and Blackwell GPUs.
Installation
pip install flash-attn4
Usage
from flash_attn.cute import flash_attn_func, flash_attn_varlen_func
out = flash_attn_func(q, k, v, causal=True)
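To make the effect of `causal=True` concrete, here is a minimal pure-Python reference sketch of causal scaled dot-product attention for a single head. It is not the library's kernel and says nothing about how FlashAttention-4 computes the result. The function name `causal_attention_reference` and the list-of-rows layout are hypothetical, chosen only for illustration, and the sketch assumes equal query and key lengths and the usual 1/sqrt(head_dim) scaling.

```python
import math


def causal_attention_reference(q, k, v):
    """Single-head causal attention on lists of row vectors (illustrative only).

    q, k, v: lists of length seqlen; each row is a list of floats.
    Query i attends only to keys at positions 0..i.
    """
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)  # assumed default scaling
    out = []
    for i, q_row in enumerate(q):
        # Scores against keys at positions <= i only: this is the causal mask.
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k[j]))
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of the visible value rows.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights)) / total
            for c in range(len(v[0]))
        ])
    return out
```

A reference like this is only practical for tiny inputs, but it can serve as a sanity check on what the first and later query positions are allowed to see.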
Development
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install -e "flash_attn/cute[dev]"
pytest tests/cute/
Download files
Source Distribution
fa4-4.0.0b3.tar.gz (243.7 kB)
Built Distribution
fa4-4.0.0b3-py3-none-any.whl (261.6 kB)
File details
Details for the file fa4-4.0.0b3.tar.gz.
File metadata
- Download URL: fa4-4.0.0b3.tar.gz
- Upload date:
- Size: 243.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 72e368324767976a132fbcd0adeaaad08ccc76dabcb63709d2df94d002936d81 |
| MD5 | 9caf32b77883f5ea2138ab153a7e76e1 |
| BLAKE2b-256 | d74f97e6ce858a9539bf859d02f26a1ac1a03a13801e1107b618bdcfc0a127d9 |
File details
Details for the file fa4-4.0.0b3-py3-none-any.whl.
File metadata
- Download URL: fa4-4.0.0b3-py3-none-any.whl
- Upload date:
- Size: 261.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5fac755d64366cd1052f4b9e674d27b5fb7a4b5c92700647c94c06074618547 |
| MD5 | 1ff0ea661241f5ea29cf7cc35b935fe4 |
| BLAKE2b-256 | e920d6fada8b91f08bf4fd65ec55df5b18d1c804cde480eb97fdce63d0e989c9 |
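The digests listed for each file can be checked against a local download before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_of_file` is hypothetical, and the sketch assumes the file was saved locally under its published name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded into memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For an intact download, `sha256_of_file("fa4-4.0.0b3-py3-none-any.whl")` should equal `c5fac755d64366cd1052f4b9e674d27b5fb7a4b5c92700647c94c06074618547`, and the same check applies to the source distribution with its own SHA256 value.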