Flash Attention CuTe (CUTLASS CuTe DSL) implementation
Project description
FlashAttention-4 (CuTeDSL)
FlashAttention-4 is a CuTeDSL-based implementation of FlashAttention for Hopper and Blackwell GPUs.
Installation
pip install flash-attn-4
Usage
from flash_attn.cute import flash_attn_func, flash_attn_varlen_func
out = flash_attn_func(q, k, v, causal=True)
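For reference, and assuming standard attention semantics, the result `flash_attn_func(q, k, v, causal=True)` computes is equivalent to the following naive NumPy sketch (shapes `(batch, seqlen, nheads, headdim)` as in the FlashAttention API; `attention_ref` is a hypothetical helper here, and the real kernel never materializes the full score matrix):

```python
import numpy as np

def attention_ref(q, k, v, causal=True):
    """Naive reference for causal scaled-dot-product attention.

    q, k, v: (batch, seqlen, nheads, headdim) arrays.
    Returns an array of the same shape as q.
    """
    b, s, h, d = q.shape
    scale = 1.0 / np.sqrt(d)
    # scores[b, h, i, j] = <q_i, k_j> / sqrt(d)
    scores = np.einsum("bihd,bjhd->bhij", q, k) * scale
    if causal:
        # mask out future key positions j > i
        mask = np.triu(np.ones((s, s), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return np.einsum("bhij,bjhd->bihd", p, v)

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 3, 8))
k = rng.standard_normal((2, 4, 3, 8))
v = rng.standard_normal((2, 4, 3, 8))
out = attention_ref(q, k, v, causal=True)
print(out.shape)  # (2, 4, 3, 8)
```

With `causal=True`, position 0 can only attend to itself, so the first output row equals the first value row exactly; FlashAttention produces the same result while streaming over K/V blocks in on-chip memory.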
Development
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install -e "flash_attn/cute[dev]"
pytest tests/cute/
Download files
Source Distribution
flash_attn_4-4.0.0b3.tar.gz (243.7 kB)
Built Distribution
flash_attn_4-4.0.0b3-py3-none-any.whl (261.7 kB)
File details
Details for the file flash_attn_4-4.0.0b3.tar.gz.
File metadata
- Download URL: flash_attn_4-4.0.0b3.tar.gz
- Upload date:
- Size: 243.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06b6cff3bc49afd48f5b7dfad7ab237c207333d4f66736e5b567da0937e5b0a5 |
| MD5 | 5c9d9e2779eff8c8399a80a5177a4000 |
| BLAKE2b-256 | 8a36aab028ba5843b3ba21d37434c2b8891e4f105ec586261960086903b34228 |
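A downloaded archive can be checked against these digests locally. A minimal sketch using Python's standard `hashlib` (`file_sha256` is a hypothetical helper; the archive filename in the comment assumes the file was saved under its published name):

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Against a real download, compare the digest with the table above:
#   file_sha256("flash_attn_4-4.0.0b3.tar.gz") == "06b6cff3...e5b0a5"
# Demonstrate on a throwaway file so the sketch is self-contained:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name
print(file_sha256(path))
os.unlink(path)
```

The same loop works for MD5 or BLAKE2b by swapping in `hashlib.md5()` or `hashlib.blake2b(digest_size=32)`.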
File details
Details for the file flash_attn_4-4.0.0b3-py3-none-any.whl.
File metadata
- Download URL: flash_attn_4-4.0.0b3-py3-none-any.whl
- Upload date:
- Size: 261.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86183c0759689324224fa624437f64518a3b5852e5b7c77aab0a6d8a12ace537 |
| MD5 | e2fcc7bfdf1ccebc92f2c1a1a270ee65 |
| BLAKE2b-256 | d68550b336261a4e7c801215e2a6e8ef9cc6f236e8ad7ed6eea31e8cb66c1804 |