Flash Attention CUTE (CUDA Template Engine) implementation

Project description

FlashAttention-4 (CuTeDSL)

FlashAttention-4 is a CuTeDSL-based implementation of FlashAttention for Hopper and Blackwell GPUs.
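FlashAttention computes exact attention without materializing the full score matrix. It streams over blocks of keys and values and keeps, for each query, a running maximum score and a running softmax denominator. Whenever the maximum grows, it rescales the partial output ("online softmax"). The pure-Python sketch below shows that recurrence for a single query vector and checks it against a naive softmax. It illustrates the idea only and is not how the CuTeDSL kernels are written, since those tile over many queries at once on the GPU. `naive_attention`, `online_softmax_attention`, and `block_size` are hypothetical names.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) @ V for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dv = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dv)]


def online_softmax_attention(q, keys, values, block_size=2):
    """Same result, but K/V rows are consumed one block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf        # largest score seen so far
    running_sum = 0.0              # sum of exp(score - running_max)
    acc = [0.0] * len(values[0])   # unnormalized weighted sum of V rows
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old max.
        # On the first block exp(-inf) == 0.0, so nothing stale survives.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            acc = [a + p * x for a, x in zip(acc, v)]
        running_max = new_max
    # Normalize once at the end instead of after every block.
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [0.5, -1.0, 0.25]
    keys = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 0.5, 0.0], [-2.0, 1.0, 1.0], [0.1, 0.2, 0.3]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(naive_attention(q, keys, values))
    print(online_softmax_attention(q, keys, values, block_size=2))
```

Because only the running max, running sum, and accumulator persist between blocks, memory per query stays constant regardless of sequence length.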

Installation

pip install flash-attn-4

If you're on CUDA 13, install with the cu13 extra for best performance:

pip install "flash-attn-4[cu13]"
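Choosing between the two install commands can be scripted. This is a minimal sketch that assumes the CUDA version string comes from PyTorch's `torch.version.cuda`, which is `None` on CPU-only builds. `install_spec` is a hypothetical helper. Only the CUDA 13 rule comes from the instruction above, and treating every other version as the plain install is an assumption.

```python
def install_spec(cuda_version):
    """Map a CUDA version string such as "12.8" or "13.0" to a pip requirement.

    Hypothetical helper: only the "CUDA 13 -> cu13 extra" rule is documented.
    """
    if not cuda_version:
        # torch.version.cuda is None on CPU-only PyTorch builds
        raise ValueError("no CUDA version available")
    major = int(cuda_version.split(".")[0])
    return "flash-attn-4[cu13]" if major == 13 else "flash-attn-4"


if __name__ == "__main__":
    try:
        import torch  # assumption: PyTorch is already installed
        print(f'pip install "{install_spec(torch.version.cuda)}"')
    except (ImportError, ValueError) as exc:
        print(f"cannot pick an extra automatically: {exc}")
```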

Usage

from flash_attn.cute import flash_attn_func, flash_attn_varlen_func

out = flash_attn_func(q, k, v, causal=True)
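With `causal=True`, each query position attends only to keys at the same or earlier positions (shown here for equal query and key lengths). `flash_attn_varlen_func`, also imported above, handles sequences of different lengths packed along a single token dimension. The sketch below is a naive pure-Python reference for those semantics: single head, no GPU, no tiling. It assumes FA4's varlen path marks sequence boundaries with `cu_seqlens`-style cumulative offsets (a leading 0 followed by prefix sums of the lengths), as FlashAttention-2 does. `make_cu_seqlens`, `causal_attention`, and `varlen_causal_attention` are hypothetical names, not part of `flash_attn.cute`.

```python
import math
from itertools import accumulate


def make_cu_seqlens(seqlens):
    """Cumulative offsets with a leading 0, e.g. [3, 5, 2] -> [0, 3, 8, 10]."""
    return [0] + list(accumulate(seqlens))


def causal_attention(q, k, v):
    """Naive single-head causal attention; q, k, v are equal-length lists of rows."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Causal mask: query i may only see keys 0..i.
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) / z for c in range(len(v[0]))])
    return out


def varlen_causal_attention(q, k, v, cu_seqlens):
    """Rows from several sequences packed together; no attention across sequences."""
    out = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        out.extend(causal_attention(q[start:end], k[start:end], v[start:end]))
    return out
```

Packing avoids padding every sequence to the longest one. The offsets tell the kernel where each sequence starts, so no token attends across a sequence boundary.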

Development

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install -e "flash_attn/cute[dev]"       # CUDA 12.x
pip install -e "flash_attn/cute[dev,cu13]"  # CUDA 13.x (e.g. B200)
pytest tests/cute/

Download files

Download the file for your platform.

Source Distribution

flash_attn_4-4.0.0b14.tar.gz (311.8 kB)

Built Distribution

flash_attn_4-4.0.0b14-py3-none-any.whl (334.4 kB)
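Wheel file names follow the pattern `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. In `py3-none-any`, the tags mean any Python 3 interpreter, no ABI dependency, and any platform, so the same wheel serves every OS and architecture. The stdlib-only sketch below splits a name into those fields. `parse_wheel_filename` and `WheelName` here are illustrative names. For real use, the `packaging` library provides `packaging.utils.parse_wheel_filename`, which also normalizes the project name and validates the version.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build_tag: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its five or six dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    print(parse_wheel_filename("flash_attn_4-4.0.0b14-py3-none-any.whl"))
```

The project name appears as `flash_attn_4` rather than `flash-attn-4` because dashes separate the fields, so dashes in the name are written as underscores.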

File details

Details for the file flash_attn_4-4.0.0b14.tar.gz.

File metadata

  • Download URL: flash_attn_4-4.0.0b14.tar.gz
  • Size: 311.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flash_attn_4-4.0.0b14.tar.gz
  • SHA256: f0bf73fc572d5a7853d28e6184da6f61f9edc2267720964e9c3d92f3553384a3
  • MD5: e88398fd729c8a73231bfff396087d61
  • BLAKE2b-256: 1e6a7f25b81d87f2896d893d40e59fba5dd7a3f68e1481d3ad8167f8c394acce
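A downloaded file can be checked against the published SHA256 digest with the standard library alone. This is a minimal sketch. `sha256_of_file`, `matches`, and the 1 MiB chunk size are illustrative choices, and the expected digest is the one listed for the source distribution.

```python
import hashlib

# SHA256 digest published for flash_attn_4-4.0.0b14.tar.gz
EXPECTED_SDIST_SHA256 = "f0bf73fc572d5a7853d28e6184da6f61f9edc2267720964e9c3d92f3553384a3"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()


# Usage: matches("flash_attn_4-4.0.0b14.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can enforce the same check during installation in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).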

Provenance

The following attestation bundles were made for flash_attn_4-4.0.0b14.tar.gz:

Publisher: publish-fa4.yml on Dao-AILab/flash-attention

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flash_attn_4-4.0.0b14-py3-none-any.whl.

File metadata

  • Download URL: flash_attn_4-4.0.0b14-py3-none-any.whl
  • Size: 334.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flash_attn_4-4.0.0b14-py3-none-any.whl
  • SHA256: 496d3fe599994a3c7471a31370fc36f932923c9038be51298a7319dd9268b271
  • MD5: 71e745de38248655aa6b35c8e9c400c0
  • BLAKE2b-256: 7f4dcd6669b545742d5316cb3e553f35b5474f6010a5280349dfc226757bc26d

Provenance

The following attestation bundles were made for flash_attn_4-4.0.0b14-py3-none-any.whl:

Publisher: publish-fa4.yml on Dao-AILab/flash-attention

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
