FlashAttention-3 forward
Flash-Attention-3 Forward-Only Kernel
This repository bundles the Flash-Attention-3 forward-only kernel and the tooling required to build a lightweight Python wheel. It is intended for inference scenarios where backward operators and optional features are unnecessary.
Highlights
- Ships only the Flash-Attention-3 forward path while disabling backward kernels, local attention, paged KV cache, FP16 kernels, and other extras to minimize the wheel size.
- Applies a patch that renames the public interface to fa3_fwd_interface, making the forward kernel easy to import from Python.
Prerequisites (same as upstream)
- Python: 3.9 or later
- PyTorch: 2.10
- Build dependencies: ninja, packaging, wheel
Quick Start
- Clone the repository and initialize submodules:
    git clone --recursive <repo-url>
    cd fa3-fwd
    # If --recursive was omitted during clone, run:
    git submodule update --init --recursive
- Create a Python virtual environment and install dependencies:
    uv venv --python 3.12 --seed
    source .venv/bin/activate
    uv pip install -r requirements.txt
- Build the forward-only wheel:
    bash build_fa3.sh
  The script:
  - Sources set_compile_env.sh to compute MAX_JOBS and NVCC_THREADS
  - Applies the custom patch and interface rename inside the Flash-Attention submodule
  - Runs python setup.py bdist_wheel under flash-attention/hopper
- Install the generated wheel (example):
    pip install build/*.whl
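The build step sources set_compile_env.sh to derive MAX_JOBS and NVCC_THREADS from the machine's resources. The script's actual heuristic is not reproduced here. The sketch below shows one plausible way to pick the two values; the function name, the 4 GiB-per-thread memory budget, and the default of two nvcc threads are all assumptions made for illustration.

```python
def suggest_build_parallelism(cpu_count, free_mem_gib,
                              gib_per_nvcc_thread=4.0, nvcc_threads=2):
    """Hypothetical heuristic, NOT the logic of set_compile_env.sh.

    Returns a dict with MAX_JOBS and NVCC_THREADS chosen so that
    MAX_JOBS * NVCC_THREADS fits both the CPU count and the assumed
    memory budget per compiler thread.
    """
    if cpu_count < 1 or free_mem_gib <= 0:
        raise ValueError("cpu_count must be >= 1 and free_mem_gib > 0")

    # Each parallel job runs `nvcc_threads` compiler threads,
    # so budget memory for all of them.
    gib_per_job = gib_per_nvcc_thread * nvcc_threads
    jobs_by_mem = int(free_mem_gib // gib_per_job)
    jobs_by_cpu = cpu_count // nvcc_threads

    if jobs_by_mem < 1:
        # Not enough memory for one multi-threaded job:
        # fall back to a single nvcc thread.
        nvcc_threads = 1

    max_jobs = max(1, min(jobs_by_mem, jobs_by_cpu))
    return {"MAX_JOBS": max_jobs, "NVCC_THREADS": nvcc_threads}


# Example: a 16-core machine with 64 GiB of free memory.
settings = suggest_build_parallelism(cpu_count=16, free_mem_gib=64)
print(" ".join(f"{k}={v}" for k, v in settings.items()))
```

The printed pair can be placed in front of the build command, for example MAX_JOBS=8 NVCC_THREADS=2 bash build_fa3.sh, if you want to override what the script computes.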
Python Usage Example
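import torch
from fa3_fwd_interface import flash_attn_func
# Inputs must already live on CUDA and satisfy Flash-Attention-3 constraints.
# Illustrative inputs (assumed): (batch, seqlen, nheads, headdim) in bfloat16, since FP16 kernels are disabled.
q, k, v = (torch.randn(2, 1024, 8, 128, device="cuda", dtype=torch.bfloat16) for _ in range(3))
out = flash_attn_func(q, k, v, causal=True)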
This package exposes only the forward kernel. For backward support or additional features, depend on the upstream Flash-Attention project instead.
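Because the wheel is optional in many inference stacks, it can help to probe for the module before importing it. The sketch below is one way to do that with the standard library; the helper name fa3_forward_available is hypothetical, and what to do when the module is missing is left to the caller.

```python
import importlib.util


def fa3_forward_available(module_name="fa3_fwd_interface"):
    """Return True if the given top-level module can be found on this system."""
    return importlib.util.find_spec(module_name) is not None


if fa3_forward_available():
    # Forward-only kernel provided by this wheel.
    from fa3_fwd_interface import flash_attn_func
else:
    # Not installed: the caller decides on a fallback attention path.
    flash_attn_func = None
```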
Troubleshooting
- Out-of-memory during compilation: The build script already throttles concurrency, but you can enforce MAX_JOBS=1 NVCC_THREADS=1 before running bash build_fa3.sh.
- CUDA mismatch errors: Confirm that nvcc --version aligns with torch.version.cuda.
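The CUDA check above can be automated. The following is a minimal sketch that parses the "release X.Y" field from the text printed by nvcc --version and compares it with a torch.version.cuda string. Treating "aligns" as an exact major.minor match is an assumption, and both helper names are hypothetical.

```python
import re


def nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(nvcc_output, torch_cuda):
    """Compare nvcc's release with torch.version.cuda on major.minor (assumed rule)."""
    release = nvcc_release(nvcc_output)
    if release is None or torch_cuda is None:
        return False
    return release == ".".join(torch_cuda.split(".")[:2])


# Typical use on a build machine (left as comments because it needs nvcc and torch):
#   import subprocess, torch
#   out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
#   print(cuda_versions_match(out, torch.version.cuda))
```

torch.version.cuda is None on CPU-only PyTorch builds, which the sketch reports as a mismatch.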
Repository Layout
- build_fa3.sh: Main build entry point
- set_compile_env.sh: Resource-based compiler configuration helper
- hopper_setup_py.patch: Patch applied to the upstream setup.py
- flash-attention: Upstream Flash-Attention submodule
Customize further by editing environment variables in the build script or modifying the submodule before the patch is applied (for example to re-enable additional datatypes or kernels).
Download files
Built Distributions
File details
Details for the file fa3_fwd-0.0.3-cp39-abi3-manylinux_2_24_x86_64.whl.
File metadata
- Download URL: fa3_fwd-0.0.3-cp39-abi3-manylinux_2_24_x86_64.whl
- Upload date:
- Size: 27.6 MB
- Tags: CPython 3.9+, manylinux: glibc 2.24+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44d26c6cfc69dde5a0f9c6bebbccbb96ed20e97bc8b9ddda638b0bf62e91b93d |
| MD5 | dc536dd185e33a8eeb88d3247e53a88f |
| BLAKE2b-256 | eda6a4a31f4e500a3ac787cadf4c172aaf9fc74fb81f6ed78778fc8158384f00 |
File details
Details for the file fa3_fwd-0.0.3-cp39-abi3-manylinux_2_24_aarch64.whl.
File metadata
- Download URL: fa3_fwd-0.0.3-cp39-abi3-manylinux_2_24_aarch64.whl
- Upload date:
- Size: 27.9 MB
- Tags: CPython 3.9+, manylinux: glibc 2.24+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e0e51f2b4b094e8dd0526a5e4291a0f2d0366cc55e89b27c6b8bb469dd05cfa |
| MD5 | d034d9592bde974317a6d4c5c0b7a267 |
| BLAKE2b-256 | 75b061b7888b1699efc8299f2027c7d18130a35fc7f3819ad98c2f85317d3b1d |