Fake flash-attn package for T4 / CPU / compatibility

Project description

fake-flash-attention ⚡️

A drop-in, pure-Python shim for the flash-attn package. It redirects all FlashAttention calls to PyTorch's native scaled_dot_product_attention (SDPA).

Tip: Successfully tested with the music generation model Stable Audio 3.

Why is this necessary?

Many modern Large Language Models (LLMs) and codebases built on popular libraries (like Hugging Face Transformers) import the flash-attn package unconditionally, so they fail at import time when it is missing. However, the official flash-attn library has strict requirements:

NVIDIA GPU only: FlashAttention-2 requires the Ampere, Ada, or Hopper architecture (e.g., RTX 30/40, A100, H100).
No Support for Older GPUs: Turing GPUs like the NVIDIA T4 (standard in Google Colab) and older cards like the GTX 10 series cannot run the official FlashAttention-2 kernels.
No CPU Support: Official flash-attn cannot be installed or run in CPU-only environments.
Complex Compilation: The build process is heavy and requires specific CUDA toolkit versions.
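To see which side of these requirements a machine falls on, a quick check of the CUDA compute capability can help. This is a minimal sketch, not part of this package: the helper names are hypothetical, and the 8.0 threshold is an assumption standing in for "Ampere or newer" (a T4 reports 7.5).

```python
def meets_flash_attn2_arch(capability: tuple[int, int]) -> bool:
    """Return True if a CUDA compute capability is Ampere (8.0) or newer."""
    # Assumption: 8.0 is the Ampere baseline; Turing (e.g. T4) reports 7.5.
    return capability >= (8, 0)


def describe_current_device() -> str:
    """Summarise whether the local device likely needs the shim."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device: official flash-attn kernels cannot run here"
    capability = torch.cuda.get_device_capability()
    if meets_flash_attn2_arch(capability):
        return f"compute capability {capability}: official kernels may be usable"
    return f"compute capability {capability}: too old for official kernels"


print(describe_current_device())
```

A passing capability check does not guarantee that the official package will build; it only rules out the architecture limitation.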

fake-flash-attention solves this by:

API Parity: It exports the exact same functions (e.g., flash_attn_func) so that libraries don't crash with an ImportError.
Hardware Portability: It leverages PyTorch's scaled_dot_product_attention, which is highly optimized and works on T4, older GPUs, and CPUs.
Instant Setup: It is a pure-Python package with no C++/CUDA compilation required.
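The redirection idea can be sketched in a few lines. This is an illustration under stated assumptions, not this package's actual source: `sdpa_flash_attn_func` is a hypothetical name, and the sketch ignores grouped-query attention, dropout RNG parity, and other flash-attn options. The main work is converting between the flash-attn layout (batch, seqlen, nheads, headdim) and the SDPA layout (batch, nheads, seqlen, headdim).

```python
import torch
import torch.nn.functional as F


def sdpa_flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False):
    """Hypothetical stand-in for flash_attn_func built on PyTorch SDPA.

    Inputs use the flash-attn layout (batch, seqlen, nheads, headdim).
    """
    # SDPA expects (batch, nheads, seqlen, headdim), so swap dims 1 and 2.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    if softmax_scale is not None:
        # SDPA applies 1/sqrt(headdim) by default; pre-scaling q by
        # softmax_scale * sqrt(headdim) yields an effective scale of softmax_scale.
        q = q * (softmax_scale * q.shape[-1] ** 0.5)
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p, is_causal=causal)
    # Return to the flash-attn layout.
    return out.transpose(1, 2)


q, k, v = (torch.randn(2, 16, 4, 8) for _ in range(3))
out = sdpa_flash_attn_func(q, k, v, causal=True)
print(out.shape)
```

One caveat worth keeping in mind: when the query and key lengths differ, SDPA's `is_causal` aligns the mask to the top-left corner, whereas recent flash-attn releases align it to the bottom-right, so a faithful shim would need an explicit mask in that case.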

Installation

pip install fake-flash-attention

Note: If installing from source:

pip install .

Usage

If a library or script requires flash-attn, install this package in its place. Existing imports then resolve to the shim with no code changes:

from flash_attn import flash_attn_func
import torch

# flash-attn layout: (batch, seqlen, nheads, headdim)
q, k, v = (torch.randn(1, 256, 12, 64) for _ in range(3))
# This now uses PyTorch SDPA under the hood!
output = flash_attn_func(q, k, v, causal=True)

Supported Features

✅ flash_attn_func
✅ flash_attn_varlen_func
✅ flash_attn_qkvpacked_func / flash_attn_kvpacked_func
✅ FlashAttention-2 API compatibility
✅ Device-agnostic (CPU, CUDA, MPS)
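The variable-length entry is the least obvious item in the list, because SDPA has no packed-sequence interface. One way such a function could be emulated is to slice the packed tensors at the cumulative sequence boundaries and call SDPA per sequence. This is a sketch under assumptions, not necessarily how this package does it: `sdpa_varlen_attention` is a hypothetical name, and its signature is simplified compared with `flash_attn_varlen_func` (no max_seqlen arguments, dropout, or custom scale).

```python
import torch
import torch.nn.functional as F


def sdpa_varlen_attention(q, k, v, cu_seqlens_q, cu_seqlens_k, causal=False):
    """Hypothetical varlen attention: loop over packed sequences and call SDPA.

    q, k, v are packed as (total_tokens, nheads, headdim); cu_seqlens_* hold
    cumulative sequence boundaries, e.g. [0, 3, 8] for lengths 3 and 5.
    """
    out = torch.empty_like(q)
    q_bounds = cu_seqlens_q.tolist()
    k_bounds = cu_seqlens_k.tolist()
    for i in range(len(q_bounds) - 1):
        qs, qe = q_bounds[i], q_bounds[i + 1]
        ks, ke = k_bounds[i], k_bounds[i + 1]
        # (seqlen, nheads, headdim) -> (1, nheads, seqlen, headdim) for SDPA.
        qi = q[qs:qe].transpose(0, 1).unsqueeze(0)
        ki = k[ks:ke].transpose(0, 1).unsqueeze(0)
        vi = v[ks:ke].transpose(0, 1).unsqueeze(0)
        oi = F.scaled_dot_product_attention(qi, ki, vi, is_causal=causal)
        # Back to (seqlen, nheads, headdim) and into the packed output.
        out[qs:qe] = oi.squeeze(0).transpose(0, 1)
    return out


q, k, v = (torch.randn(8, 2, 4) for _ in range(3))
cu_seqlens = torch.tensor([0, 3, 8])  # two sequences: 3 and 5 tokens
print(sdpa_varlen_attention(q, k, v, cu_seqlens, cu_seqlens).shape)
```

The per-sequence loop trades speed for simplicity; tokens from different sequences never attend to each other because each slice is processed on its own.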

Project details

Release history

2.6.3.post2 (this version): Jun 11, 2026
2.6.3.post1: Jun 10, 2026

Download files

Source Distribution

fake_flash_attention-2.6.3.post2.tar.gz (15.8 kB), uploaded Jun 11, 2026, Source

Built Distribution

fake_flash_attention-2.6.3.post2-py3-none-any.whl (16.4 kB), uploaded Jun 11, 2026, Python 3

File details

Details for the file fake_flash_attention-2.6.3.post2.tar.gz.

File metadata

Download URL: fake_flash_attention-2.6.3.post2.tar.gz
Upload date: Jun 11, 2026
Size: 15.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for fake_flash_attention-2.6.3.post2.tar.gz
Algorithm	Hash digest
SHA256	`7d63f169b57ea456e7daa18da9e687b2641c4ef79ab51d5c165df5afece800d2`
MD5	`a49cab30e255f4aaf5fd8d8f034cfb99`
BLAKE2b-256	`155081956a78c6f4b87c2721912dfea23401b8c5cc42c6ff8fc46e709d55f710`

File details

Details for the file fake_flash_attention-2.6.3.post2-py3-none-any.whl.

File metadata

Download URL: fake_flash_attention-2.6.3.post2-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for fake_flash_attention-2.6.3.post2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5018a5216e2e8b6768b07e057021ceb404d7c0a881367a1bef58c1f1ea4ca594`
MD5	`4608153dfa827c63204ea4bcea3d8a0f`
BLAKE2b-256	`ffeccbe8eb070c88b2c3057e42f2097f5c430038ed493d813f407aa49ef4959f`
