An implementation of FlashAttention variants in Triton.
Project description
Flash Attention X
Flash Attention X is an implementation of FlashAttention variants written in Triton, including flash_attention_full, flash_attention_causal, and flash_attention_bias, among others.
Installation
You can install Flash Attention X from PyPI with pip:
pip install flash_attention_x
Or, from a local clone of the repository, install it in editable mode:
pip install -e .
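After installation, a quick sanity check is to import the package. This is a minimal sketch; it assumes the importable module is named flash_attention_x, matching the distribution name.
import flash_attention_x  # assumed import name (same as the distribution name)
print(flash_attention_x.__name__)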
Requirements
- Python >= 3.8
- Triton >= 2.3.0
The package is tested with Triton 2.3.0 and CUDA 12.0.
Features
- Efficient implementations of FlashAttention variants, including flash_attention_full, flash_attention_causal, and flash_attention_bias (a usage sketch follows below)
- Built using Triton for optimized performance
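Usage
The snippet below is a minimal usage sketch. The function names come from the package description; the import path, argument names, tensor layout (batch, heads, sequence length, head dimension), and the dtype/device choices are assumptions and may differ from the actual API.
import torch
from flash_attention_x import (
    flash_attention_full,
    flash_attention_causal,
    flash_attention_bias,
)

# Assumed layout: (batch, heads, seq_len, head_dim); Triton kernels generally run on CUDA.
batch, heads, seq_len, head_dim = 2, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out_full = flash_attention_full(q, k, v)      # full (bidirectional) attention
out_causal = flash_attention_causal(q, k, v)  # causal (autoregressive) masking

# Additive attention bias, e.g. a relative-position or ALiBi-style term (assumed signature).
bias = torch.randn(batch, heads, seq_len, seq_len, device="cuda", dtype=torch.float16)
out_bias = flash_attention_bias(q, k, v, bias)

print(out_full.shape, out_causal.shape, out_bias.shape)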
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Xiaotian Han
Project details
Download files
Download the file for your platform.
Source Distribution
File details
Details for the file flash_attention_x-0.0.1.tar.gz.
File metadata
- Download URL: flash_attention_x-0.0.1.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e83277a2bd3d7023d22cb5c8b6dd159d4bd99cabfce60dde9426f091b01f242a
MD5 | df1f9ca384e86a8fcd8b6618162f10a8
BLAKE2b-256 | c69e38f907af63cc1ca3ee67f6cbea8a40d44b62efbf6e0cb7a4c3c530e41874