6 projects
flash-attn
Flash Attention: Fast and Memory-Efficient Exact Attention
mamba-ssm
Mamba state-space model
causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
quant-matmul
Quantized MatMul in CUDA, with a PyTorch interface
fast-hadamard-transform
Fast Hadamard Transform in CUDA, with a PyTorch interface
flash-attn-wheels-test
Flash Attention: Fast and Memory-Efficient Exact Attention