High-performance ML primitives, applications, and an informative cost API, implemented as Triton + CuteDSL kernels for NVIDIA GPUs.

FlashLib

A GPU library for classical machine-learning operators — kmeans, knn, pca, svd, dbscan, hdbscan, umap, t-sne, regression, GEMM, and more — built on Triton and CuteDSL.

See the blog post for motivation, design, and benchmarks.

Installation

Install with pip:

pip install flashlib

From source:

git clone https://github.com/FlashML-org/flashlib.git
cd flashlib
pip install -e .

Usage

import torch
from flashlib import flash_kmeans

x = torch.randn(1_000_000, 128, device="cuda", dtype=torch.float32)
labels, centroids, n_iter = flash_kmeans(x, n_clusters=1024, max_iters=20)

Every primitive is exposed as a top-level flash_* function and as a sklearn-style class (KMeans, PCA, HDBSCAN, …).
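
To make the returned triple (labels, centroids, n_iter) from the usage snippet concrete, here is a minimal pure-Python sketch of textbook Lloyd's k-means. It is a CPU reference under stated assumptions, not FlashLib's implementation: the helper name lloyd_kmeans_reference is hypothetical, the initialization from the first n_clusters points is a simplification chosen for determinism, and it assumes flash_kmeans follows the usual assign/update iteration.

```python
def lloyd_kmeans_reference(points, n_clusters, max_iters=20):
    """Textbook Lloyd's k-means on a list of equal-length float tuples.

    Hypothetical CPU reference (not part of FlashLib). Returns
    (labels, centroids, n_iter), mirroring the triple in the usage snippet.
    """
    # Assumption: seed the centroids with the first n_clusters points.
    centroids = [list(p) for p in points[:n_clusters]]
    labels = [-1] * len(points)
    n_iter = 0
    for n_iter in range(1, max_iters + 1):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, n_iter


# Two well-separated blobs of three points each.
blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, n_iter = lloyd_kmeans_reference(blobs, n_clusters=2)
print(labels, centroids, n_iter)
```

The assignment step is the part that dominates cost at scale (every point against every centroid), which is why it is the natural target for a fused GPU kernel.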

Informative API

The flashlib.info submodule predicts runtime, FLOPs, and HBM bytes for any primitive in ~5 µs on pure CPU — useful for budgeting a pipeline before launching it, and small enough for an LLM agent to call in a GPU-less environment. It does not import torch, triton, or cutlass.

import flashlib.info as info

est = info.estimate("kmeans",
                    shape=(100_000, 64),
                    params={"K": 256, "max_iters": 20},
                    device="H200")
print(est.summary_line())

See the blog post for the full API, the tolerance-driven dispatch, and per-primitive benchmarks.
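
As a rough illustration of how runtime, FLOPs, and HBM bytes can be predicted on a CPU without touching a GPU, the sketch below applies a simple roofline model to k-means. Everything in it is an assumption made for illustration: DeviceSpec, CostEstimate, and estimate_kmeans_cost are hypothetical names, the per-iteration cost formulas are a back-of-envelope model, and the device numbers are placeholders supplied by the caller. It does not reproduce the model used by flashlib.info.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSpec:
    """Hypothetical device description; values are supplied by the caller."""
    peak_flops: float     # sustained FLOP/s
    hbm_bandwidth: float  # bytes/s


@dataclass(frozen=True)
class CostEstimate:
    flops: float
    hbm_bytes: float
    runtime_s: float
    bound: str  # "compute" or "memory"


def estimate_kmeans_cost(n, d, k, max_iters, device, bytes_per_elem=4):
    """Roofline-style estimate for max_iters Lloyd iterations.

    Assumptions (illustrative only):
      * the assignment step dominates: N*K distances at ~2*D FLOPs each;
      * a fused kernel reads the points and centroids once per iteration
        and writes one int32 label per point.
    """
    flops = 2.0 * n * k * d * max_iters
    bytes_per_iter = bytes_per_elem * (n * d + k * d) + 4 * n
    hbm_bytes = bytes_per_iter * max_iters
    compute_s = flops / device.peak_flops
    memory_s = hbm_bytes / device.hbm_bandwidth
    bound = "compute" if compute_s >= memory_s else "memory"
    # Roofline: runtime is set by whichever resource saturates first.
    return CostEstimate(flops, hbm_bytes, max(compute_s, memory_s), bound)


# Placeholder device numbers, not measured values for any specific GPU.
device = DeviceSpec(peak_flops=60e12, hbm_bandwidth=4.8e12)
est = estimate_kmeans_cost(n=100_000, d=64, k=256, max_iters=20, device=device)
print(f"{est.flops:.3e} FLOPs, {est.hbm_bytes:.3e} B, "
      f"{est.runtime_s * 1e3:.3f} ms ({est.bound}-bound)")
```

Because the model is closed-form arithmetic on the problem shape, it needs no GPU libraries, which is the property that makes this kind of estimate cheap enough to call while planning a pipeline.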

Coverage

The current release ships 15 high-level primitives across the following families:

Family             Primitives
Clustering         flash_kmeans, flash_dbscan, flash_hdbscan, flash_spectral_clustering
Nearest neighbors  flash_knn
Decomposition      flash_pca, flash_truncated_svd
Manifold           flash_umap, flash_tsne
Regression         flash_linear_regression, flash_ridge, flash_logistic_regression
Classification     flash_multinomial_nb, flash_random_forest
Preprocessing      flash_standard_scaler

Plus low-level linear-algebra primitives (cov_gemm, gram_gemm, ab_gemm, eigh, polar, msign, cholqr2, split_basis) and a Pareto-frontier set of multi-precision GEMM variants (gemm, gemm_tf32, gemm_3xtf32, gemm_bf16, gemm_fp16, gemm_fp16_x9, gemm_fp16_x3_kahan, gemm_ozaki2_int8, …).
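
To show why multi-precision variants trade speed for accuracy, the sketch below emulates the general 3xTF32 idea in pure Python: each float32 operand is split into a high and a low TF32 part, and three low-precision products approximate one float32 product. This is an illustration under assumptions, not FlashLib's gemm_3xtf32 kernel: the helper names are hypothetical, TF32 conversion is modeled by truncating the float32 mantissa to 10 bits (hardware may round instead), and accumulation happens in Python floats.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x):
    """Emulate TF32 by zeroing the low 13 mantissa bits of a float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFE000))[0]


def split_tf32(x):
    """Split x into a TF32 'hi' part and a TF32 'lo' residual."""
    hi = to_tf32(x)
    lo = to_tf32(x - hi)
    return hi, lo


def dot_tf32(a, b):
    """One TF32 product per element pair (the fast, less accurate path)."""
    return sum(to_tf32(x) * to_tf32(y) for x, y in zip(a, b))


def dot_3xtf32(a, b):
    """Three TF32 products per element pair; the lo*lo term is dropped."""
    total = 0.0
    for x, y in zip(a, b):
        xh, xl = split_tf32(x)
        yh, yl = split_tf32(y)
        total += xh * yh + xh * yl + xl * yh
    return total


a = [to_f32(v) for v in (1 / 3, 2 / 7, 0.1, 5 / 9)]
b = [to_f32(v) for v in (0.7, 1 / 11, 3 / 13, 0.9)]
exact = sum(x * y for x, y in zip(a, b))
print("1xTF32 abs error:", abs(dot_tf32(a, b) - exact))
print("3xTF32 abs error:", abs(dot_3xtf32(a, b) - exact))
```

The three-product version costs roughly three times the multiply work of the single TF32 path in exchange for error close to float32, which is the kind of speed/accuracy point a Pareto-frontier set of variants is meant to expose.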

Citation

@misc{yang2026flashlib,
  title  = {FlashLib: Bringing Flash Magic to Classical Machine Learning Operators},
  author = {Yang, Shuo and Xi, Haocheng and Zhao, Yilong and Mang, Qiuyang and
            Wang, Zhe and Sun, Shanlin and Keutzer, Kurt and Gonzalez, Joseph E. and
            Han, Song and Xu, Chenfeng and Stoica, Ion},
  year   = {2026},
  url    = {https://flashml-org.github.io/},
}

License

Apache License 2.0.
