BLAS operations for AWS Trainium via NKI
trnblas
BLAS operations for AWS Trainium via NKI (Neuron Kernel Interface).
Trainium ships no BLAS library. trnblas provides Level 1-3 BLAS operations with NKI kernel acceleration on the Tensor Engine, targeting scientific computing workloads that are GEMM-dominated.
Part of the trn-* scientific computing suite by Playground Logic.
Why
NVIDIA has cuBLAS with 152 optimized routines. Trainium has torch.matmul. That's fine for ML training but insufficient for scientific computing codes that need TRSM, SYRK, SYMM, and batched GEMM with specific transpose/scaling semantics.
trnblas closes this gap — same BLAS API surface, NKI-accelerated GEMM on Trainium, PyTorch fallback everywhere else.
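One way to picture the "NKI on Trainium, PyTorch everywhere else" split is a small backend selector. The sketch below is an assumption about how such dispatch could look, not trnblas's actual code: `nki_importable` and `select_backend` are hypothetical names, and it assumes NKI availability is detected by checking whether the `neuronxcc` package (which provides the `nki` module) can be imported.

```python
import importlib.util
from typing import Optional


def nki_importable() -> bool:
    """Return True if the neuronxcc package (home of the nki module) is importable."""
    return importlib.util.find_spec("neuronxcc") is not None


def select_backend(requested: str = "auto", has_nki: Optional[bool] = None) -> str:
    """Pick a backend name: "nki" or "pytorch" (hypothetical helper).

    has_nki can be passed explicitly, which makes the logic testable on
    machines without Neuron hardware or the Neuron SDK.
    """
    if requested not in ("auto", "nki", "pytorch"):
        raise ValueError(f"unknown backend: {requested!r}")
    if has_nki is None:
        has_nki = nki_importable()
    if requested == "nki" and not has_nki:
        raise RuntimeError("NKI backend requested but neuronxcc is not importable")
    if requested == "auto":
        return "nki" if has_nki else "pytorch"
    return requested


# On a machine without the Neuron SDK, "auto" resolves to the fallback.
print(select_backend("auto", has_nki=False))
```

Keeping the availability check injectable means the fallback path can be exercised in CI on ordinary CPU runners.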
Install
pip install trnblas
# With Neuron hardware support
pip install trnblas[neuron]
Usage
import torch
import trnblas
# Level 3 — Matrix multiply (the hot path)
C = trnblas.gemm(alpha=1.0, A=A, B=B, beta=0.5, C=C_init, transA=True)
# Batched GEMM (DF-MP2 tensor contractions)
C = trnblas.batched_gemm(1.0, A_batch, B_batch)
# Symmetric matrix multiply (Fock builds)
F = trnblas.symm(1.0, density, H_core, side="left")
# Triangular solve (Cholesky-based density fitting)
X = trnblas.trsm(1.0, L, B, uplo="lower")
# Symmetric rank-k update (metric construction)
J = trnblas.syrk(1.0, integrals, trans=True)
# Level 2 — Matrix-vector
y = trnblas.gemv(1.0, A, x, beta=1.0, y=y)
# Level 1 — Vector operations
y = trnblas.axpy(alpha, x, y)
d = trnblas.dot(x, y)
n = trnblas.nrm2(x)
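For readers new to TRSM, the lower-triangular case used in the `trsm` call above amounts to forward substitution. The following is a minimal reference sketch in plain Python lists so it runs without PyTorch; `trsm_lower_ref` is a hypothetical name for illustration and says nothing about how trnblas implements the operation.

```python
def trsm_lower_ref(alpha, L, B):
    """Solve L X = alpha * B for X, with L lower triangular (forward substitution).

    L is n x n, B is n x m, both given as lists of rows.
    """
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(n):
        if L[i][i] == 0:
            raise ZeroDivisionError("singular triangular matrix")
        for j in range(m):
            # Subtract the contribution of the rows already solved.
            acc = alpha * B[i][j] - sum(L[i][p] * X[p][j] for p in range(i))
            X[i][j] = acc / L[i][i]
    return X


L_demo = [[2.0, 0.0],
          [1.0, 3.0]]
B_demo = [[4.0],
          [11.0]]
X_demo = trsm_lower_ref(1.0, L_demo, B_demo)
print(X_demo)
```

Row `i` of the solution depends only on rows `0..i-1`, which is why the loop over rows is inherently sequential while the loop over right-hand-side columns is not.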
DF-MP2 Example
# Run the density-fitted MP2 example (Janesko/TCU use case)
python examples/df_mp2.py --demo
python examples/df_mp2.py --nbasis 100 --nocc 20
The example demonstrates all core BLAS operations in a realistic quantum chemistry workflow: Cholesky factorization, triangular solve, half-transform GEMMs, metric contraction, and energy evaluation.
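The five steps named above can be sketched end to end on synthetic data. This is not the code in `examples/df_mp2.py`: it uses NumPy as a stand-in for the trnblas calls, random inputs instead of real integrals, and a general solver where a triangular solve would normally go. The function name `df_mp2_energy` and all array names are assumptions made for illustration.

```python
import numpy as np


def df_mp2_energy(eri3, metric, C_occ, C_virt, eps_occ, eps_virt):
    """Closed-shell DF-MP2 correlation energy from synthetic inputs.

    eri3:   (naux, nbf, nbf) three-index integrals (P|mu nu)
    metric: (naux, naux) symmetric positive-definite metric J
    """
    naux, nbf, _ = eri3.shape
    nocc, nvirt = C_occ.shape[1], C_virt.shape[1]

    # 1. Cholesky factorization of the metric: J = L L^T, L lower triangular.
    L = np.linalg.cholesky(metric)

    # 2. Triangular solve L X = (P|mu nu), a TRSM in BLAS terms.
    #    np.linalg.solve is a general solver, used here only for brevity.
    B_ao = np.linalg.solve(L, eri3.reshape(naux, nbf * nbf)).reshape(naux, nbf, nbf)

    # 3. Half-transform GEMMs: first index to occupied, second to virtual.
    B_half = C_occ.T @ B_ao                      # (naux, nocc, nbf)
    B_ia = (B_half @ C_virt).reshape(naux, nocc * nvirt)

    # 4. Metric contraction: (ia|jb) = sum_Q B[Q, ia] * B[Q, jb], an A^T A product.
    iajb = (B_ia.T @ B_ia).reshape(nocc, nvirt, nocc, nvirt)

    # 5. Energy evaluation with the orbital-energy denominator.
    denom = (eps_occ[:, None, None, None] - eps_virt[None, :, None, None]
             + eps_occ[None, None, :, None] - eps_virt[None, None, None, :])
    exchange = iajb.transpose(0, 3, 2, 1)        # (ib|ja)
    return float(np.sum(iajb * (2.0 * iajb - exchange) / denom))


# Synthetic problem: sizes and values are arbitrary, chosen only to run quickly.
rng = np.random.default_rng(0)
nbf, nocc, nvirt, naux = 6, 2, 4, 10
raw = rng.standard_normal((naux, nbf, nbf))
eri3 = 0.5 * (raw + raw.transpose(0, 2, 1))      # symmetric in the two basis indices
M = rng.standard_normal((naux, naux))
metric = M @ M.T + naux * np.eye(naux)           # symmetric positive definite
Q, _ = np.linalg.qr(rng.standard_normal((nbf, nbf)))
C_occ, C_virt = Q[:, :nocc], Q[:, nocc:]
eps_occ = np.array([-1.0, -0.5])
eps_virt = np.array([0.5, 1.0, 1.5, 2.0])

energy = df_mp2_energy(eri3, metric, C_occ, C_virt, eps_occ, eps_virt)
print(energy)
```

Steps 3 and 4 are plain matrix products, which is why the workload is GEMM-dominated once the sizes grow.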
Operations
| Level | Operation | Description |
|---|---|---|
| 1 | axpy | y = αx + y |
| 1 | dot | xᵀy |
| 1 | nrm2 | ‖x‖₂ |
| 1 | scal | x = αx |
| 1 | asum | Σ\|xᵢ\| |
| 1 | iamax | argmax \|xᵢ\| |
| 2 | gemv | y = α op(A) x + βy |
| 2 | symv | y = α A x + βy (A symmetric) |
| 2 | trmv | x = op(A) x (A triangular) |
| 2 | ger | A = α x yᵀ + A |
| 3 | gemm | C = α op(A) op(B) + βC |
| 3 | batched_gemm | Batched GEMM |
| 3 | symm | C = α A B + βC (A symmetric) |
| 3 | syrk | C = α A Aᵀ + βC |
| 3 | trsm | Solve op(A) X = αB |
| 3 | trmm | B = α op(A) B |
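To make the `gemm` row concrete, here is a reference for C = α op(A) op(B) + βC written with plain Python lists so it runs without PyTorch. It is only a statement of the math in the table, not trnblas's implementation; `gemm_ref` is a hypothetical name, and the `transB` flag is an assumption (the usage section only shows `transA`).

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def gemm_ref(alpha, A, B, beta=0.0, C=None, transA=False, transB=False):
    """Reference GEMM: alpha * op(A) @ op(B) + beta * C, where op is identity or transpose."""
    opA = transpose(A) if transA else A
    opB = transpose(B) if transB else B
    m, k, n = len(opA), len(opB), len(opB[0])
    if len(opA[0]) != k:
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    if C is None:
        C = [[0.0] * n for _ in range(m)]
    return [
        [alpha * sum(opA[i][p] * opB[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]


A_demo = [[1.0, 2.0], [3.0, 4.0]]
B_demo = [[5.0, 6.0], [7.0, 8.0]]
C_demo = [[1.0, 1.0], [1.0, 1.0]]

# Same argument pattern as the gemm call in the usage section.
print(gemm_ref(1.0, A_demo, B_demo, beta=0.5, C=C_demo, transA=True))

# The syrk row (A Aᵀ) is the special case B = A with the second operand transposed.
print(gemm_ref(1.0, A_demo, A_demo, transB=True))
```

A reference like this is mainly useful as a correctness oracle for small inputs when validating an accelerated kernel.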
Status
- Level 1-3 BLAS with PyTorch backend
- GEMM with NKI dispatch stub
- DF-MP2 example (Janesko/TCU use case)
- NKI GEMM kernel validation on trn1/trn2
- NKI GEMM with stationary tile reuse
- Batched GEMM NKI kernel
- Double-double FP64 emulation
- Benchmarks vs cuBLAS
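The double-double item in the list above refers to a family of techniques that represent one value as an unevaluated sum of two floats. The source does not say how trnblas approaches it, so the following is only a sketch of the standard building block (Knuth's error-free two-sum) and a compensated accumulator built on it. Python floats are already FP64, so this demonstrates the mechanism at a different precision than a hardware implementation built on narrower floats would use; `two_sum` and `dd_sum` are illustrative names.

```python
def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_sum(values):
    """Accumulate values into a (hi, lo) pair, carrying the rounding error in lo."""
    hi, lo = 0.0, 0.0
    for v in values:
        s, err = two_sum(hi, v)
        lo += err
        hi, lo = two_sum(s, lo)   # renormalize so that hi holds the leading part
    return hi, lo


values = [1.0, 1e-16, -1.0]

# Plain left-to-right addition loses the small term entirely.
naive = (values[0] + values[1]) + values[2]

hi, lo = dd_sum(values)
print(naive, hi + lo)
```

The explicit left-to-right addition is used for the naive result because Python's built-in `sum` may itself apply compensated summation on recent versions.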
Related Projects
| Project | What |
|---|---|
| trnfft | FFT + complex ops for Trainium (Williamson/OSU use case) |
| trnsolver (planned) | Linear solvers and eigendecomposition |
License
Apache 2.0 — Playground Logic LLC