BLAS operations for AWS Trainium via NKI

trnblas

BLAS operations for AWS Trainium via NKI (Neuron Kernel Interface).

Trainium ships no BLAS library. trnblas provides Level 1-3 BLAS operations with NKI kernel acceleration on the Tensor Engine, targeting scientific computing workloads that are GEMM-dominated.

Part of the trn-* scientific computing suite by Playground Logic.

Why

NVIDIA has cuBLAS with 152 optimized routines. Trainium has torch.matmul. That's fine for ML training but insufficient for scientific computing codes that need TRSM, SYRK, SYMM, and batched GEMM with specific transpose/scaling semantics.

trnblas closes this gap: the same BLAS API surface, NKI-accelerated GEMM on Trainium, and a PyTorch fallback everywhere else.
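The "NKI on Trainium, PyTorch elsewhere" split can be pictured as a small backend probe. The sketch below is illustrative only: `nki_available` and `select_backend` are hypothetical names, not trnblas functions, and it assumes the NKI toolchain is importable as the `neuronxcc` package.

```python
import importlib.util


def nki_available(module_name: str = "neuronxcc") -> bool:
    """Return True if the (assumed) NKI package can be found on this machine."""
    return importlib.util.find_spec(module_name) is not None


def select_backend(prefer_nki: bool = True, probe=nki_available) -> str:
    """Pick "nki" when requested and available, otherwise fall back to "pytorch"."""
    if prefer_nki and probe():
        return "nki"
    return "pytorch"


if __name__ == "__main__":
    print(select_backend())
```

Passing the probe as a parameter keeps the decision testable on machines without Neuron hardware.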

Install

pip install trnblas

# With Neuron hardware support
pip install "trnblas[neuron]"

Usage

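import torch
import trnblas

# Illustrative inputs (assumed shapes; square so every call below is conformable)
n, batch = 64, 8
A, B, C_init = torch.randn(n, n), torch.randn(n, n), torch.zeros(n, n)
A_batch, B_batch = torch.randn(batch, n, n), torch.randn(batch, n, n)
S = torch.randn(n, n)
density, H_core = S + S.T, torch.randn(n, n)           # density is symmetric
L = torch.tril(torch.randn(n, n)) + n * torch.eye(n)   # nonsingular lower triangle
integrals = torch.randn(n, n)
x, y, alpha = torch.randn(n), torch.randn(n), 2.0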

# Level 3 — Matrix multiply (the hot path)
C = trnblas.gemm(alpha=1.0, A=A, B=B, beta=0.5, C=C_init, transA=True)

# Batched GEMM (DF-MP2 tensor contractions)
C = trnblas.batched_gemm(1.0, A_batch, B_batch)

# Symmetric matrix multiply (Fock builds)
F = trnblas.symm(1.0, density, H_core, side="left")

# Triangular solve (Cholesky-based density fitting)
X = trnblas.trsm(1.0, L, B, uplo="lower")

# Symmetric rank-k update (metric construction)
J = trnblas.syrk(1.0, integrals, trans=True)

# Level 2 — Matrix-vector
y = trnblas.gemv(1.0, A, x, beta=1.0, y=y)

# Level 1 — Vector operations
y = trnblas.axpy(alpha, x, y)
d = trnblas.dot(x, y)
n = trnblas.nrm2(x)
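To make the `alpha`/`beta`/`transA` semantics of the `gemm` call above concrete, here is a minimal reference in plain Python (nested lists, so it runs without PyTorch). It follows the standard BLAS definition C = α op(A) op(B) + βC. `gemm_reference` is a hypothetical helper, and the `transB` flag is an assumption by analogy with `transA`; neither is taken from the trnblas API.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def gemm_reference(alpha, A, B, beta=0.0, C=None, transA=False, transB=False):
    """Compute alpha * op(A) @ op(B) + beta * C, where op(X) is X or its transpose."""
    opA = transpose(A) if transA else A
    opB = transpose(B) if transB else B
    m, k, n = len(opA), len(opB), len(opB[0])
    if any(len(row) != k for row in opA):
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    if C is None:
        C = [[0.0] * n for _ in range(m)]
    return [
        [
            alpha * sum(opA[i][p] * opB[p][j] for p in range(k)) + beta * C[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    C_init = [[1.0, 1.0], [1.0, 1.0]]
    print(gemm_reference(1.0, A, B, beta=0.5, C=C_init, transA=True))
```

A reference like this is mainly useful for checking an accelerated kernel on small inputs.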

DF-MP2 Example

# Run the density-fitted MP2 example (Janesko/TCU use case)
python examples/df_mp2.py --demo
python examples/df_mp2.py --nbasis 100 --nocc 20

The example demonstrates all core BLAS operations in a realistic quantum chemistry workflow: Cholesky factorization, triangular solve, half-transform GEMMs, metric contraction, and energy evaluation.
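For orientation, the steps listed above line up with the standard closed-shell DF-MP2 equations, written here in conventional notation. This is a textbook formulation, not a transcription of examples/df_mp2.py, and the script may order or fuse the steps differently. Indices i, j run over occupied orbitals, a, b over virtual orbitals, P, Q over auxiliary functions, and μ, ν over basis functions; C holds the orbital coefficients and ε the orbital energies.

```latex
\begin{align}
(P|Q) &= \sum_{R} L_{PR}\, L_{QR}
  && \text{Cholesky factorization of the metric} \\
\sum_{Q} L_{PQ}\, B^{Q}_{\mu\nu} &= (P|\mu\nu)
  && \text{triangular solve} \\
B^{P}_{ia} &= \sum_{\mu\nu} C_{\mu i}\, C_{\nu a}\, B^{P}_{\mu\nu}
  && \text{half-transform GEMMs} \\
(ia|jb) &\approx \sum_{P} B^{P}_{ia}\, B^{P}_{jb}
  && \text{contraction over the auxiliary index} \\
E_{\mathrm{MP2}} &= \sum_{ijab}
  \frac{(ia|jb)\,\bigl[\,2\,(ia|jb) - (ib|ja)\,\bigr]}
       {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
  && \text{energy evaluation}
\end{align}
```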

Operations

Level  Operation     Description
1      axpy          y = αx + y
1      dot           xᵀ y
1      nrm2          ‖x‖₂
1      scal          x = αx
1      asum          Σ|xᵢ|
1      iamax         argmax |xᵢ|
2      gemv          y = α op(A) x + βy
2      symv          y = α A x + βy (A symmetric)
2      trmv          x = op(A) x (A triangular)
2      ger           A = α x yᵀ + A
3      gemm          C = α op(A) op(B) + βC
3      batched_gemm  Batched GEMM
3      symm          C = α A B + βC (A symmetric)
3      syrk          C = α A Aᵀ + βC
3      trsm          Solve op(A) X = αB
3      trmm          B = α op(A) B
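As an illustration of the trsm row, the following plain-Python sketch solves L X = αB by forward substitution for a lower-triangular L with op(A) = A. `trsm_lower_reference` is a hypothetical name, and the code shows the textbook algorithm rather than how trnblas implements `trsm`.

```python
def trsm_lower_reference(alpha, L, B):
    """Solve L @ X = alpha * B for X, with L lower triangular (forward substitution)."""
    n = len(L)
    ncols = len(B[0])
    X = [[0.0] * ncols for _ in range(n)]
    for i in range(n):
        if L[i][i] == 0:
            raise ValueError("triangular matrix is singular")
        for j in range(ncols):
            # Subtract the contribution of the rows of X that are already solved.
            acc = alpha * B[i][j] - sum(L[i][p] * X[p][j] for p in range(i))
            X[i][j] = acc / L[i][i]
    return X


if __name__ == "__main__":
    L = [[2.0, 0.0], [1.0, 3.0]]
    B = [[4.0], [11.0]]
    print(trsm_lower_reference(1.0, L, B))
```

Only the lower triangle of L is read, so entries above the diagonal are ignored, which matches the usual BLAS convention for a `uplo="lower"` argument.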

Status

  • Level 1-3 BLAS with PyTorch backend
  • GEMM with NKI dispatch stub
  • DF-MP2 example (Janesko/TCU use case)
  • NKI GEMM kernel validation on trn1/trn2
  • NKI GEMM with stationary tile reuse
  • Batched GEMM NKI kernel
  • Double-double FP64 emulation
  • Benchmarks vs cuBLAS
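The list above names double-double emulation without describing it. One common building block for that family of techniques is the error-free "two-sum" transformation, sketched below. This is a generic illustration, not the planned trnblas design: `two_sum` and `dd_add` are hypothetical names, and because Python floats are already FP64 the code demonstrates the arithmetic pattern rather than emulation on narrower hardware types.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's two-sum: return (s, err) with s = fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two (hi, lo) pairs and renormalize so that hi carries the leading bits."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    hi = s + e
    lo = e - (hi - s)  # fast two-sum step; assumes |s| >= |e|
    return hi, lo


if __name__ == "__main__":
    # 1e16 + 1 is not representable in one float; the pair keeps the lost 1.0.
    print(dd_add((1e16, 0.0), (1.0, 0.0)))
```

The low word holds exactly the rounding error of the high word, which is how a pair of floats can carry roughly twice the precision of one.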

Related Projects

Project              What
trnfft               FFT + complex ops for Trainium (Williamson/OSU use case)
trnsolver (planned)  Linear solvers and eigendecomposition

License

Apache 2.0 — Playground Logic LLC
