trnblas

BLAS operations for AWS Trainium via NKI (Neuron Kernel Interface).

Trainium ships no BLAS library. trnblas provides Level 1-3 BLAS operations with NKI kernel acceleration on the Tensor Engine, targeting scientific computing workloads that are GEMM-dominated.

Part of the trnsci scientific computing suite (github.com/trnsci).

Current phase

trnblas follows the trnsci 5-phase roadmap. Active work is tracked in phase-labeled GitHub issues.

Suite-wide tracker: trnsci/trnsci#1.

Why

NVIDIA has cuBLAS with 152 optimized routines. Trainium has torch.matmul. That's fine for ML training but insufficient for scientific computing codes that need TRSM, SYRK, SYMM, and batched GEMM with specific transpose/scaling semantics.

trnblas closes this gap — same BLAS API surface, NKI-accelerated GEMM on Trainium, PyTorch fallback everywhere else.
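The "NKI where available, PyTorch elsewhere" split can be pictured as a small dispatch step. The sketch below is illustrative only and is not trnblas's actual dispatch code: `nki_available` and `select_backend` are hypothetical helpers, and the check assumes NKI is importable through the `neuronxcc` package.

```python
import importlib.util


def nki_available() -> bool:
    """Return True if the package assumed to bundle NKI can be imported."""
    # Assumption: NKI is reachable via the `neuronxcc` package.
    try:
        return importlib.util.find_spec("neuronxcc") is not None
    except (ImportError, ValueError):
        return False


def select_backend(prefer_nki: bool = True) -> str:
    """Pick a backend name (hypothetical helper, not part of trnblas)."""
    if prefer_nki and nki_available():
        return "nki"
    return "pytorch"


# Prints "nki" on a host where neuronxcc is installed, otherwise "pytorch".
print(select_backend())
```

Probing with `find_spec` avoids importing the compiler package just to learn whether it exists, which keeps the check cheap on machines without Neuron hardware.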

Install

pip install trnblas

# With Neuron hardware support
pip install trnblas[neuron]

Usage

import torch
import trnblas

# Level 3 — Matrix multiply (the hot path)
C = trnblas.gemm(alpha=1.0, A=A, B=B, beta=0.5, C=C_init, transA=True)

# Batched GEMM (DF-MP2 tensor contractions)
C = trnblas.batched_gemm(1.0, A_batch, B_batch)

# Symmetric matrix multiply (Fock builds)
F = trnblas.symm(1.0, density, H_core, side="left")

# Triangular solve (Cholesky-based density fitting)
X = trnblas.trsm(1.0, L, B, uplo="lower")

# Symmetric rank-k update (metric construction)
J = trnblas.syrk(1.0, integrals, trans=True)

# Level 2 — Matrix-vector
y = trnblas.gemv(1.0, A, x, beta=1.0, y=y)

# Level 1 — Vector operations
y = trnblas.axpy(alpha, x, y)
d = trnblas.dot(x, y)
n = trnblas.nrm2(x)
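To make the `gemm` call above concrete, here is a dependency-free reference for the standard BLAS semantics C = α op(A) op(B) + βC. It is a sketch for checking small cases by hand, not the trnblas implementation; `gemm_reference` is a hypothetical name, and the `transB` flag is an assumption modeled on the `transA` flag shown in the usage snippet.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def gemm_reference(alpha, A, B, beta=0.0, C=None, transA=False, transB=False):
    """Return alpha * op(A) @ op(B) + beta * C using plain Python lists."""
    opA = transpose(A) if transA else A
    opB = transpose(B) if transB else B
    m, k, n = len(opA), len(opB), len(opB[0])
    if any(len(row) != k for row in opA):
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    if C is None:
        C = [[0.0] * n for _ in range(m)]
    return [
        [alpha * sum(opA[i][p] * opB[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]


A = [[1.0, 2.0], [3.0, 4.0]]        # used as A^T because transA=True
B = [[1.0, 0.0], [0.0, 1.0]]        # identity
C_init = [[2.0, 2.0], [2.0, 2.0]]
C = gemm_reference(1.0, A, B, beta=0.5, C=C_init, transA=True)
print(C)  # [[2.0, 4.0], [3.0, 5.0]]
```

With B as the identity, the result is simply Aᵀ plus half of `C_init`, which makes the effect of `transA` and `beta` easy to verify.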

DF-MP2 Example

# Run the density-fitted MP2 example
python examples/df_mp2.py --demo
python examples/df_mp2.py --nbasis 100 --nocc 20

The example demonstrates all core BLAS operations in a realistic quantum chemistry workflow: Cholesky factorization, triangular solve, half-transform GEMMs, metric contraction, and energy evaluation.
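The final energy-evaluation step can be sketched without any dependencies. The function below uses the standard closed-shell DF-MP2 expression and assumes the fitted three-index factors `B[i][a][Q]` are already available (after the triangular solve); `df_mp2_energy` is a hypothetical name, and `examples/df_mp2.py` may organize the computation differently.

```python
def df_mp2_energy(B, eps_occ, eps_virt):
    """Closed-shell MP2 correlation energy from DF factors B[i][a][Q].

    (ia|jb) ~= sum_Q B[i][a][Q] * B[j][b][Q]
    E = sum_ijab (ia|jb) * (2*(ia|jb) - (ib|ja)) / (e_i + e_j - e_a - e_b)
    """
    nocc, nvirt = len(eps_occ), len(eps_virt)

    def eri(i, a, j, b):
        # Contract over the auxiliary index Q.
        return sum(x * y for x, y in zip(B[i][a], B[j][b]))

    energy = 0.0
    for i in range(nocc):
        for j in range(nocc):
            for a in range(nvirt):
                for b in range(nvirt):
                    iajb = eri(i, a, j, b)
                    ibja = eri(i, b, j, a)
                    denom = eps_occ[i] + eps_occ[j] - eps_virt[a] - eps_virt[b]
                    energy += iajb * (2.0 * iajb - ibja) / denom
    return energy


# One occupied orbital, one virtual orbital, one auxiliary function.
print(df_mp2_energy([[[0.5]]], [-1.0], [1.0]))  # -0.015625
```

For a fixed pair (i, j), the block of integrals over all (a, b) is the product of an nvirt × naux matrix with the transpose of another, which is why this step is GEMM-dominated in practice.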

Real-molecule validation (via PySCF)

pip install trnblas[pyscf]
python examples/df_mp2_pyscf.py                       # H2O / STO-3G
python examples/df_mp2_pyscf.py --mol ch4 --basis cc-pvdz

The script runs SCF and density fitting via PySCF, feeds the integrals through trnblas, and compares the result to PySCF's own DF-MP2 reference energy. The two energies agree to within 10⁻⁷ Hartree for H2O, CH4, and NH3 in the cc-pvdz basis.

Operations

| Level | Operation | Description |
|-------|-----------|-------------|
| 1 | axpy | y = αx + y |
| 1 | dot | xᵀy |
| 1 | nrm2 | ‖x‖₂ |
| 1 | scal | x = αx |
| 1 | asum | Σ|xᵢ| |
| 1 | iamax | argmax |xᵢ| |
| 2 | gemv | y = α op(A) x + βy |
| 2 | symv | y = α A x + βy (A symmetric) |
| 2 | trmv | x = op(A) x (A triangular) |
| 2 | ger | A = α x yᵀ + A |
| 3 | gemm | C = α op(A) op(B) + βC |
| 3 | batched_gemm | Batched GEMM |
| 3 | symm | C = α A B + βC (A symmetric) |
| 3 | syrk | C = α A Aᵀ + βC |
| 3 | trsm | Solve op(A) X = αB |
| 3 | trmm | B = α op(A) B |
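As one worked instance of the table, `trsm` with a lower-triangular matrix amounts to forward substitution applied to each column of the right-hand side. The sketch below is a plain-Python reference for small inputs, not the trnblas kernel; `trsm_lower_reference` is a hypothetical name and it covers only the untransposed, left-side case.

```python
def trsm_lower_reference(alpha, L, B):
    """Solve L X = alpha * B for X, with L lower triangular (list of rows)."""
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(n):
        if L[i][i] == 0.0:
            raise ZeroDivisionError("singular triangular matrix")
        for j in range(m):
            # Subtract contributions of already-solved rows, then divide by the pivot.
            acc = alpha * B[i][j] - sum(L[i][p] * X[p][j] for p in range(i))
            X[i][j] = acc / L[i][i]
    return X


L = [[2.0, 0.0], [1.0, 4.0]]
B = [[2.0], [9.0]]
X = trsm_lower_reference(1.0, L, B)
print(X)  # [[1.0], [2.0]]
```

Each row depends only on rows above it, so the columns of B can be solved independently; that independence is what lets blocked implementations recast most of the work as GEMM updates.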

Status

  • Level 1-3 BLAS with PyTorch backend
  • GEMM with NKI dispatch stub
  • DF-MP2 example
  • NKI GEMM kernel validation on trn1/trn2
  • NKI GEMM with stationary tile reuse
  • Batched GEMM NKI kernel
  • Double-double FP64 emulation
  • Benchmarks vs cuBLAS

Related Projects

| Project | What |
|---------|------|
| trnfft | FFT + complex ops for Trainium |
| trnrand | Random number generation (Philox/Sobol) for Trainium |
| trnsolver | Linear solvers and eigendecomposition |

License

Apache 2.0 — Copyright 2026 Scott Friedman

Disclaimer

trnsci is an independent open-source project. It is not sponsored by, endorsed by, or affiliated with Amazon.com, Inc., Amazon Web Services, Inc., or Annapurna Labs Ltd.

"AWS", "Amazon", "Trainium", "Inferentia", "NeuronCore", "Neuron SDK", and related identifiers are trademarks of their respective owners and are used here solely for descriptive and interoperability purposes. Use does not imply endorsement, partnership, or any other relationship.

All work, opinions, analyses, benchmark results, architectural commentary, and editorial judgments in this repository and on trnsci.dev are those of the project's contributors. They do not represent the views, positions, or commitments of Amazon, AWS, or Annapurna Labs.

Feedback directed at the Neuron SDK or Trainium hardware is good-faith ecosystem commentary from independent users. It is not privileged information, is not pre-reviewed by AWS, and should not be read as authoritative about product roadmap, behavior, or quality.

For official AWS guidance, see aws-neuron documentation and the AWS Trainium product page.
