

trnsparse


Sparse matrix operations for AWS Trainium via NKI.

CSR/COO formats, SpMV, SpMM, and integral screening for sparse scientific computing on Trainium. Part of the trnsci scientific computing suite (github.com/trnsci).
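As a rough illustration of what CSR storage and SpMV involve, here is a minimal pure-Python sketch. It is not trnsparse's implementation: the helper names `dense_to_csr` and `csr_spmv` are hypothetical, and treating `alpha` as a scalar multiplier on the product is an assumption.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix into CSR arrays (hypothetical helper)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_indices.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_indices
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_spmv(values, col_indices, row_ptr, x, alpha=1.0):
    """Compute y = alpha * (A @ x) for A in CSR form (alpha semantics assumed)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(alpha * acc)
    return y


A = [[4.0, 0.0, 1.0],
     [0.0, 0.0, 2.0],
     [3.0, 0.0, 0.0]]
values, col_indices, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_indices, row_ptr, [1.0, 2.0, 3.0], alpha=2.0)
print(y)
```

Only the four nonzeros are stored and visited, which is the point of the format: work scales with the number of nonzeros rather than with the full matrix size.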

Current phase

trnsparse follows the trnsci 5-phase roadmap. Active work is tracked in phase-labeled GitHub issues:

  • Phase 1 — correctness ✅ v0.2.0: NKI SpMM validated on trn1 via densify-then-GEMM (see trnsci/trnsci#3).
  • v0.3.0: BSRMatrix, a Trainium-native 128×128 block-sparse format, plus the bsr_spmm NKI kernel. CSR becomes the interop format; BSR is the compute path.
  • v0.3.2: cg_bsr and power_iteration_bsr, iterative solvers over BSR (Python loop; a fused kernel is gated on NKI capability).
  • v0.4.0: screened_spmm, a fused Schwarz-screened SpMM in one NKI dispatch.
  • v0.4.2 ✅ Block-sparse attention — BSRMatrix + bsr_spmm as the primitive; examples/block_sparse_attention.py + docs/sparse_attention.md.
  • v0.4.3 ✅ Architecture-friendly alternatives: chebyshev_bsr / richardson_bsr (fixed-K solvers, no inner products); block_sparse_attention_tiled (two-pass, no O(seq²) intermediate).
  • Phase 3 — perf: nnz-bucketing, NKI attention kernel pair — parked on NKI indirect DMA gather / hardware validation.
  • Phase 4 — multi-chip: sharded BSR across NeuronCores.
  • Phase 5 — generation: trn2 DMA bandwidth exploitation.

(No Phase 2 for trnsparse — precision inherited from trnblas.)
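To make the block-sparse idea in the roadmap concrete, the sketch below multiplies a BSR-style matrix by a dense matrix in pure Python. It is an illustration under assumptions, not the bsr_spmm kernel: the storage layout (`blocks`, `block_cols`, `block_row_ptr`) and the name `bsr_spmm_sketch` are hypothetical, and the block size is 2 rather than 128 to keep the example readable.

```python
def bsr_spmm_sketch(blocks, block_cols, block_row_ptr, B, block_size):
    """Compute C = A @ B where A is stored as dense blocks in block-CSR order.

    blocks[k] is a dense block_size x block_size tile located at block
    column block_cols[k]; block_row_ptr delimits the tiles of each block row.
    This layout is an assumption for illustration.
    """
    n_block_rows = len(block_row_ptr) - 1
    n_out_cols = len(B[0])
    C = [[0.0] * n_out_cols for _ in range(n_block_rows * block_size)]
    for bi in range(n_block_rows):
        for k in range(block_row_ptr[bi], block_row_ptr[bi + 1]):
            bj = block_cols[k]
            block = blocks[k]
            # Each stored tile contributes one small dense matmul.
            for r in range(block_size):
                out_row = C[bi * block_size + r]
                for c in range(block_size):
                    a = block[r][c]
                    b_row = B[bj * block_size + c]
                    for j in range(n_out_cols):
                        out_row[j] += a * b_row[j]
    return C


# A 4x4 matrix with two nonzero 2x2 tiles on the block diagonal.
blocks = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 0.0], [0.0, 6.0]]]
block_cols = [0, 1]
block_row_ptr = [0, 1, 2]
B = [[1.0], [1.0], [1.0], [1.0]]
C = bsr_spmm_sketch(blocks, block_cols, block_row_ptr, B, block_size=2)
print(C)
```

Because every stored tile is dense, the inner work is a regular matmul per tile, which is plausibly why a fixed 128×128 block size suits a matrix engine better than per-element CSR indexing.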

Suite-wide tracker: trnsci/trnsci#1.

Install

pip install trnsparse

Usage

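import torch
import trnsparse

# Dense → sparse: zero out small entries, then convert to CSR
A = torch.randn(100, 100)
A[torch.abs(A) < 1.0] = 0.0
csr = trnsparse.from_dense(A)

# SpMV: y = A @ x
x = torch.randn(100)
y = trnsparse.spmv(csr, x, alpha=2.0)

# SpMM: C = A @ B
B = torch.randn(100, 32)
C = trnsparse.spmm(csr, B)

# Integral screening
# diagonal_integrals stands in for the diagonal (ab|ab) integrals; the
# random placeholder and its shape are assumptions for illustration only.
diagonal_integrals = torch.rand(10, 10)
Q = trnsparse.schwarz_bounds(diagonal_integrals)
mask = trnsparse.screen_quartets(Q, threshold=1e-10)
stats = trnsparse.sparsity_stats(Q)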
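For readers unfamiliar with Schwarz screening, the standard bound is |(ab|cd)| ≤ Q_ab · Q_cd with Q_ab = sqrt((ab|ab)), so a quartet can be skipped when that product falls below the threshold. The pure-Python sketch below shows the arithmetic only. The function names, the flattened pair-by-pair mask layout, and the `>=` comparison are assumptions and may not match what trnsparse does internally.

```python
import math


def schwarz_bounds_sketch(diagonal_integrals):
    """Q[a][b] = sqrt(|(ab|ab)|), the Cauchy-Schwarz factor per shell pair."""
    return [[math.sqrt(abs(v)) for v in row] for row in diagonal_integrals]


def screen_quartets_sketch(Q, threshold):
    """mask[ab][cd] is True when Q_ab * Q_cd >= threshold (layout assumed)."""
    flat = [q for row in Q for q in row]
    return [[q_ab * q_cd >= threshold for q_cd in flat] for q_ab in flat]


# Toy 2x2 table of diagonal integrals: the off-diagonal pairs are tiny.
diagonal_integrals = [[1.0, 1e-12],
                      [1e-12, 0.25]]
Q = schwarz_bounds_sketch(diagonal_integrals)
mask = screen_quartets_sketch(Q, threshold=1e-10)
kept = sum(flag for row in mask for flag in row)
print(kept, "of", len(mask) ** 2, "quartets kept")
```

In this toy case only the quartets built from two tiny pairs are dropped, since their bound is about 1e-12 while every other product stays above the threshold.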

Operations

Operation                     Description
spmv                          Sparse × dense vector (CSR)
spmm                          Sparse × dense matrix (CSR, PyTorch fallback)
bsr_spmm                      Block-sparse × dense (BSR-native, Tensor Engine)
screened_spmm                 Fused Schwarz-screened matmul (one NKI dispatch)
spmv_symmetric                Symmetric SpMV (half storage)
sparse_add                    C = αA + βB
sparse_scale                  B = αA
sparse_transpose              Aᵀ
cg_bsr                        Conjugate Gradient on BSR matrix
chebyshev_bsr                 Fixed-K Chebyshev semi-iteration (no inner products)
richardson_bsr                Fixed-K Richardson iteration
power_iteration_bsr           Dominant eigenpair via power iteration
jacobi_preconditioner_bsr     Diagonal preconditioner for cg_bsr
bsr_diagonal                  Extract main diagonal from BSR matrix
block_sparse_attention_tiled  Two-pass sparse attention, no O(seq²) intermediate
schwarz_bounds                Schwarz screening bounds
screen_quartets               Shell quartet significance mask
density_screen                Density-weighted screening
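The fixed-K solvers in the table can be pictured with a textbook Richardson iteration, x_{k+1} = x_k + ω (b − A x_k), run for a preset number of steps. The sketch below is a dense pure-Python version under stated assumptions: `richardson_sketch`, `omega`, and `num_iters` are hypothetical names, and nothing here reflects richardson_bsr's actual signature. For a symmetric positive definite A the iteration converges when 0 < ω < 2/λ_max.

```python
def matvec(A, x):
    """Dense matrix-vector product, standing in for a BSR matvec."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def richardson_sketch(A, b, omega, num_iters):
    """Run a fixed number of Richardson steps starting from x = 0.

    The step size omega is fixed up front, so the loop needs no dot-product
    or norm reductions to pick step sizes, unlike conjugate gradient.
    """
    x = [0.0] * len(b)
    for _ in range(num_iters):
        Ax = matvec(A, x)
        x = [xi + omega * (bi - axi) for xi, bi, axi in zip(x, b, Ax)]
    return x


# Small SPD system; its eigenvalues sum to 7, so omega = 2/7 is the
# classical choice 2 / (lambda_min + lambda_max).
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x = richardson_sketch(A, b, omega=2.0 / 7.0, num_iters=40)
print(x)
```

The trade-off is that the iteration count and step size must be chosen ahead of time from spectral estimates, in exchange for a loop body that is just a matvec and an elementwise update.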

License

Apache 2.0 — Copyright 2026 Scott Friedman

Disclaimer

trnsci is an independent open-source project. It is not sponsored by, endorsed by, or affiliated with Amazon.com, Inc., Amazon Web Services, Inc., or Annapurna Labs Ltd.

"AWS", "Amazon", "Trainium", "Inferentia", "NeuronCore", "Neuron SDK", and related identifiers are trademarks of their respective owners and are used here solely for descriptive and interoperability purposes. Use does not imply endorsement, partnership, or any other relationship.

All work, opinions, analyses, benchmark results, architectural commentary, and editorial judgments in this repository and on trnsci.dev are those of the project's contributors. They do not represent the views, positions, or commitments of Amazon, AWS, or Annapurna Labs.

Feedback directed at the Neuron SDK or Trainium hardware is good-faith ecosystem commentary from independent users. It is not privileged information, is not pre-reviewed by AWS, and should not be read as authoritative about product roadmap, behavior, or quality.

For official AWS guidance, see aws-neuron documentation and the AWS Trainium product page.


