FFT and complex-valued tensor operations for AWS Trainium via NKI
trnfft
Trainium has no native complex number support and ships no FFT library. trnfft fills that gap with split real/imaginary representation, complex neural network layers, and NKI kernels optimized for the NeuronCore architecture.
Incorporates neuron-complex-ops. Part of the trnsci scientific computing suite (github.com/trnsci).
Why
NVIDIA has cuFFT, cuBLAS, and native complex64. Trainium has none of these. Every signal processing, speech enhancement, physics simulation, and spectral method workload on Trainium currently falls back to CPU or requires hand-rolling complex arithmetic. trnfft fixes this.
Install
pip install trnfft
# With Neuron hardware support
pip install "trnfft[neuron]"
Usage
import torch
import trnfft
# Drop-in replacement for torch.fft
signal = torch.randn(1024)
X = trnfft.fft(signal)
recovered = trnfft.ifft(X)
# Real-valued FFT
X = trnfft.rfft(signal)
# 2D FFT
image = torch.randn(256, 256)
F = trnfft.fft2(image)
# STFT (matches torch.stft signature)
waveform = torch.randn(16000)
S = trnfft.stft(waveform, n_fft=512, hop_length=256)
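To sanity-check the shape of S, the following sketch computes the frame and frequency-bin counts that torch.stft produces for a 1-D real input. It assumes trnfft.stft also follows torch.stft's defaults (center=True, one-sided output), which the signature match suggests but this README does not state. The helper name stft_output_shape is made up for illustration and is not part of trnfft.

```python
def stft_output_shape(signal_len, n_fft, hop_length, center=True, onesided=True):
    """Return (n_freqs, n_frames) as torch.stft computes them for a 1-D input."""
    # center=True pads n_fft // 2 samples on each side before framing.
    padded = signal_len + 2 * (n_fft // 2) if center else signal_len
    n_frames = 1 + (padded - n_fft) // hop_length
    # Real input with onesided=True keeps only the non-negative frequencies.
    n_freqs = n_fft // 2 + 1 if onesided else n_fft
    return n_freqs, n_frames


print(stft_output_shape(16000, 512, 256))  # (257, 63)
```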
Complex Neural Network Layers
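import torch
from trnfft import ComplexTensor
from trnfft.nn import ComplexLinear, ComplexConv1d, ComplexModReLU
# Build complex-valued models for speech/audio/physics
# Example inputs: a batch of 8 vectors with 256 features (shapes are illustrative)
real_part = torch.randn(8, 256)
imag_part = torch.randn(8, 256)
x = ComplexTensor(real_part, imag_part)
layer = ComplexLinear(256, 128)
y = layer(x)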
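For readers new to complex activations, here is a minimal pure-Python sketch of modReLU as it is commonly defined: the magnitude is shifted by a bias and clipped at zero, and the phase is left unchanged. Whether ComplexModReLU in trnfft.nn uses exactly this formula, and how it parameterizes the bias, is an assumption. The function modrelu and its eps argument are illustrative only.

```python
import math
from typing import List, Tuple


def modrelu(re: List[float], im: List[float], bias: float,
            eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Apply ReLU(|z| + bias) * z / |z| element-wise on split real/imag lists."""
    out_re, out_im = [], []
    for a, b in zip(re, im):
        mag = math.hypot(a, b)
        # Scale factor is real, so the phase of z is preserved.
        scale = max(mag + bias, 0.0) / (mag + eps)
        out_re.append(a * scale)
        out_im.append(b * scale)
    return out_re, out_im


# |3+4i| = 5, so with bias -1 the magnitude shrinks to about 4.
# |0.3+0.4i| = 0.5, so with bias -1 the output is zeroed.
y_re, y_im = modrelu([3.0, 0.3], [4.0, 0.4], bias=-1.0)
```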
Architecture
+--------------------------------------------+
| User Code / Model |
+--------------------------------------------+
| trnfft.api (torch.fft API) |
| fft() ifft() rfft() stft() fft2() |
+--------------------------------------------+
| trnfft.fft_core | trnfft.nn |
| Cooley-Tukey | ComplexLinear |
| Bluestein | ComplexConv1d |
| Plan caching | ComplexModReLU |
+------------------------+-------------------+
| trnfft.nki.dispatch |
| "auto" | "pytorch" | "nki" |
+--------------------------------------------+
| PyTorch ops | NKI kernels |
| (any device) | (Trainium only) |
| torch.matmul | nisa.nc_matmul |
| element-wise | Tensor Engine |
| | Vector Engine |
| | SBUF ↔ PSUM pipeline |
+------------------+------------------------+
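The dispatch layer in the diagram can be pictured with a small sketch like the one below. It is a stand-in, not trnfft's implementation: set_backend is named in the Status section, but its signature here, the resolve_backend helper, the nki_available flag, and the error behavior are all assumptions made for illustration.

```python
_BACKEND = "auto"  # hypothetical module-level setting


def set_backend(name: str) -> None:
    """Select "auto", "pytorch", or "nki" (the three names shown in the diagram)."""
    global _BACKEND
    if name not in ("auto", "pytorch", "nki"):
        raise ValueError(f"unknown backend: {name!r}")
    _BACKEND = name


def resolve_backend(nki_available: bool) -> str:
    """Return the backend that would actually run for the current setting."""
    if _BACKEND == "auto":
        # Assumed policy: prefer NKI kernels when a Neuron device is present.
        return "nki" if nki_available else "pytorch"
    if _BACKEND == "nki" and not nki_available:
        raise RuntimeError("NKI backend requested but no Neuron device found")
    return _BACKEND
```

Under this assumed policy, code written against "auto" runs unchanged on a laptop (PyTorch ops) and on Trainium (NKI kernels).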
How It Works
No complex dtype? Trainium's NKI doesn't support complex64/complex128. ComplexTensor stores complex values as paired real tensors and decomposes complex arithmetic into real-valued operations.
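As a rough illustration of the split representation, the sketch below stores a complex vector as two real lists and implements addition and multiplication with real arithmetic only. SplitComplex is a hypothetical stand-in written in plain Python; the real ComplexTensor wraps torch tensors and its internals may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitComplex:
    """Complex vector stored as two real lists (illustrative, not ComplexTensor)."""
    re: List[float]
    im: List[float]

    def __add__(self, other: "SplitComplex") -> "SplitComplex":
        return SplitComplex(
            [a + b for a, b in zip(self.re, other.re)],
            [a + b for a, b in zip(self.im, other.im)],
        )

    def __mul__(self, other: "SplitComplex") -> "SplitComplex":
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i:
        # four real multiplies and two real adds per element.
        quads = list(zip(self.re, self.im, other.re, other.im))
        re = [a * c - b * d for a, b, c, d in quads]
        im = [a * d + b * c for a, b, c, d in quads]
        return SplitComplex(re, im)


x = SplitComplex([1.0, 0.0], [2.0, 1.0])   # [1+2i, i]
y = SplitComplex([3.0, 0.0], [-1.0, 1.0])  # [3-i, i]
z = x * y                                  # [5+5i, -1]
```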
FFT → butterflies → matmul. Each Cooley-Tukey butterfly stage performs complex-multiply-and-add across all groups simultaneously. On NKI, the complex multiply maps to the Tensor Engine (systolic array).
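The butterfly structure described above can be sketched in plain Python on split real/imaginary lists. This is a reference-style sketch, not the NKI kernel: the two inner loops here run sequentially, whereas a vectorized backend would apply each stage to all groups in one batched operation. The function name fft_radix2_split is illustrative.

```python
import math
from typing import List, Tuple


def fft_radix2_split(re: List[float], im: List[float]) -> Tuple[List[float], List[float]]:
    """Iterative radix-2 decimation-in-time FFT; length must be a power of 2."""
    n = len(re)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    re, im = list(re), list(im)

    # Bit-reversal permutation puts inputs in the order the butterflies expect.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]

    size = 2
    while size <= n:                         # one pass per butterfly stage
        half = size // 2
        for start in range(0, n, size):      # every group in this stage
            for k in range(half):
                angle = -2.0 * math.pi * k / size
                wr, wi = math.cos(angle), math.sin(angle)
                a, b = start + k, start + k + half
                # Complex multiply (twiddle * x[b]) using real ops only.
                tr = wr * re[b] - wi * im[b]
                ti = wr * im[b] + wi * re[b]
                # Butterfly: top = x[a] + t, bottom = x[a] - t.
                re[b], im[b] = re[a] - tr, im[a] - ti
                re[a], im[a] = re[a] + tr, im[a] + ti
        size *= 2
    return re, im


out_re, out_im = fft_radix2_split([1.0, 2.0, 3.0, 4.0], [0.0] * 4)
# Close to [10, -2+2j, -2, -2-2j], up to floating-point rounding.
```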
Algorithms:
- Power-of-2: Cooley-Tukey radix-2 (iterative, decimation-in-time)
- Arbitrary sizes: Bluestein's chirp-z transform (pads to power-of-2)
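To show how an arbitrary-length DFT reduces to power-of-2 FFTs, here is a compact sketch of Bluestein's algorithm using Python's built-in complex type for brevity. It illustrates the technique only: trnfft works on split real/imaginary tensors, and the helper names _fft_pow2 and bluestein_dft are hypothetical.

```python
import cmath
from typing import List


def _fft_pow2(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft_pow2(x[0::2], inverse)
    odd = _fft_pow2(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x: List[complex]) -> List[complex]:
    """DFT of any length n via a circular convolution of power-of-2 length m."""
    n = len(x)
    if n == 0:
        return []
    # chirp[k] = exp(-i*pi*k^2/n); reducing k^2 mod 2n keeps the angle small.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    m = 1
    while m < 2 * n - 1:       # pad so the circular convolution does not wrap
        m *= 2
    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = chirp[0].conjugate()
    for k in range(1, n):      # conjugate chirp, mirrored for negative lags
        b[k] = b[m - k] = chirp[k].conjugate()
    fa, fb = _fft_pow2(a), _fft_pow2(b)
    conv = _fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)
    return [chirp[k] * conv[k] / m for k in range(n)]
```

The cost is three power-of-2 FFTs of length m (the smallest power of 2 that is at least 2n - 1) plus element-wise chirp multiplies.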
NKI complex GEMM uses stationary tile reuse (2 SBUF loads instead of 8) and PSUM accumulation, overlapping Vector Engine negation with Tensor Engine matmul.
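The arithmetic behind a complex GEMM can be sketched as four real matrix products plus one negation. The code below shows only that decomposition in plain Python; it does not model SBUF tile reuse, PSUM accumulation, or engine overlap, and which operand gets negated in the actual kernel is an assumption. All function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain real matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def negate(a: Matrix) -> Matrix:
    return [[-x for x in row] for row in a]


def complex_gemm(a_re: Matrix, a_im: Matrix,
                 b_re: Matrix, b_im: Matrix) -> Tuple[Matrix, Matrix]:
    """C = A @ B for complex A, B given as split real/imag matrices."""
    # Negating once lets the real part be a pure sum of two products,
    # which suits an accumulate-into-the-same-buffer pattern.
    neg_a_im = negate(a_im)
    c_re = add(matmul(a_re, b_re), matmul(neg_a_im, b_im))  # A_re B_re - A_im B_im
    c_im = add(matmul(a_re, b_im), matmul(a_im, b_re))      # A_re B_im + A_im B_re
    return c_re, c_im
```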
Hardware compatibility
NKI kernels are validated against Neuron SDK 2.24+ on the Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04) AMI (20260410 or later). See docs/installation.md for the full compatibility matrix.
Benchmarks
NKI vs PyTorch on the same Trainium instance — see the benchmarks page for the latest numbers.
Status
v0.10.0: NKI kernels validated on trn1.2xlarge. For STFT and batched FFT, set_backend("nki") is faster than stock torch.fft.fft on the same instance. See benchmarks for the full picture.
API coverage (13 common torch.fft functions):
fft, ifft, rfft, irfft, fft2, rfft2, irfft2, fftn, ifftn, rfftn, irfftn, stft, istft.
Not implemented: hfft, ihfft — Hermitian-input variants; rare in practice. File an issue if you need them.
Roadmap
- NKI ComplexConv1d / ComplexModReLU kernels (today both fall back to PyTorch on NKI)
- BF16 / FP16 support across NKI kernels
- Multi-NeuronCore parallelism (scaffold in trnfft/nki/multicore.py)
- SBUF-resident dispatch to reduce small-op overhead
Related projects in the trnsci suite
All six siblings are on PyPI, along with the umbrella meta-package:
| Project | What | Latest |
|---|---|---|
| trnsci | Umbrella meta-package pulling the whole suite | v0.1.0 |
| trnblas | BLAS Level 1–3 for Trainium | v0.4.0 |
| trnrand | Philox / Sobol / Halton random number generation | v0.1.0 |
| trnsolver | Linear solvers (CG, GMRES) and eigendecomposition | v0.3.0 |
| trnsparse | Sparse matrix operations | v0.1.1 |
| trntensor | Tensor contractions (einsum, TT/Tucker decompositions) | v0.1.1 |
| neuron-complex-ops | Original proof-of-concept, folded into trnfft | archived |
License
Apache 2.0 — Copyright 2026 Scott Friedman
Acknowledgments
Built on insights from:
- tcFFT — Tensor Core FFT research
- FFTW — Plan-based FFT architecture
- AWS NKI documentation