Pure-Rust GPU compute substrate with Python bindings. cuda-oxide-compiled Stockham FFT.
ferrum-gpu
Pure-Rust GPU compute substrate with Python bindings. FFT kernels run on NVIDIA GPUs today via cuda-oxide (Rust source compiled to PTX, no CUDA C). Cross-vendor support via spirv-oxide → Vulkan is the v0.2 roadmap.
This is v0.1.0. The workspace ships:
- ferrum-gpu-core: Backend trait, KernelArtifact, errors. no_std + alloc.
- ferrum-gpu-cuda: impl Backend for Cuda over cudarc 0.19.
- ferrum-gpu: facade with Device<B> and Buffer<T, B>.
- ferrum-gpu-fft: 1D + 2D radix-2 power-of-2 C2C FFT host scaffolding + CPU Stockham reference.
- ferrum-gpu-py: Python bindings via PyO3 + maturin. ferrum_gpu.cuda.Device(0) persistent handle + ferrum_gpu.fft.fft_1d_c2c_pow2 + ferrum_gpu.fft.fft_2d_c2c_pow2.
- ferrum-gpu-bench: cuFFT comparison binary (1D, batched).
- examples/vector-add: end-to-end demo using hand-written PTX through the substrate.
- examples/vector-add-cuda-oxide: same kernel in Rust, compiled to PTX by cuda-oxide.
- examples/fft-1d-c2c: 1D Stockham FFT in Rust, GPU-vs-CPU on 8 cases (N from 4 to 4096, batched, forward + inverse).
29 GPU pytest cases are verified end-to-end against numpy.fft.fft / numpy.fft.fft2 (1D: 16 cases, 2D: 13 cases), with relative error tolerances ranging from 1e-4 to 1e-3.
Requirements
- Linux x86_64
- CUDA Toolkit 13.x
- NVIDIA driver compatible with the installed Toolkit
- Rust nightly 2026-04-03 (pinned via rust-toolchain.toml)
- cargo-oxide: cargo install --git https://github.com/NVlabs/cuda-oxide.git cargo-oxide
- For the Python bindings: Python 3.10+ with maturin + numpy + pytest
Quick start: vector-add via hand-written PTX
git clone https://github.com/alejandro-soto-franco/ferrum-gpu
cd ferrum-gpu
make example-vector-add
Expected:
vector_add: 1048576 elements verified
Quick start: vector-add via Rust source + cuda-oxide
cargo install --git https://github.com/NVlabs/cuda-oxide.git cargo-oxide
cargo oxide doctor # one-time codegen-backend bootstrap
make example-vector-add-oxide
Expected:
vector_add (cuda-oxide): 1048576 elements verified
Quick start: 1D Stockham FFT
make example-fft
Runs 8 cases (N=4 through N=4096, batched, forward + inverse), each verified against a CPU Stockham reference within 1e-4 relative error.
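For readers unfamiliar with the Stockham formulation, here is a minimal pure-Python sketch of a radix-2 Stockham autosort FFT for power-of-two lengths. It is not the in-tree Rust reference: the function name stockham_fft is hypothetical, and scaling the inverse by 1/N (the numpy.fft.ifft convention) is an assumption about normalization that this README does not confirm.

```python
import cmath

def stockham_fft(x, inverse=False):
    """Radix-2 Stockham autosort FFT of a power-of-two-length sequence.

    Illustrative sketch only. The inverse is scaled by 1/N, which is an
    assumption (numpy convention), not a documented ferrum-gpu behavior.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    sign = 1.0 if inverse else -1.0
    src = [complex(v) for v in x]
    dst = [0j] * n
    half = n // 2   # half the current sub-transform length
    stride = 1      # number of interleaved sub-transforms
    while half >= 1:
        for p in range(half):
            # Twiddle factor exp(sign * 2*pi*i * p / (2 * half)).
            w = cmath.exp(sign * 1j * cmath.pi * p / half)
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        # Ping-pong the buffers; no bit-reversal pass is needed.
        src, dst = dst, src
        half //= 2
        stride *= 2
    if inverse:
        src = [v / n for v in src]
    return src

# Forward transform of [1, 2, 3, 4]; the result is approximately
# [10, -2+2j, -2, -2-2j] up to floating-point rounding.
print(stockham_fft([1, 2, 3, 4]))
```

Each pass reads from one buffer and writes to the other in natural order, which is why the Stockham variant avoids the separate bit-reversal permutation that an in-place Cooley-Tukey FFT needs.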
Quick start: Python
uv is the recommended Python package manager;
the Makefile targets and the wheel install path work the same with pip for users
who prefer it.
uv venv ~/.venvs/ferrum-gpu
source ~/.venvs/ferrum-gpu/bin/activate
uv pip install maturin pytest numpy
make develop # builds the cdylib + installs into the venv
python3 -c "
import numpy as np, ferrum_gpu as fg
arr = np.array([1+0j, 2+0j, 3+0j, 4+0j], dtype=np.complex64)
print(fg.fft.fft_1d_c2c_pow2(arr, log_n=2))
"
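The call above passes log_n=2 for a 4-element array, which suggests that log_n is the base-2 logarithm of the transform length. Under that assumption, a small helper can derive it and reject non-power-of-two sizes up front. The name log2_exact is hypothetical and is not part of the ferrum_gpu API.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a positive power-of-two n, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    # For a power of two, bit_length() is exactly log2(n) + 1.
    return n.bit_length() - 1

for n in (4, 256, 4096):
    print(n, log2_exact(n))
```

With the bindings installed, the quick-start call could then be written as fg.fft.fft_1d_c2c_pow2(arr, log_n=log2_exact(arr.size)). Mathematically, the forward DFT of [1, 2, 3, 4] is [10, -2+2j, -2, -2-2j], which gives a quick sanity check on the printed result.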
Pip equivalent:
python3 -m venv ~/.venvs/ferrum-gpu
source ~/.venvs/ferrum-gpu/bin/activate
pip install maturin pytest numpy
make develop
Run the pytest matrix:
make pytest
29 cases (16 1D + 13 2D), each compared against numpy.fft with a relative error tolerance between 1e-4 and 1e-3.
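The README does not say which norm the relative error uses. As one plausible reading, the sketch below computes an L2 relative error between a result and a reference, written in pure Python so it runs without numpy. The function name relative_l2_error and the choice of norm are assumptions, not the test suite's actual metric.

```python
import math

def relative_l2_error(got, ref):
    """Return ||got - ref||_2 / ||ref||_2 for two equal-length sequences.

    Falls back to the absolute error when the reference is all zeros.
    """
    if len(got) != len(ref):
        raise ValueError("sequences must have the same length")
    num = math.sqrt(sum(abs(g - r) ** 2 for g, r in zip(got, ref)))
    den = math.sqrt(sum(abs(r) ** 2 for r in ref))
    return num / den if den else num

# Hypothetical values: a reference spectrum and a slightly perturbed copy.
ref = [10 + 0j, -2 + 2j, -2 + 0j, -2 - 2j]
got = [v * (1 + 1e-5) for v in ref]
print(relative_l2_error(got, ref) < 1e-4)
```

A single norm-based ratio is less sensitive to individual near-zero bins than an element-wise relative error, which is one reason to prefer it when comparing single-precision GPU output with a double-precision reference.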
Performance
make bench runs ferrum-gpu-bench, which times the in-tree
cuda-oxide-compiled Stockham radix-2 power-of-2 C2C kernel against cuFFT
(via cudarc 0.19's cufft feature) for batched 1D transforms at
N in {256, 1024, 4096}, batch = 256, 100 trials per size + 10-trial
warmup. Per-batch microseconds, measured on an RTX 5060 Laptop (sm_120):
| N | ferrum_us | cufft_us | ratio (ferrum_us / cufft_us) |
|---|---|---|---|
| 256 | 0.089 | 0.016 | 5.52 |
| 1024 | 0.162 | 0.059 | 2.72 |
| 4096 | 0.548 | 0.080 | 6.86 |
The Stockham kernel is a single-block-per-FFT reference implementation with no radix-4 or warp-specialised stages, so cuFFT's vendor-tuned plan wins outright at these sizes. Closing the gap is on the v0.2 roadmap.
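The warmup-then-trials protocol described above can be sketched as a small host-side timing harness. This is an illustration, not the code in ferrum-gpu-bench: the function name mean_time_us is hypothetical, and whether the real benchmark uses host clocks or CUDA events is not stated here.

```python
import time

def mean_time_us(fn, trials=100, warmup=10):
    """Return the mean wall-clock time of fn() in microseconds.

    Runs `warmup` untimed calls first so one-time costs (module load,
    plan creation, cache warmup) do not skew the timed trials.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(trials):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / trials * 1e6

# Stand-in CPU workload; a GPU workload would go here instead.
print(mean_time_us(lambda: sum(range(1000))))
```

For GPU work, fn must block until the device has finished (for example by synchronizing the stream). Otherwise the harness measures only the launch overhead, not the kernel itself.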
Testing
CPU-only tests: make test.
GPU tests + all examples + pytest (requires CUDA + NVIDIA GPU): make verify-all.
Publishing (PyPI wheel)
The public wheel is built inside a manylinux_2_28_x86_64 Docker image
that ships CUDA Toolkit 13.x, the cuda-oxide-pinned Rust nightly, and
maturin. The container is ~6-8 GB and takes ~15-25 minutes to build the
first time.
make wheel-manylinux # builds dist/ferrum_gpu-*-manylinux_2_28_x86_64.whl
auditwheel show dist/*.whl # verify the manylinux tag
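As a lightweight complement to auditwheel, the platform tag can also be read straight from the wheel file name, which follows the standard name-version-python-abi-platform layout with an optional build tag. The sketch below assumes that layout. The function name wheel_tags and the example file name are illustrative, not output from this project's build.

```python
def wheel_tags(filename):
    """Split a wheel file name into its name, version, and tag fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    # 5 fields normally, 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in wheel file name")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

# Illustrative file name; the real one depends on the build.
tags = wheel_tags("ferrum_gpu-0.1.0-cp310-abi3-manylinux_2_28_x86_64.whl")
print(tags["platform"])  # prints manylinux_2_28_x86_64
```

This only inspects the name. auditwheel show remains the authoritative check because it examines the shared libraries the wheel actually links against.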
Publishing to PyPI is operator-driven (no CI):
# TestPyPI first
twine upload --repository testpypi dist/*.whl
# PyPI (requires a token in ~/.pypirc)
twine upload dist/*.whl
The local-build path (make develop + make wheel) produces a wheel
tagged linux_x86_64 (not manylinux), so it is useful for local testing only.
License
Apache-2.0.