Experimental PyTorch-like autograd engine with an optional Vulkan compute backend (Raspberry Pi 5-focused).
rasptorch
rasptorch is an experimental deep learning library inspired by PyTorch, with a specific goal: make training and running neural networks practical on the Raspberry Pi 5 while taking advantage of its GPU.
The project has two main parts:
- A small, NumPy-backed autograd engine and `nn` module that runs on the Raspberry Pi CPU.
- An experimental Vulkan-based backend, wired through a `device` API (`Tensor(..., device="cpu"|"gpu")`, `.to("gpu")`), meant to offload core tensor operations (elementwise ops, matmul, activations) to the Pi 5's GPU via Vulkan compute (minimal sketch below).
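A minimal round trip through the device API might look like this (a sketch; the top-level `Tensor` import path is an assumption, adjust to where `Tensor` actually lives):

```python
import numpy as np
from rasptorch import Tensor  # import path assumed

x = Tensor(np.random.randn(4, 4).astype(np.float32), device="cpu")
y = x.to("gpu")        # upload to the Vulkan backend
z = (y @ y).relu()     # matmul + relu dispatched as compute shaders
print(z.numpy())       # explicit readback to the CPU
```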
The Vulkan backend is implemented with real Vulkan compute shaders (GLSL compiled to SPIR-V). It supports a small but useful set of kernels:
- Elementwise: `add`, `mul`, `relu`
- Matmul: `matmul` (tiled/shared-memory shader)
- Fused kernel: `mul_add_relu` for `(x * y + x).relu()` (NumPy reference below)
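As a reference for what the fused kernel computes, here is the same expression in plain NumPy (fusing avoids launching three separate kernels and materializing intermediates):

```python
import numpy as np

x = np.random.randn(1024).astype(np.float32)
y = np.random.randn(1024).astype(np.float32)

# Semantics of the fused mul_add_relu kernel: (x * y + x).relu()
ref = np.maximum(x * y + x, 0.0)
```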
Performance notes:
- The fastest path is compute-only (keep tensors on GPU, avoid per-iteration `.numpy()` readbacks); see the loop sketch below.
- Fusing ops and reusing output buffers can make the Vulkan path faster than NumPy for certain workloads on Raspberry Pi 5.
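A sketch of the compute-only pattern (same assumed `Tensor` import as above; the key point is that `.numpy()` is called once, after the loop):

```python
import numpy as np
from rasptorch import Tensor  # import path assumed

a = Tensor(np.random.randn(512, 512).astype(np.float32)).to("gpu")
b = Tensor(np.random.randn(512, 512).astype(np.float32)).to("gpu")

for _ in range(100):
    a = (a @ b).relu()  # stays on GPU; no per-iteration readback

result = a.numpy()      # single readback at the end
```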
See main.py for a simple training example and gpu_demo.py for a focused test + benchmark suite for the Vulkan backend.
Quickstart
- For best results, use a Python virtual environment, e.g. `.venv`, `.venv` + uv, `.venv` + poetry, etc.; see the commands below.
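For example, with the standard library `venv` module:

python -m venv .venv
source .venv/bin/activate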
Installation
From PyPI:
pip install rasptorch
Notes for GPU mode:
- Requires working Vulkan drivers on your system.
- Requires `glslc` (the GLSL shader compiler) available on `PATH`.
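One quick way to sanity-check both requirements from a shell (`vulkaninfo` ships with the vulkan-tools package; `--summary` needs a reasonably recent version):

glslc --version
vulkaninfo --summary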
For local development from this repo:
pip install -e .
GPU Training (Vulkan)
There are currently two “modes” of training in this repo:
- CPU autograd training (PyTorch-like): uses the NumPy-backed autograd engine.
- Vulkan GPU training (explicit kernels): runs forward + backward + SGD updates on GPU using purpose-built compute shaders.
The Vulkan training path lives in rasptorch/gpu_training.py and currently supports a 2-layer MLP (`Linear -> ReLU -> Linear`) with MSE loss and SGD.
Run it via:
uv run main.py --device gpu --epochs 50 --batch-size 32 --lr 0.1
GPU Autograd (WIP)
There is now an experimental `gpu-autograd` mode that enables `loss.backward()` even when the model and activations live on GPU, for a limited set of ops.
Run it via:
uv run main.py --device gpu-autograd --epochs 50 --batch-size 32 --lr 0.1
Currently supported (GPU) in autograd (a minimal end-to-end sketch follows this list):
- `+`, `*`, `-` (scalar and tensor forms), `@` (matmul)
- scalar ops: `tensor + s`, `tensor * s`, `tensor / s`, plus `s + tensor`, `s * tensor`, `s - tensor`
- `neg`, `relu`, `sum`, `mean`, `T` (2D transpose)
- `Linear` backward (GPU grads for `weight`/`bias`)
- `SGD.step()` updates GPU parameters in-place (SGD + optional momentum/weight decay)
- `functional.cross_entropy(logits, target_onehot)` (softmax cross-entropy, mean reduction)
Training Loop Utilities
There is now a small, reusable training loop helper in `rasptorch.train` that provides PyTorch-like epoch logs (loss, accuracy/metrics, throughput) for any model.
Key pieces:
- `rasptorch.train.fit(...)`: train loop with optional validation
- `rasptorch.train.Accuracy()`: top-1 classification accuracy
- `rasptorch.train.classification_target_one_hot(C, device=...)`: converts integer labels -> one-hot
Example (classifier):
```python
from rasptorch import functional as F
from rasptorch.train import fit, Accuracy, classification_target_one_hot
from rasptorch.optim import SGD

model = ...
opt = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

fit(
    model,
    opt,
    train_loader,
    loss_fn=F.cross_entropy,
    device="gpu",
    epochs=10,
    val_loader=val_loader,
    target_transform=classification_target_one_hot(num_classes=10, device="gpu"),
    metrics=[Accuracy()],
)
```
Notes:
- Metrics like accuracy call `.numpy()` on logits, which triggers a GPU readback.
- There is not yet a `no_grad()` context; evaluation still builds graphs.
- `mse_loss` is now implemented purely via tensor ops (`(pred - target)**2` + `mean()`), so the loss tensor itself is on GPU in `gpu-autograd` mode; training code typically reads it back via `.numpy()` for logging.
- Parameters and gradients stay on GPU; loss is read back to CPU for logging.
- `uv run main.py --device gpu` now requires Vulkan. If Vulkan init or shader compilation fails, it raises a clear error instead of silently falling back.
- Broadcasting is still limited; common 2D + 1D row-vector forms like `(N, M) + (M,)` and `(N, M) * (M,)` are supported (example below).
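A sketch of the supported row-vector broadcast (same assumed `Tensor` import as above):

```python
import numpy as np
from rasptorch import Tensor  # import path assumed

h = Tensor(np.ones((32, 10), dtype=np.float32)).to("gpu")  # (N, M)
b = Tensor(np.zeros((10,), dtype=np.float32)).to("gpu")    # (M,)
out = h + b  # (N, M) + (M,) -> (N, M), e.g. adding a bias row to activations
```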
Benchmarks
gpu_demo.py prints timing stats (min/p50/p95/mean/std) for:
- CPU (NumPy)
- GPU compute+readback (includes `.numpy()` every iteration)
- GPU compute-only (no per-iteration readback)
- GPU fused compute-only and no-alloc variants (preallocated output buffers)
If you want the GPU to win, focus on the compute-only + fused/no-alloc numbers.
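To reproduce the same statistics for your own workloads, a tiny timing helper in the spirit of gpu_demo.py (a sketch, not gpu_demo.py's actual code) could look like:

```python
import time
import numpy as np

def bench(fn, iters=50, warmup=5):
    """Time fn() and print min/p50/p95/mean/std in seconds."""
    for _ in range(warmup):
        fn()
    ts = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        ts.append(time.perf_counter() - t0)
    ts = np.asarray(ts)
    print(f"min={ts.min():.6f}  p50={np.percentile(ts, 50):.6f}  "
          f"p95={np.percentile(ts, 95):.6f}  mean={ts.mean():.6f}  std={ts.std():.6f}")
```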
Current Limitations
- GPU autograd is still incomplete (only a subset of ops are supported).
- GPU reductions now support `sum()` and `mean()`, but other reductions/broadcast patterns are still limited.
- The Vulkan backend only implements a small set of ops/kernels; expanding model coverage will require more kernels (and ideally more fusion).
- PyTorch integration is experimental: `rasptorch.torch_bridge` currently supports a small inference subset (Conv2d/Linear/ReLU) and may copy tensors CPU<->GPU.
Publishing (maintainers)
Build:
python -m pip install -U build twine
python -m build
Upload to PyPI:
python -m twine upload dist/*