Experimental PyTorch-like autograd engine with an optional Vulkan compute backend (Raspberry Pi 5-focused).

Reason this release was yanked:

Outdated

rasptorch

rasptorch is an experimental deep learning library inspired by PyTorch, with a specific goal: make training and running neural networks practical on the Raspberry Pi 5 while taking advantage of its GPU.

The project has two main parts:

  • A small, NumPy-backed autograd engine and nn module that runs on the Raspberry Pi CPU.
  • An experimental Vulkan-based backend, wired through a device API (Tensor(..., device="cpu") or Tensor(..., device="gpu"), plus .to("gpu")), meant to offload core tensor operations (elementwise ops, matmul, activations, reductions, etc.) to the Pi 5's GPU via Vulkan compute.

The Vulkan backend is implemented with real Vulkan compute shaders (GLSL compiled to SPIR-V). It supports a small but useful set of kernels (see rasptorch/shaders/ for the authoritative list).

High-level highlights:

  • Elementwise ops: +, *, -, neg, relu (plus scalar variants)
  • Matmul: @ (tiled/shared-memory shader)
  • Reductions: sum, mean
  • Common broadcast forms: (N,M) + (M,) and (N,M) * (M,)
  • Loss: functional.cross_entropy (GPU kernel)
  • NN essentials: GPU row-wise softmax / log_softmax, and 2D LayerNorm
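When checking GPU results for the NN essentials above, a CPU reference is handy. The following is a minimal NumPy sketch of row-wise softmax / log_softmax and 2D LayerNorm. It is not rasptorch code: the function names are hypothetical, and it assumes PyTorch-style semantics (biased variance, eps inside the square root).

```python
import numpy as np

def log_softmax_rows(x):
    """Numerically stable log-softmax over each row of a 2D array."""
    shifted = x - x.max(axis=1, keepdims=True)  # subtract the row max to avoid overflow
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def softmax_rows(x):
    """Row-wise softmax, derived from the stable log-softmax."""
    return np.exp(log_softmax_rows(x))

def layer_norm_2d(x, weight=None, bias=None, eps=1e-5):
    """Normalize each row to zero mean and unit variance, then apply an optional affine."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)  # biased variance (assumed, as in PyTorch LayerNorm)
    out = (x - mean) / np.sqrt(var + eps)
    if weight is not None:
        out = out * weight  # (N, M) * (M,) row-vector broadcast
    if bias is not None:
        out = out + bias    # (N, M) + (M,) row-vector broadcast
    return out
```

Comparing a GPU output against such a reference with np.allclose and a small tolerance is usually enough to catch kernel bugs.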

Performance notes:

  • The fastest path is compute-only: keep tensors on GPU and avoid per-iteration .numpy() / readbacks.
  • Fusing ops and reusing output buffers can make the Vulkan path faster than NumPy for certain workloads on Raspberry Pi 5.

See main.py for a simple training example and gpu_demo.py for a focused correctness + benchmark suite for the Vulkan backend.

Demos

  • Essentials demo (softmax/log_softmax, LayerNorm, Dropout, no_grad, detach):
    • CPU: uv run essentials_demo.py --device cpu
    • GPU-autograd (Vulkan): uv run essentials_demo.py --device gpu

Note: the Vulkan-backed GPU path requires working Vulkan drivers and glslc (shader compiler) on your PATH. If Vulkan or glslc is unavailable, the demo's --device gpu option falls back to the NumPy backend and prints the reason (for example: glslc not found).

Modes

There are three execution modes exposed via main.py --device ...:

  • cpu: NumPy autograd engine (PyTorch-like, runs on CPU)
  • gpu: explicit Vulkan training path (forward + backward + SGD via purpose-built kernels)
  • gpu-autograd: experimental GPU autograd (builds a graph on GPU for a limited set of ops)

Quickstart

  • Use a virtual environment for best results (e.g. .venv, .venv + uv, .venv + poetry).

Installation

From PyPI (CPU-only):

  • pip install rasptorch

GPU (Pi 5 Vulkan):

  • pip install "rasptorch[gpu]"

Optional (for saving/loading .pth via real torch.save/torch.load):

  • pip install "rasptorch[torch]"

Dev/test:

  • pip install -e ".[dev]"

Notes for GPU mode:

  • Requires working Vulkan drivers on your system.
  • Requires glslc (shader compiler) available on PATH.
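Before trying GPU mode, it can help to confirm that glslc is actually visible. This is a small standard-library sketch, not part of rasptorch; describe_gpu_prereqs is a hypothetical helper, and it only checks PATH, not whether the Vulkan driver works.

```python
import shutil

def describe_gpu_prereqs(which=shutil.which):
    """Report whether the glslc shader compiler can be found on PATH.

    `which` is injectable so the check can be exercised without glslc installed.
    """
    path = which("glslc")
    if path is None:
        return "glslc not found on PATH; install a shader compiler package first"
    return f"glslc found at {path}"

if __name__ == "__main__":
    print(describe_gpu_prereqs())
```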

Quick GPU validation:

  • uv run gpu_demo.py --smoke-only
    • Initializes Vulkan strictly and runs fast correctness checks for core kernels.
    • If this fails, uv run main.py --device gpu will also fail.

Quick model saving check:

  • uv run main.py --device cpu --epochs 1 --save model.pth
    • If torch is installed: python -c "import torch; print(torch.load('model.pth').keys())"
    • If not: python -c "import pickle; print(pickle.load(open('model.pth','rb')).keys())"

For local development from this repo:

  • pip install -e .

GPU Training (Vulkan)

Besides the experimental gpu-autograd mode (see GPU Autograd below), there are two main training paths in this repo:

  • CPU autograd training (PyTorch-like): uses the NumPy-backed autograd engine.
  • Vulkan GPU training (explicit kernels): runs forward + backward + SGD updates on GPU using purpose-built compute shaders.

The Vulkan training path lives in rasptorch/gpu_training.py and currently supports a 2-layer MLP:

Linear -> ReLU -> Linear with MSE loss and SGD.

Run it via:

  • uv run main.py --device gpu --epochs 50 --batch-size 32 --lr 0.1
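For orientation, the math that a Linear -> ReLU -> Linear path with MSE loss and SGD has to compute can be written out in NumPy. This is a reference sketch only, not the code in rasptorch/gpu_training.py: train_mlp and its defaults are hypothetical, and it uses full-batch updates rather than mini-batches for brevity.

```python
import numpy as np

def train_mlp(x, y, hidden=16, lr=0.01, epochs=50, seed=0):
    """Train a 2-layer MLP with explicit forward, backward, and SGD steps."""
    rng = np.random.default_rng(seed)
    d_in, d_out = x.shape[1], y.shape[1]
    w1 = rng.normal(0.0, 0.5, size=(d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        # Forward: Linear -> ReLU -> Linear, then MSE.
        z1 = x @ w1 + b1
        h = np.maximum(z1, 0.0)
        pred = h @ w2 + b2
        diff = pred - y
        losses.append(float(np.mean(diff ** 2)))
        # Backward: gradients of mean((pred - y)^2) w.r.t. each parameter.
        g_pred = 2.0 * diff / diff.size
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z1 = (g_pred @ w2.T) * (z1 > 0)  # ReLU passes gradient only where z1 > 0
        g_w1 = x.T @ g_z1
        g_b1 = g_z1.sum(axis=0)
        # Plain SGD update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return losses
```

Each of the three phases (forward, backward, update) corresponds to work that an explicit-kernel path would dispatch to the GPU instead.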

Saving weights (PyTorch-style .pth):

  • uv run main.py --device gpu --epochs 50 --save model.pth
  • If torch is installed, this is a real torch.save(...) file loadable via torch.load("model.pth").
  • If torch is not installed, rasptorch falls back to writing a pickle payload (same keys, not torch.load compatible).
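Because the saved file can be either a torch.save file or a plain pickle, a loader that handles both is convenient. The sketch below is an assumption about how one might do this, not a rasptorch API: load_checkpoint is a hypothetical helper that tries torch.load when torch is importable and otherwise, or on failure, falls back to pickle.

```python
import pickle

def load_checkpoint(path):
    """Load a checkpoint written by torch.save or as a plain pickle payload.

    Only load files you trust: unpickling can execute arbitrary code.
    """
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        try:
            return torch.load(path, map_location="cpu")
        except Exception:
            pass  # not a torch-format file; fall through to plain pickle
    with open(path, "rb") as f:
        return pickle.load(f)
```

Usage: state = load_checkpoint("model.pth"); print(state.keys()).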

GPU Autograd (WIP)

The experimental gpu-autograd mode enables loss.backward() even when the model and activations live on the GPU, for a limited set of ops.

Run it via:

  • uv run main.py --device gpu-autograd --epochs 50 --batch-size 32 --lr 0.1

Currently supported (GPU) in autograd:

  • +, *, - (scalar and tensor forms), @ (matmul)
  • scalar ops: tensor + s, tensor * s, tensor / s, plus s + tensor, s * tensor, s - tensor
  • neg, relu, sum, mean, T (2D transpose)
  • functional.softmax / functional.log_softmax (2D row-wise, dim=-1 or dim=1)
  • nn.LayerNorm (2D inputs, 1D normalized_shape, eps=1e-5)
  • Linear backward (GPU grads for weight/bias)
  • SGD.step() updates GPU parameters in-place (SGD + optional momentum/weight decay)
  • functional.cross_entropy(logits, target_onehot) (softmax cross-entropy, mean reduction)
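The last item, softmax cross-entropy against one-hot targets with mean reduction, has a compact closed-form gradient. The NumPy sketch below spells out that math as a reference; cross_entropy_onehot is a hypothetical name, and the code assumes each target row sums to 1. It is not the GPU kernel itself.

```python
import numpy as np

def cross_entropy_onehot(logits, target_onehot):
    """Return (mean loss, gradient w.r.t. logits) for softmax cross-entropy."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    loss = -(target_onehot * log_probs).sum() / n
    # With one-hot targets, d(loss)/d(logits) = (softmax - target) / N.
    grad = (np.exp(log_probs) - target_onehot) / n
    return loss, grad
```

With all-zero logits over C classes, the loss equals log(C), which makes a quick sanity check for any implementation.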

Tip: rasptorch.no_grad() exists (like PyTorch) to disable graph building during evaluation.

Training Loop Utilities

rasptorch.train provides a small, reusable training loop helper with PyTorch-like epoch logs (loss, accuracy/metrics, throughput) for any model.

Key pieces:

  • rasptorch.train.fit(...): train loop with optional validation
  • rasptorch.train.Accuracy(): top-1 classification accuracy
  • rasptorch.train.classification_target_one_hot(num_classes, device=...): converts integer labels to one-hot targets
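To make the one-hot conversion and top-1 accuracy concrete, here is a plain-Python sketch of what those two pieces compute. The function names are hypothetical stand-ins, not the rasptorch.train implementations, and they operate on nested lists rather than tensors.

```python
def one_hot(labels, num_classes):
    """Convert integer class labels into one-hot rows of length num_classes."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

def top1_accuracy(logits, target_onehot):
    """Fraction of rows whose highest-scoring class matches the one-hot target."""
    correct = 0
    for scores, target in zip(logits, target_onehot):
        pred = max(range(len(scores)), key=scores.__getitem__)
        true = max(range(len(target)), key=target.__getitem__)
        correct += int(pred == true)
    return correct / len(logits)
```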

Example (classifier):

from rasptorch import functional as F
from rasptorch.train import fit, Accuracy, classification_target_one_hot
from rasptorch.optim import SGD

model = ...
opt = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

fit(
    model,
    opt,
    train_loader,
    loss_fn=F.cross_entropy,
    device="gpu",
    epochs=10,
    val_loader=val_loader,
    target_transform=classification_target_one_hot(num_classes=10, device="gpu"),
    metrics=[Accuracy()],
)
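The optimizer in this example uses momentum and weight decay. As a rough guide to what those settings do, the sketch below follows the PyTorch convention (weight decay added to the gradient, then a momentum buffer); whether rasptorch's SGD matches this exactly is an assumption, and sgd_step is a hypothetical helper working on plain floats.

```python
def sgd_step(params, grads, velocity, lr, momentum=0.0, weight_decay=0.0):
    """Apply one in-place SGD update to lists of scalar parameters."""
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p                   # L2 penalty folded into the gradient
        velocity[i] = momentum * velocity[i] + g   # momentum buffer, zero-initialized
        params[i] = p - lr * velocity[i]

# Two steps with a constant gradient show the momentum buffer building up.
params, velocity = [1.0], [0.0]
sgd_step(params, [0.5], velocity, lr=0.1, momentum=0.9)
sgd_step(params, [0.5], velocity, lr=0.1, momentum=0.9)
```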

Notes:

  • Metrics like accuracy call .numpy() on logits, which triggers a GPU readback.
  • Wrap evaluation code in rasptorch.no_grad() to avoid building graphs.
  • mse_loss is implemented purely via tensor ops (the elementwise square of pred - target, followed by mean()), so in gpu-autograd mode the loss tensor itself lives on GPU. Parameters and gradients stay on GPU as well; training code typically reads only the loss back via .numpy() for logging.
  • uv run main.py --device gpu requires Vulkan. If Vulkan init or shader compilation fails, it raises a clear error instead of silently falling back.
  • Broadcasting is still limited; common 2D + 1D row-vector forms like (N,M) + (M,) and (N,M) * (M,) are supported.
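Two of the notes above are easy to pin down with NumPy, whose semantics the CPU backend is built on: the supported row-vector broadcast forms, and MSE composed from elementwise ops plus mean(). This is an illustrative sketch; the mse_loss here is a local stand-in, not rasptorch's function.

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)       # shape (N, M) = (2, 3)
row = np.array([10.0, 20.0, 30.0], dtype=np.float32)   # shape (M,) = (3,)

added = x + row    # the row vector is added to every row of x
scaled = x * row   # each column j of x is multiplied by row[j]

def mse_loss(pred, target):
    """Mean squared error built from a subtraction, a multiply, and mean()."""
    diff = pred - target
    return (diff * diff).mean()
```

The (N,M) + (M,) form is exactly what adding a bias to a Linear layer's output needs.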

Benchmarks

gpu_demo.py prints timing stats (min/p50/p95/mean/std) for:

  • CPU (NumPy)
  • GPU compute+readback (includes .numpy() every iteration)
  • GPU compute-only (no per-iteration readback)
  • GPU fused compute-only and no-alloc variants (preallocated output buffers)

The GPU is most likely to beat NumPy in the compute-only and fused/no-alloc configurations, so compare those numbers first.
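The same kind of statistics can be gathered for your own workloads with the standard library. The sketch below is not taken from gpu_demo.py: benchmark, summarize, and percentile are hypothetical helpers, the percentile uses linear interpolation, and the std is the population standard deviation. For GPU work, fn should include whatever synchronization or readback you intend to measure.

```python
import math
import time

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def summarize(samples):
    """Return min/p50/p95/mean/std for a list of timings."""
    s = sorted(samples)
    mean = sum(s) / len(s)
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / len(s))
    return {"min": s[0], "p50": percentile(s, 50), "p95": percentile(s, 95),
            "mean": mean, "std": std}

def benchmark(fn, iters=100, warmup=5):
    """Time fn() over several iterations after a short warmup."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return summarize(samples)
```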

Current Limitations

  • GPU autograd is still incomplete (only a subset of ops is supported).
  • GPU reductions support sum() and mean(); other reductions and broadcast patterns are still limited.
  • The Vulkan backend only implements a small set of ops/kernels; expanding model coverage will require more kernels (and ideally more fusion).
  • PyTorch integration is experimental: rasptorch.torch_bridge currently supports a small inference subset (Conv2d/Linear/ReLU) and may copy tensors CPU<->GPU.

Development & Tests

  • pytest runs CPU tests by default.
  • The backend smoke test runs everywhere:
    • With Vulkan available, it exercises real GPU kernels.
    • Without Vulkan, it exercises the NumPy fallback path.
  • For a strict Vulkan-only check, run uv run gpu_demo.py --smoke-only.

Publishing (maintainers)

Build:

  • python -m pip install -U build twine
  • python -m build

Upload to PyPI:

  • python -m twine upload dist/*
