Experimental PyTorch-like autograd engine with an optional Vulkan compute backend (Raspberry Pi 5-focused).
🍓🎇 rasptorch
rasptorch is an experimental deep learning library inspired by PyTorch, built with a singular focus: making complex neural networks practical and efficient to run on resource-constrained hardware like the Raspberry Pi 5, by leveraging its GPU capabilities via Vulkan.
✨ Core Concepts
The library operates on a multi-layered architecture to maximize hardware utilization:
- CPU Backend (Software): Uses a pure NumPy-backed autograd engine and `nn` module for reliable computation when GPU acceleration is unavailable.
- GPU Backend (Hardware): Features an experimental Vulkan backend for high-speed tensor operations (elementwise math, matmul, reductions) directly on the Pi 5's GPU.
- Interface: Provides a streamlined CLI/Streamlit UI for interactive model building, training, persistence, and inspection.
The Vulkan path relies on real compute shaders compiled to SPIR-V, giving deep control over the underlying hardware.
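For reference, that compilation step looks like the following (a generic sketch using the standard glslc toolchain; matmul.comp is a placeholder filename, not a file shipped by rasptorch):
import subprocess
# Compile a GLSL compute shader to SPIR-V with glslc (shaderc toolchain).
# "matmul.comp" is a placeholder name for illustration only.
subprocess.run(["glslc", "matmul.comp", "-o", "matmul.spv"], check=True)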
🔌 Backend Abstraction (Connectable Backends)
rasptorch now exposes a backend abstraction API so compute backends can be registered and connected at runtime:
import rasptorch
# Inspect availability
print(rasptorch.available_backends()) # {'cpu': True, 'vulkan': ..., 'opencl': ..., 'cuda': ...}
# Try to connect a backend (falls back to CPU in non-strict mode)
active = rasptorch.connect_backend("vulkan", strict=False)
print(active.name)
Built-in backend adapters:
- `numpy` (NumPy adapter; internal key: `cpu`) - Pure NumPy autograd
- `vulkan` (rasptorch Vulkan kernels, with optional CPU fallback) - Optimized for Raspberry Pi 4/5 ⚡
- `opencl` (pyopencl when available, optional CPU fallback)
- `cuda` (CuPy when available, with PyTorch CUDA fallback, optional CPU fallback)
CLI helpers:
rasptorch backend list
rasptorch backend connect numpy
rasptorch backend connect vulkan --strict
# Benchmark with auto-tuned Vulkan kernel and submission batching
rasptorch --json backend benchmark --backends numpy,vulkan,cuda --size 2048 --iterations 100 --warmup 20 --vulkan-kernel auto --vulkan-autotune-submit --seed 42
Notes:
- The user-facing CLI/UI labels the CPU backend as `numpy`.
- Vulkan benchmark mode uses resident buffers (upload once, repeated on-device matmul, download once).
- Performance (optimized): Vulkan achieves ~564 GFLOPS (78% of NumPy) on `matmul_vec4` with auto-tuning.
- `--vulkan-kernel auto` probes `matmul`, `matmul_vec4`, `matmul_a_bt`, and `matmul_a_bt_tiled` (when available) and keeps the faster path.
- If Vulkan hits `VkErrorDeviceLost`, lower `--vulkan-submit-every` (for example, 4 or 1) or use auto-tuning.
- Recommended: use `--vulkan-autotune-submit` to jointly probe kernel + submit chunk and pick the fastest stable combo.
- Optimizations: command buffer batching, memory-mapped buffers, auto kernel selection.
📚 What's Included (Core Features)
- Tensor Operations: Support for elementwise math, matrix multiplication (`matmul`), reductions, indexing, reshaping, stacking, and broadcasting.
- Layers: Includes standard neural network blocks: `Linear`, `MLP`, `CNN`, `GRU`, `Transformer`, normalization layers, activations, pooling, embeddings, and attention.
- Training Tools: Full suite of tools including optimizers (`SGD`), learning-rate schedulers, gradient clipping, and regularization helpers.
- Persistence: Ability to save and load checkpoint weights without needing the full `torch` dependency.
- Interfaces: CLI (`rasptorch chat`) and Streamlit UI (`rasptorch ui`).
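A minimal end-to-end sketch of how these pieces could fit together, assuming a PyTorch-style API as the feature list suggests; the exact names (`rt.tensor`, `rt.nn.MLP`, `rt.optim.SGD`, `rt.save`) are assumptions for illustration, not confirmed rasptorch API:
import numpy as np
import rasptorch as rt  # alias is assumed; the actual import surface may differ

# Hypothetical PyTorch-style workflow based on the feature list above.
x = rt.tensor(np.random.randn(32, 64))    # batch of inputs
y = rt.tensor(np.random.randn(32, 2))     # regression targets
model = rt.nn.MLP([64, 32, 16, 2])        # mirrors the CLI's --layers "64,32,16,2"
opt = rt.optim.SGD(model.parameters(), lr=0.01)
pred = model(x)
loss = ((pred - y) ** 2).mean()           # MSE via elementwise ops + a reduction
loss.backward()                           # NumPy-backed autograd
opt.step()
rt.save(model.state_dict(), "model.pth")  # checkpoint without the torch dependency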
🚀 Getting Started
1. Installation
A. Basic Install (CPU Only): To get the core library components running on the CPU:
pip install rasptorch
B. Development Install (Full Capability): For local development and access to all potential backends:
pip install -e ".[dev]"
C. GPU Mode Prerequisites: To utilize the GPU backend, you must meet these prerequisites:
- Raspberry Pi 5 with working Vulkan drivers.
- The `glslc` shader compiler must be available in your system `PATH`.
- When running, you must specify the device: `--device gpu` or `--device auto`.
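A quick preflight check for these prerequisites (a sketch; vulkaninfo comes from the vulkan-tools package and may not be installed on a stock image):
import shutil
import subprocess

# glslc must be discoverable on PATH so shaders can be compiled to SPIR-V.
print("glslc:", shutil.which("glslc") or "NOT FOUND")

# Probe the Vulkan driver stack with vulkaninfo (from vulkan-tools).
try:
    result = subprocess.run(["vulkaninfo", "--summary"],
                            capture_output=True, text=True, timeout=30)
    print("vulkan driver:", "ok" if result.returncode == 0 else "error")
except FileNotFoundError:
    print("vulkan driver: vulkaninfo not installed (try the vulkan-tools package)")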
2. Quick Run Examples
Start the Interactive Shell:
uv run rasptorch chat
Launch the Web UI:
uv run rasptorch ui
(This will usually open at http://localhost:8501)
Viewing Help: To see all available CLI subcommands:
uv run rasptorch --help
⚙️ Execution Modes & Workflows
The `main.py` script controls the operational mode:
- `cpu`: Pure NumPy autograd execution on the CPU.
- `gpu`: Executes the training loop explicitly using the Vulkan backend kernels.
- `gpu-autograd`: An experimental mode for tracing gradients across the GPU pipeline.
Example Training Command:
uv run main.py --device gpu --epochs 50 --batch-size 32 --lr 0.01
📊 Benchmarks
rasptorch provides a built-in benchmark tool for comparing backend performance on matrix multiplication:
Quick Benchmark (Single Size):
# Benchmark with default settings (2048x2048 matmul, 100 iterations)
uv run rasptorch backend benchmark
# Benchmark with custom size and multiple backends
uv run rasptorch --json backend benchmark --backends numpy,vulkan,cuda --size 2048 --iterations 100 --warmup 20 --seed 42
Performance Results (Raspberry Pi 5, 2048x2048 matmul, optimized):
| Backend | Time (s) | Iterations/s | GFLOPS | Status |
|---|---|---|---|---|
| NumPy | 2.25 | 44.4 | 763 | Reference |
| Vulkan (auto-tuned) | 3.15 | 31.8 | 546 | ⚡ GPU |
| CUDA (when available) | 0.56 | 178 | 3059 | Best |
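The GFLOPS column follows from the standard 2·N³ floating-point operation count for an N×N matmul, so the table can be sanity-checked directly:
# GFLOPS = 2 * N^3 flops per matmul * iterations per second / 1e9
N = 2048
flops_per_matmul = 2 * N**3  # ~17.18 GFLOP for one 2048x2048 matmul
for name, it_per_s in [("numpy", 44.4), ("vulkan", 31.8), ("cuda", 178.0)]:
    print(f"{name}: {flops_per_matmul * it_per_s / 1e9:.0f} GFLOPS")
# -> roughly 763, 546, and 3058 GFLOPS, matching the table above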
Vulkan Kernel Selection:
The --vulkan-kernel auto flag intelligently probes available kernels:
- `matmul` - Basic single-threaded implementation
- `matmul_vec4` - SIMD-style vec4 operations
- `matmul_a_bt` - Matrix transpose optimization (for A @ B.T)
- `matmul_a_bt_tiled` - Tiled transpose optimization (fastest when applicable)
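The a_bt variants exploit memory layout: with B stored transposed, each dot product walks contiguous rows of both operands. A NumPy illustration of the equivalence these kernels rely on (illustrative only, not rasptorch code):
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((256, 256), dtype=np.float32)
B = rng.standard_normal((256, 256), dtype=np.float32)

# Store B transposed so row-major reads are contiguous for both inputs.
Bt = np.ascontiguousarray(B.T)

# An "A @ B.T"-style kernel applied to Bt reproduces the plain matmul.
assert np.allclose(A @ B, A @ Bt.T, atol=1e-3)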
Advanced Tuning:
# Auto-tune both kernel AND submission batching strategy
uv run rasptorch --json backend benchmark --backends vulkan --size 2048 \
--iterations 100 --warmup 20 \
--vulkan-kernel auto \
--vulkan-autotune-submit \
--seed 42
# Manual kernel selection with custom batch submission
uv run rasptorch --json backend benchmark --backends vulkan \
--vulkan-kernel matmul_a_bt_tiled \
--vulkan-submit-every 4 \
--size 2048 --iterations 100
Output Format:
Results are provided in JSON format (with --json flag) including:
- `status`: "ok" or "unavailable"
- `elapsed_seconds`: Total benchmark time
- `iterations_per_second`: Throughput metric
- `estimated_gflops`: Floating-point performance
- `checksum`: Verification result
- `kernel`: Selected kernel name (for auto mode)
- `submit_every`: Submission batch size (for Vulkan)
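A sketch of consuming that JSON programmatically; the field names come from the list above, but the top-level envelope (assumed here to map backend names to result objects) is a guess and may differ:
import json
import subprocess

# Run the benchmark in JSON mode and capture stdout.
raw = subprocess.run(
    ["rasptorch", "--json", "backend", "benchmark", "--backends", "numpy,vulkan"],
    capture_output=True, text=True, check=True,
).stdout

# Assumed envelope: {"numpy": {...}, "vulkan": {...}} -- adjust to the real shape.
for backend, r in json.loads(raw).items():
    if r.get("status") == "ok":
        print(f"{backend}: {r['estimated_gflops']:.0f} GFLOPS "
              f"({r['iterations_per_second']:.1f} it/s)")
    else:
        print(f"{backend}: unavailable")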
Optimization Tips:
- Use `--vulkan-autotune-submit` for best results (probes kernel + batch combinations)
- If you see `VkErrorDeviceLost`, reduce `--vulkan-submit-every` (try 4 or 1)
- Larger problem sizes better amortize GPU setup overhead
- Command buffer batching (`--vulkan-submit-every`) balances latency and throughput
For a detailed optimization guide, see VULKAN_OPTIMIZATION.md.
🧠 Advanced Topics
1. Tensor Operations
Basic tensor math is performed via:
# Create tensors
uv run rasptorch tensor random --shape 2,3,4
uv run rasptorch tensor ones --shape 5,10
The printed output demonstrates the low-level tensor primitives exposed through the CLI.
2. Model Definition
Models are defined using structured commands:
# Simple MLP
uv run rasptorch model mlp --layers "64,32,16,2"
# Complex CNN
uv run rasptorch model cnn --in-channels 3 --out-channels "32,64,128"
Managing the lifecycle:
uv run rasptorch model list
uv run rasptorch model save --model-id <id> --path model.pth
🩹 Troubleshooting & Best Practices
- Performance: The fastest paths are those that keep the computation entirely on the GPU and minimize host-to-GPU data transfers.
- Fallback: If GPU operations fail due to driver issues, the system gracefully falls back to the CPU NumPy path, but performance will suffer.
- Advanced Use: For a deep dive into custom kernel optimization, refer to the source code in the `rasptorch/gpu_demo.py` and `rasptorch/main.py` scripts.