Experimental PyTorch-like autograd engine with an optional Vulkan compute backend (Raspberry Pi 5-focused).

Project description

🍓🎇 rasptorch

rasptorch is an experimental deep learning library inspired by PyTorch, built with a singular focus: making complex neural networks practical and efficient to run on resource-constrained hardware like the Raspberry Pi 5, by leveraging its GPU capabilities via Vulkan.


✨ Core Concepts

The library operates on a multi-layered architecture to maximize hardware utilization:

  1. CPU Backend (Software): Uses a pure NumPy-backed autograd engine and nn module for reliable computation when GPU acceleration is unavailable.
  2. GPU Backend (Hardware): Features an experimental Vulkan backend for high-speed tensor operations (elementwise math, matmul, reductions) directly on the Pi 5's GPU.
  3. Interface: Provides a streamlined CLI/Streamlit UI for interactive model building, training, persistence, and inspection.

The Vulkan path relies on real compute shaders compiled to SPIR-V, giving deep control over the underlying hardware.
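The CPU autograd engine described above can be sketched in miniature. The following is an illustrative reverse-mode autodiff toy over NumPy arrays, not rasptorch's actual classes or internals (the `Tensor` name and methods here are assumptions for demonstration):

```python
import numpy as np

class Tensor:
    """Minimal reverse-mode autograd node over a NumPy array (illustrative only)."""
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self.parents = parents
        self.backward_fn = None  # propagates self.grad to parents

    def __mul__(self, other):
        out = Tensor(self.data * other.data, parents=(self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = _backward
        return out

    def sum(self):
        out = Tensor(self.data.sum(), parents=(self,))
        def _backward():
            self.grad += np.ones_like(self.data) * out.grad
        out.backward_fn = _backward
        return out

    def backward(self):
        # Build a topological order, then accumulate gradients from the output back.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = np.ones_like(self.data)
        for node in reversed(order):
            if node.backward_fn:
                node.backward_fn()

x = Tensor([1.0, 2.0, 3.0])
y = Tensor([4.0, 5.0, 6.0])
loss = (x * y).sum()
loss.backward()
# For loss = sum(x * y): d(loss)/dx = y and d(loss)/dy = x
```

The same tape-based idea scales from this toy to a full engine: every op records how to push gradients back to its inputs.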

🔌 Backend Abstraction (Connectable Backends)

rasptorch now exposes a backend abstraction API so compute backends can be registered and connected at runtime:

import rasptorch

# Inspect availability
print(rasptorch.available_backends())  # {'cpu': True, 'vulkan': ..., 'opencl': ..., 'cuda': ...}

# Try to connect a backend (falls back to CPU in non-strict mode)
active = rasptorch.connect_backend("vulkan", strict=False)
print(active.name)

Built-in backend adapters:

  • numpy (NumPy adapter; internal key: cpu) - Pure NumPy autograd
  • vulkan (rasptorch Vulkan kernels, with optional CPU fallback) - Optimized for Raspberry Pi 4/5
  • opencl (pyopencl when available, optional CPU fallback)
  • cuda (CuPy when available, with PyTorch CUDA fallback, optional CPU fallback)
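The connect-with-fallback behavior can be sketched as a small registry. This is a generic pattern sketch, not rasptorch's implementation; the function names merely mirror the public API shown above:

```python
class Backend:
    """A named compute backend with an availability flag."""
    def __init__(self, name, available):
        self.name = name
        self.available = available

_REGISTRY = {}

def register_backend(name, available):
    _REGISTRY[name] = Backend(name, available)

def available_backends():
    """Map of backend name -> availability, like rasptorch.available_backends()."""
    return {name: b.available for name, b in _REGISTRY.items()}

def connect_backend(name, strict=False):
    """Return the requested backend; in non-strict mode fall back to CPU."""
    backend = _REGISTRY.get(name)
    if backend is not None and backend.available:
        return backend
    if strict:
        raise RuntimeError(f"backend {name!r} unavailable")
    return _REGISTRY["cpu"]  # non-strict mode falls back to CPU

register_backend("cpu", available=True)
register_backend("vulkan", available=False)  # e.g. no Vulkan driver on this host
```

With this shape, `connect_backend("vulkan")` quietly returns the CPU backend on a machine without Vulkan, while `strict=True` surfaces the failure instead.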

CLI helpers:

rasptorch backend list
rasptorch backend connect numpy
rasptorch backend connect vulkan --strict
# Benchmark with auto-tuned Vulkan kernel and submission batching
rasptorch --json backend benchmark --backends numpy,vulkan,cuda --size 2048 --iterations 100 --warmup 20 --vulkan-kernel auto --vulkan-autotune-submit --seed 42

Notes:

  • The user-facing CLI/UI labels the CPU backend as numpy.
  • Vulkan benchmark mode uses resident buffers: upload once, run repeated on-device matmuls, download once.
  • Optimized performance: Vulkan achieves ~564 GFLOPS (78% of NumPy) with matmul_vec4 and auto-tuning.
  • --vulkan-kernel auto probes matmul, matmul_vec4, matmul_a_bt, and matmul_a_bt_tiled (when available) and keeps the fastest path.
  • If Vulkan hits VkErrorDeviceLost, lower --vulkan-submit-every (for example, 4 or 1) or use auto-tuning.
  • Recommended: use --vulkan-autotune-submit to jointly probe kernel and submit-chunk size and pick the fastest stable combination.
  • Optimizations applied: command-buffer batching, memory-mapped buffers, and automatic kernel selection.
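The GFLOPS figures follow from the standard flop count for a square matmul, 2·n³ per multiply (one multiply plus one add per inner-loop step). A quick sketch of the arithmetic:

```python
def matmul_gflops(n, iterations, elapsed_seconds):
    """Estimated GFLOPS for `iterations` repetitions of an n x n matmul."""
    flops = 2 * n ** 3 * iterations  # one multiply + one add per inner-loop step
    return flops / elapsed_seconds / 1e9

# Example: 100 iterations of a 2048 x 2048 matmul completing in 2.25 s
gflops = matmul_gflops(2048, 100, 2.25)
```

This is the same formula the benchmark's estimated_gflops figure is presumably derived from.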

📚 What's Included (Core Features)

  • Tensor Operations: Support for elementwise math, matrix multiplication (matmul), reductions, indexing, reshaping, stacking, and broadcasting.
  • Layers: Includes standard neural network blocks: Linear, MLP, CNN, GRU, Transformer, normalization layers, activations, pooling, embeddings, and attention.
  • Training Tools: Full suite of tools including optimizers (SGD), learning-rate schedulers, gradient clipping, and regularization helpers.
  • Persistence: Ability to save and load checkpoint weights without needing the full torch dependency.
  • Interfaces: CLI (rasptorch chat) and Streamlit UI (rasptorch ui).

🚀 Getting Started

1. Installation

A. Basic Install (CPU Only): To get the core library components running on the CPU:

pip install rasptorch

B. Development Install (Full Capability): For local development and access to all potential backends:

pip install -e ".[dev]"

C. GPU Mode Prerequisites: To utilize the GPU backend, you must meet these prerequisites:

  • Raspberry Pi 5 with working Vulkan drivers.
  • The glslc shader compiler must be available in your system PATH.
  • When running, you must specify the device: --device gpu or --device auto.
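A quick way to check the glslc prerequisite from Python (this uses the standard library's shutil.which to search PATH; it is a generic check, not a rasptorch command):

```python
import shutil

# shutil.which returns the full path to the executable, or None if not on PATH.
glslc_path = shutil.which("glslc")
if glslc_path is None:
    print("glslc not found; the Vulkan backend cannot compile shaders")
else:
    print(f"glslc found at {glslc_path}")
```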

2. Quick Run Examples

Start the Interactive Shell:

uv run rasptorch chat

Launch the Web UI:

uv run rasptorch ui

(This will usually open at http://localhost:8501)

Viewing Help: To see all available CLI subcommands:

uv run rasptorch --help

⚙️ Execution Modes & Workflows

The main.py script controls the operational mode:

  • cpu: Pure NumPy autograd execution on the CPU.
  • gpu: Executes the training loop explicitly using the Vulkan backend kernels.
  • gpu-autograd: An experimental mode for tracing gradients across the GPU pipeline.

Example Training Command:

uv run main.py --device gpu --epochs 50 --batch-size 32 --lr 0.01
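The flags above map onto a standard minibatch-SGD loop. A generic NumPy sketch of such a loop (a toy linear-regression task, not main.py's actual code) using the same epochs/batch-size/lr values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data: y = X @ w_true + noise
X = rng.standard_normal((256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.01 * rng.standard_normal(256)

w = np.zeros(4)
epochs, batch_size, lr = 50, 32, 0.01  # mirrors --epochs 50 --batch-size 32 --lr 0.01

for _ in range(epochs):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        pred = xb @ w
        grad = 2 * xb.T @ (pred - yb) / len(idx)  # gradient of mean squared error
        w -= lr * grad  # plain SGD step

final_loss = float(np.mean((X @ w - y) ** 2))
```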

📊 Benchmarks

rasptorch provides a built-in benchmark tool for comparing backend performance on matrix multiplication:

Quick Benchmark (Single Size):

# Benchmark with default settings (2048x2048 matmul, 100 iterations)
uv run rasptorch backend benchmark

# Benchmark with custom size and multiple backends
uv run rasptorch --json backend benchmark --backends numpy,vulkan,cuda --size 2048 --iterations 100 --warmup 20 --seed 42

Performance Results (Raspberry Pi 5, 2048x2048 matmul, optimized):

Backend                  Time (s)   Iterations/s   GFLOPS   Status
NumPy                    2.25       44.4           763      Reference
Vulkan (auto-tuned)      3.15       31.8           546      ⚡ GPU
CUDA (when available)    0.56       178            3059     Best

Vulkan Kernel Selection: The --vulkan-kernel auto flag intelligently probes available kernels:

  • matmul - Basic single-threaded implementation
  • matmul_vec4 - SIMD-style vec4 operations
  • matmul_a_bt - Matrix transpose optimization (for A @ B.T)
  • matmul_a_bt_tiled - Tiled transpose optimization (fastest when applicable)
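The tiled A @ B.T strategy can be illustrated on the CPU. The following NumPy sketch shows the tiling idea only; the real kernel is a SPIR-V compute shader and this function name is illustrative:

```python
import numpy as np

def matmul_a_bt_tiled(A, B, tile=64):
    """Compute A @ B.T by accumulating tile-sized blocks (illustrates the tiling idea)."""
    m, k = A.shape
    n, k2 = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # On a GPU each block is staged in fast local memory;
                # here the tile size just bounds the slices.
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[j:j+tile, p:p+tile].T
    return C

A = np.random.default_rng(0).standard_normal((128, 96)).astype(np.float32)
B = np.random.default_rng(1).standard_normal((160, 96)).astype(np.float32)
C = matmul_a_bt_tiled(A, B)
```

Tiling wins on GPUs because each block of A and B is loaded into fast shared memory once and reused across the whole tile, instead of re-reading global memory per element.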

Advanced Tuning:

# Auto-tune both kernel AND submission batching strategy
uv run rasptorch --json backend benchmark --backends vulkan --size 2048 \
  --iterations 100 --warmup 20 \
  --vulkan-kernel auto \
  --vulkan-autotune-submit \
  --seed 42

# Manual kernel selection with custom batch submission
uv run rasptorch --json backend benchmark --backends vulkan \
  --vulkan-kernel matmul_a_bt_tiled \
  --vulkan-submit-every 4 \
  --size 2048 --iterations 100

Output Format: Results are provided in JSON format (with --json flag) including:

  • status: "ok" or "unavailable"
  • elapsed_seconds: Total benchmark time
  • iterations_per_second: Throughput metric
  • estimated_gflops: Floating-point performance
  • checksum: Verification result
  • kernel: Selected kernel name (for auto mode)
  • submit_every: Submission batch size (for Vulkan)

Optimization Tips:

  • Use --vulkan-autotune-submit for best results (probes kernel + batch combinations)
  • If you see VkErrorDeviceLost, reduce --vulkan-submit-every (try 4 or 1)
  • Larger problem sizes better amortize GPU setup overhead
  • Command buffer batching (--vulkan-submit-every) balances latency and throughput

For detailed optimization guide, see VULKAN_OPTIMIZATION.md.


🧠 Advanced Topics

1. Tensor Operations

Basic tensor math is performed via:

# Create tensors
uv run rasptorch tensor random --shape 2,3,4
uv run rasptorch tensor ones --shape 5,10

These commands exercise the low-level tensor creation and math primitives directly from the CLI.

2. Model Definition

Models are defined using structured commands:

# Simple MLP
uv run rasptorch model mlp --layers "64,32,16,2"
# Complex CNN
uv run rasptorch model cnn --in-channels 3 --out-channels "32,64,128"

Managing the lifecycle:

uv run rasptorch model list
uv run rasptorch model save --model-id <id> --path model.pth

🩹 Troubleshooting & Best Practices

  1. Performance: The fastest paths keep the computation entirely on the GPU and minimize data transfer between CPU and GPU memory.
  2. Fallback: If GPU operations fail due to driver issues, the system gracefully falls back to the CPU NumPy path, at a significant performance cost.
  3. Advanced Use: For a deep dive into custom kernel optimization, refer to the source code in rasptorch/gpu_demo.py and rasptorch/main.py.
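The graceful fallback in point 2 follows the common try-GPU-then-CPU pattern. A generic sketch of that pattern (not rasptorch's code; the exception and function names are illustrative):

```python
import numpy as np

class GpuUnavailable(RuntimeError):
    """Stand-in for a driver/device failure (e.g. VkErrorDeviceLost)."""

def gpu_matmul(a, b):
    # Simulate a Vulkan path that fails at runtime.
    raise GpuUnavailable("simulated driver failure")

def matmul_with_fallback(a, b):
    """Try the GPU path first; fall back to the NumPy path on device failure."""
    try:
        return gpu_matmul(a, b), "gpu"
    except GpuUnavailable:
        return a @ b, "cpu"

a = np.eye(3)
b = np.arange(9.0).reshape(3, 3)
out, path = matmul_with_fallback(a, b)
```

Catching a narrow, backend-specific exception (rather than a bare except) keeps genuine bugs from being silently masked by the fallback.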

Download files

Download the file for your platform.

Source Distribution

rasptorch-3.4.0.tar.gz (205.6 kB view details)


Built Distribution


rasptorch-3.4.0-py3-none-any.whl (236.4 kB view details)


File details

Details for the file rasptorch-3.4.0.tar.gz.

File metadata

  • Download URL: rasptorch-3.4.0.tar.gz
  • Upload date:
  • Size: 205.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for rasptorch-3.4.0.tar.gz
Algorithm Hash digest
SHA256 37a925dc1af2187a53e85c37db9413e05562f149143ce85bcfa5cffabde86b32
MD5 337717a3f569a19b264f1d3cc27e695e
BLAKE2b-256 e64a25baa0e3da92d6aa5739a974c516cbeed4b3a645176efda9566301afbd1d


File details

Details for the file rasptorch-3.4.0-py3-none-any.whl.

File metadata

  • Download URL: rasptorch-3.4.0-py3-none-any.whl
  • Upload date:
  • Size: 236.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for rasptorch-3.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 72a9a0a3307d9ffac45b38ed001e622c0880b89ea7d172aec8309245f59e66ac
MD5 632dd751b5ec74b143704c89dd9145f2
BLAKE2b-256 2e0203ef1c9f75b5a56d44c550f2959b2cbd1142398352166e12b9a5788b339c

