# Grilly

*Deep learning, well done.*
GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU — AMD, NVIDIA, Intel — no CUDA dependency. 190 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer.
> **Alpha software.** APIs may change between minor versions.
## Installation

```bash
pip install grilly
```

For GPU acceleration (requires the Vulkan SDK and a C++ toolchain):

```bash
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd .  # Windows
# cp build/grilly_core.*.so .         # Linux
```
Pre-built C++ extension (Windows x64 only): download `grilly_core.cp312-win_amd64.pyd` from the latest release and place it in your grilly install directory:

```bash
# Find where grilly is installed
python -c "import grilly; print(grilly.__file__)"
# Copy the .pyd to that directory
cp grilly_core.cp312-win_amd64.pyd /path/to/grilly/
```
Without the C++ extension, grilly remains fully functional via pure Python + numpy fallbacks — just without GPU acceleration.
See INSTALL.md for full setup, Ubuntu instructions, and troubleshooting.
## Requirements

| | Minimum | Recommended |
|---|---|---|
| Python | 3.12+ | 3.12 |
| GPU VRAM | 8 GB | 12 GB+ |
| System RAM | 32 GB | 64 GB |
| Vulkan | 1.1+ | Latest drivers |
Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).
## Quick Start

```python
import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)
model.zero_grad()
model.backward(grad)
optimizer.step()
```
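For intuition, the gradient that a softmax cross-entropy loss propagates back to the logits is the standard `probs - one_hot(targets)` expression, averaged over the batch. A plain-numpy sketch of that math (an illustration of the formula, not grilly's internal code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy_grad(logits, targets):
    # d(mean NLL)/d(logits) for softmax cross-entropy:
    # softmax(logits) - one_hot(targets), averaged over the batch.
    n, _ = logits.shape
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(n), targets] = 1.0
    return (probs - one_hot) / n

logits = np.random.randn(32, 10).astype(np.float32)
targets = np.random.randint(0, 10, (32,))
grad = cross_entropy_grad(logits, targets)
print(grad.shape)  # (32, 10)
```

Because each row of `probs` and of the one-hot matrix sums to 1, every row of the gradient sums to zero — a quick sanity check for any cross-entropy backward pass.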
## Autograd

```python
from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
```
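The analytic gradient of `y = sum(x * x)` is `2x`, so the expected output above can be verified independently of any autograd engine with a central finite difference (plain numpy, no grilly dependency):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    # Central finite difference: (f(x+eps) - f(x-eps)) / (2*eps)
    # per coordinate. A generic check, independent of autograd.
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
g = numerical_grad(lambda v: (v * v).sum(), x)
print(g)  # ≈ [2.0, 4.0, 6.0], matching x.grad above
```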
## Functional API

```python
import grilly.functional as F

F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.flash_attention2(q, k, v)
```
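As a point of reference, flash attention computes the same result as naive scaled dot-product attention, just with tiled, memory-efficient kernels. A minimal numpy reference (illustrative only — not grilly's shader code) useful for checking `F.flash_attention2` outputs:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_reference(q, k, v):
    # Naive O(n^2) scaled dot-product attention:
    # softmax(q @ k^T / sqrt(d)) @ v. Flash attention produces
    # the same values without materializing the full score matrix.
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.swapaxes(-1, -2) * scale
    return softmax(scores) @ v

q = k = v = np.random.randn(2, 4, 8).astype(np.float32)
out = attention_reference(q, k, v)
print(out.shape)  # (2, 4, 8)
```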
## Architecture

```
Python (VulkanTensor)  →  C++ Bridge (grilly_core)   →  Vulkan Compute Shaders
nn/ modules               pybind11 bindings              190 SPIR-V shaders
functional/ ops           dual-validity GPU/CPU          AMD / NVIDIA / Intel
optim/                    zero CPU↔GPU ping-pong         No CUDA needed
```
Package layout:

```
grilly/
├── backend/       # Vulkan GPU dispatch (core, compute, pipelines, autograd)
├── cpp/           # C++ pybind11 extension — grilly_core native ops
├── nn/            # nn.Module layers, SNN framework, multimodal fusion, autograd
├── functional/    # Stateless F.* API (mirrors torch.nn.functional)
├── optim/         # Optimizers and LR schedulers
├── utils/         # DataLoader, VulkanTensor, HuggingFaceBridge, checkpointing
├── shaders/       # 190 GLSL compute shaders + compiled SPIR-V
├── experimental/  # VSA, MoE routing, temporal reasoning, cognitive controller
└── tests/         # 1,820 tests
```
## What's New in 0.5.0 "GPU-First"

- C++ Tensor with dual-validity tracking — data stays GPU-resident between ops; no CPU ping-pong
- Flash Attention 3 with subgroup acceleration
- HYLAAttention (softmax-free), FNetMixing, SympFormerBlock
- TAPPA q-similarity for adaptive KV cache eviction
- HDC packed ops — 32x memory compression + block-code circular convolution
- Sanger GHA for neurogenesis
- DisARM gradient estimator
- JIT compilation framework (`@grilly.jit`)
- Automatic Mixed Precision (`autocast` + `GradScaler`)
- ProjectionHeads for structured embeddings
- StreamingPipeline for batched embed + upload
- `bindings.cpp` refactored into 11 focused files
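The "32x memory compression" for HDC packed ops comes from storing each binary hypervector component as one bit instead of one float32. A minimal numpy sketch of the packing idea (illustrative only, not grilly's shader implementation):

```python
import numpy as np

D = 1024  # hypervector dimensionality (example value)
a = np.random.randint(0, 2, D, dtype=np.uint8)
b = np.random.randint(0, 2, D, dtype=np.uint8)

# Pack 8 {0,1} components per byte: D/8 bytes vs 4*D bytes as float32.
pa, pb = np.packbits(a), np.packbits(b)
assert pa.nbytes * 32 == a.astype(np.float32).nbytes  # 32x smaller

# Hamming distance on packed vectors: XOR, then count set bits.
hamming = np.unpackbits(pa ^ pb).sum()
assert hamming == np.count_nonzero(a != b)
```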
## Features

### Layers
| Category | Modules |
|---|---|
| Linear | Linear, Embedding, Dropout |
| Convolution | Conv1d, Conv2d |
| Recurrent | LSTM, LSTMCell, GRU, GRUCell |
| Normalization | LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d |
| Activations | ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish |
| Attention | FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE |
| LoRA | LoRALinear, LoRAAttention, LoRAModel |
| Pooling | MaxPool2d, AvgPool2d, AdaptiveMaxPool2d |
| Loss | MSELoss, CrossEntropyLoss, BCELoss |
| Containers | Sequential, Residual |
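Of the normalization layers above, RMSNorm is the least standard; it scales by the root-mean-square instead of centering and dividing by the standard deviation as LayerNorm does. A plain-numpy reference (an illustration of the formula, not grilly's implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: x / sqrt(mean(x^2) + eps) * weight.
    # No mean subtraction and no bias, unlike LayerNorm.
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.randn(2, 8).astype(np.float32)
y = rms_norm(x, np.ones(8, dtype=np.float32))
print(y.shape)  # (2, 8)
```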
### Spiking Neural Networks

- Neuron models: `IFNode`, `LIFNode`, `ParametricLIFNode`
- Surrogate gradients: `ATan`, `Sigmoid`, `FastSigmoid`
- Temporal containers: `SeqToANNContainer`, `MultiStepContainer`
- ANN-to-SNN conversion: `Converter`, `VoltageScaler`
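For readers new to SNNs, a leaky integrate-and-fire neuron integrates input into a decaying membrane potential and emits a spike (then resets) when it crosses a threshold. A conceptual numpy sketch of one time step (parameter names chosen for illustration; not grilly's `LIFNode` code):

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    # Leak toward v_reset and integrate the input current x.
    v = v + (x - (v - v_reset)) / tau
    # Fire where the threshold is crossed, then hard-reset.
    spike = (v >= v_threshold).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)
    return spike, v

v = np.zeros(4, dtype=np.float32)
spikes = []
for t in range(5):  # constant supra-threshold input drives periodic firing
    s, v = lif_step(v, np.full(4, 1.5, dtype=np.float32))
    spikes.append(s)
print(np.stack(spikes).shape)  # (5, 4): time steps x neurons
```

The hard threshold has zero gradient almost everywhere, which is exactly why training uses surrogate gradients such as `ATan` in place of its derivative.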
### Optimizers

`AdamW`, `Adam`, `SGD`, `NLMS`, `NaturalGradient`, `AutoHypergradientAdamW` (OSGM-style auto LR), plus schedulers: `StepLR`, `CosineAnnealingLR`, `ReduceLROnPlateau`.
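For reference, the AdamW update rule (decoupled weight decay, per Loshchilov & Hutter) can be sketched in a few lines of numpy — a generic illustration of the algorithm, not grilly's optimizer code:

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    # Decoupled weight decay: applied directly to the parameters,
    # not folded into the gradient as in plain Adam + L2.
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

p = np.ones(3, dtype=np.float32)
g = np.array([0.1, -0.2, 0.3], dtype=np.float32)
p, m, v = adamw_step(p, g, np.zeros(3), np.zeros(3), t=1)
```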
## Ecosystem

| Package | Description |
|---|---|
| optimum-grilly | HuggingFace Optimum backend — `from_pretrained` → Vulkan inference |
| CubeMind | Neuro-vector-symbolic reasoning powered by grilly 0.5.0 |
## Testing

```bash
uv run pytest tests/ -v                         # all tests (requires Vulkan)
uv run pytest tests/ -m "not gpu" -v            # CPU-only
uv run pytest tests/ --cov=. --cov-report=term  # with coverage
```
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `VK_GPU_INDEX` | Select GPU by index | `0` |
| `GRILLY_DEBUG` | Enable debug logging (`1` = on) | off |
| `ALLOW_CPU_VULKAN` | Allow Mesa llvmpipe software Vulkan | off |
## Contributing

- Fork the repo and create a feature branch
- Add tests for new features
- Run `ruff check .` and `uv run pytest tests/ -v`
- Submit a pull request
## License

MIT License — see LICENSE for details.