
Grilly

Deep learning, well done.

GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU — AMD, NVIDIA, Intel — no CUDA dependency. 190 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer.

Alpha software. APIs may change between minor versions.


Installation

pip install grilly

For GPU acceleration (requires Vulkan SDK and C++ toolchain):

git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd .   # Windows
# cp build/grilly_core.*.so .          # Linux

Pre-built C++ extension (Windows x64 only):

Download grilly_core.cp312-win_amd64.pyd from the latest release and place it in your grilly install directory:

# Find where grilly is installed
python -c "import grilly; print(grilly.__file__)"
# Copy the .pyd to that directory
cp grilly_core.cp312-win_amd64.pyd /path/to/grilly/

Without the C++ extension, grilly works fully via pure Python + numpy fallbacks — just without GPU acceleration.

See INSTALL.md for full setup, Ubuntu instructions, and troubleshooting.

Requirements

Component    Minimum   Recommended
Python       3.12+     3.12
GPU VRAM     8 GB      12 GB+
System RAM   32 GB     64 GB
Vulkan       1.1+      Latest drivers

Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).


Quick Start

import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)

model.zero_grad()
model.backward(grad)
optimizer.step()

Autograd

from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
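
The autograd result can be sanity-checked against a finite-difference estimate in plain numpy (no grilly required; `f` here is just the example's sum-of-squares):

```python
import numpy as np

def f(x):
    # the same scalar function as the autograd example: y = sum(x * x)
    return (x * x).sum()

x = np.array([1.0, 2.0, 3.0])
analytic = 2.0 * x  # d/dx sum(x^2) = 2x

# central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = eps
    numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)

print(analytic, numeric)  # both ~[2. 4. 6.]
```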

Functional API

import grilly.functional as F

F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.flash_attention2(q, k, v)
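
For reference, softmax and the attention that `flash_attention2` accelerates can be written in a few lines of numpy. This is a mathematical reference, not grilly's implementation; flash attention produces the same result with a tiled, memory-efficient schedule on the GPU:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # plain scaled dot-product attention over the last two axes
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.swapaxes(-1, -2) * scale) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 8)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)  # (2, 4, 8)
```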

Architecture

Python (VulkanTensor) → C++ Bridge (grilly_core) → Vulkan Compute Shaders
  nn/ modules            pybind11 bindings           190 SPIR-V shaders
  functional/ ops        dual-validity GPU/CPU        AMD / NVIDIA / Intel
  optim/                 zero CPU↔GPU ping-pong       No CUDA needed

Package layout:

grilly/
├── backend/        # Vulkan GPU dispatch (core, compute, pipelines, autograd)
├── cpp/            # C++ pybind11 extension — grilly_core native ops
├── nn/             # nn.Module layers, SNN framework, multimodal fusion, autograd
├── functional/     # Stateless F.* API (mirrors torch.nn.functional)
├── optim/          # Optimizers and LR schedulers
├── utils/          # DataLoader, VulkanTensor, HuggingFaceBridge, checkpointing
├── shaders/        # 190 GLSL compute shaders + compiled SPIR-V
├── experimental/   # VSA, MoE routing, temporal reasoning, cognitive controller
└── tests/          # 1,820 tests

What's New in 0.5.0 "GPU-First"

  • C++ Tensor with dual-validity tracking — data stays GPU-resident between ops; no CPU ping-pong
  • Flash Attention 3 with subgroup acceleration
  • HYLAAttention (softmax-free), FNetMixing, SympFormerBlock
  • TAPPA q-similarity for adaptive KV cache eviction
  • HDC packed ops — 32x memory compression + block-code circular convolution
  • Sanger GHA for neurogenesis
  • DisARM gradient estimator
  • JIT compilation framework (@grilly.jit)
  • Automatic Mixed Precision (autocast + GradScaler)
  • ProjectionHeads for structured embeddings
  • StreamingPipeline for batched embed + upload
  • bindings.cpp refactored into 11 focused files
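
The "32x memory compression" for HDC is easy to picture: a binary hypervector stored one bit per element is 32x smaller than the same vector in float32, and XOR binding works directly on the packed words. A minimal numpy sketch of the idea (illustrative only; grilly's actual packed layout and block-code circular convolution are not shown here):

```python
import numpy as np

D = 1024                                    # hypervector dimensionality
rng = np.random.default_rng(0)
a = rng.integers(0, 2, D, dtype=np.uint8)   # random binary hypervectors
b = rng.integers(0, 2, D, dtype=np.uint8)

# pack one bit per element: 1024 bits -> 128 bytes (vs 4096 bytes in float32)
pa, pb = np.packbits(a), np.packbits(b)
print(a.astype(np.float32).nbytes // pa.nbytes)  # 32

# XOR binding works on the packed representation ...
bound = np.bitwise_xor(pa, pb)
# ... and agrees with elementwise XOR of the unpacked vectors
assert np.array_equal(np.unpackbits(bound), a ^ b)
# XOR is self-inverse, so binding with b again recovers a
assert np.array_equal(np.unpackbits(np.bitwise_xor(bound, pb)), a)
```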

Features

Layers

Category       Modules
Linear         Linear, Embedding, Dropout
Convolution    Conv1d, Conv2d
Recurrent      LSTM, LSTMCell, GRU, GRUCell
Normalization  LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d
Activations    ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish
Attention      FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE
LoRA           LoRALinear, LoRAAttention, LoRAModel
Pooling        MaxPool2d, AvgPool2d, AdaptiveMaxPool2d
Loss           MSELoss, CrossEntropyLoss, BCELoss
Containers     Sequential, Residual
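
As a flavor of what the normalization layers compute: RMSNorm skips mean subtraction and divides each feature vector by its root mean square before applying a learned scale. A numpy reference of the standard formulation (the epsilon default is an assumption, not grilly's):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # x: (..., d) -- normalize each feature vector by its RMS, then scale
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.default_rng(0).standard_normal((2, 8))
y = rms_norm(x, np.ones(8))
print(np.sqrt((y * y).mean(axis=-1)))  # each row's RMS is ~1
```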

Spiking Neural Networks

  • Neuron models: IFNode, LIFNode, ParametricLIFNode
  • Surrogate gradients: ATan, Sigmoid, FastSigmoid
  • Temporal containers: SeqToANNContainer, MultiStepContainer
  • ANN-to-SNN conversion: Converter, VoltageScaler
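
The neuron models follow the usual leaky integrate-and-fire recurrence: the membrane potential leaks toward its input, and a spike fires (followed by a reset) when the potential crosses the threshold; the surrogate gradients listed above exist to make that hard threshold differentiable during training. A single-step numpy sketch of the recurrence (parameter names here are illustrative, not grilly's API):

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    v = v + (x - v) / tau                      # leaky integration
    spike = (v >= v_threshold).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)        # hard reset where spiked
    return spike, v

v = np.zeros(4, dtype=np.float32)
for t in range(5):                             # constant input current
    spike, v = lif_step(v, np.full(4, 1.5, dtype=np.float32))
    print(t, int(spike[0]), round(float(v[0]), 3))
# the neurons charge for one step, fire on the next, and repeat
```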

Optimizers

AdamW, Adam, SGD, NLMS, NaturalGradient, AutoHypergradientAdamW (OSGM-style auto LR), plus schedulers: StepLR, CosineAnnealingLR, ReduceLROnPlateau.
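
AdamW's defining detail is decoupled weight decay: the decay is applied to the parameters directly rather than being added to the gradient. A single-parameter numpy sketch of the standard update rule (not grilly's internals; hyperparameter defaults are the conventional ones):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    m = b1 * m + (1 - b1) * g                  # first-moment EMA
    v = b2 * v + (1 - b2) * g * g              # second-moment EMA
    m_hat = m / (1 - b1 ** t)                  # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    p = p - lr * wd * p                        # decoupled weight decay
    return p, m, v

p, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):                        # minimize 0.5*p^2, so grad = p
    p, m, v = adamw_step(p, p, m, v, t)
print(p)  # steadily shrinking toward 0
```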


Ecosystem

Package         Description
optimum-grilly  HuggingFace Optimum backend — from_pretrained → Vulkan inference
CubeMind        Neuro-vector-symbolic reasoning powered by grilly 0.5.0

Testing

uv run pytest tests/ -v                          # all tests (requires Vulkan)
uv run pytest tests/ -m "not gpu" -v             # CPU-only
uv run pytest tests/ --cov=. --cov-report=term   # with coverage

Environment Variables

Variable          Description                          Default
VK_GPU_INDEX      Select GPU by index                  0
GRILLY_DEBUG      Enable debug logging (1 = on)        off
ALLOW_CPU_VULKAN  Allow Mesa llvmpipe software Vulkan  off

Contributing

  1. Fork the repo and create a feature branch
  2. Add tests for new features
  3. Run ruff check . and uv run pytest tests/ -v
  4. Submit a pull request

License

MIT License — see LICENSE for details.
