

Grilly


Deep learning, well done.


GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU — AMD, NVIDIA, Intel — no CUDA dependency. 190 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer.

Alpha software. APIs may change between minor versions.


Installation

pip install grilly

For GPU acceleration (requires Vulkan SDK and C++ toolchain):

git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd .   # Windows
# cp build/grilly_core.*.so .          # Linux

Pre-built C++ extension (Windows x64 only):

Download grilly_core.cp312-win_amd64.pyd from the latest release and place it in your grilly install directory:

# Find where grilly is installed
python -c "import grilly; print(grilly.__file__)"
# Copy the .pyd to that directory
cp grilly_core.cp312-win_amd64.pyd /path/to/grilly/

Without the C++ extension, grilly works fully via pure Python + numpy fallbacks — just without GPU acceleration.
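One way to check at runtime which mode you are in is to probe for the native module (a sketch; `grilly_core` is the extension name used in the build steps above):

```python
import importlib.util

# Probe for the native extension without importing it; grilly falls
# back to pure Python + numpy when it is absent.
has_native = importlib.util.find_spec("grilly_core") is not None
backend = "native C++/Vulkan" if has_native else "numpy fallback"
print(f"grilly backend: {backend}")
```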

See INSTALL.md for full setup, Ubuntu instructions, and troubleshooting.

Requirements

              Minimum   Recommended
Python        3.12+     3.12
GPU VRAM      8 GB      12 GB+
System RAM    32 GB     64 GB
Vulkan        1.1+      Latest drivers

Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).


Quick Start

import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)

model.zero_grad()
model.backward(grad)
optimizer.step()
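For reference, here is what one such step computes, written out in plain numpy with manual gradients. This is an illustrative sketch of the same 784 → 256 → 10 network, not grilly's internals, and plain SGD stands in for AdamW:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch matching the shapes above.
x = rng.standard_normal((32, 784)).astype(np.float32)
targets = rng.integers(0, 10, 32)

# Two-layer MLP parameters (784 -> 256 -> 10), small random init.
W1 = (rng.standard_normal((784, 256)) * 0.02).astype(np.float32)
b1 = np.zeros(256, dtype=np.float32)
W2 = (rng.standard_normal((256, 10)) * 0.02).astype(np.float32)
b2 = np.zeros(10, dtype=np.float32)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    return h, h @ W2 + b2              # hidden activations, logits

def cross_entropy(logits, targets):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

lr = 1e-2
losses = []
for _ in range(20):
    h, logits = forward(x)
    losses.append(float(cross_entropy(logits, targets)))
    # Backward pass: softmax-cross-entropy gradient, then the chain rule.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    dlogits = probs
    dlogits[np.arange(len(targets)), targets] -= 1.0
    dlogits /= len(targets)
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dh[h <= 0] = 0.0                   # ReLU gradient mask
    dW1 = x.T @ dh
    db1 = dh.sum(axis=0)
    # Plain SGD step (AdamW adds moment estimates and decoupled decay).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

After a handful of full-batch steps the loss on the toy batch decreases, which is a quick sanity check for the gradient derivation.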

Autograd

from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
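Under the hood this is standard reverse-mode differentiation. A minimal numpy sketch of the idea (illustrative only, not grilly's Variable implementation; the naive recursion here skips the topological sort a real engine needs for diamond-shaped graphs):

```python
import numpy as np

class Var:
    """Toy reverse-mode autograd node."""
    def __init__(self, data, parents=(), grad_fns=()):
        self.data = np.asarray(data, dtype=np.float32)
        self.grad = np.zeros_like(self.data)
        self._parents = parents      # nodes this one was computed from
        self._grad_fns = grad_fns    # local gradient for each parent

    def __mul__(self, other):
        return Var(self.data * other.data,
                   parents=(self, other),
                   grad_fns=(lambda g: g * other.data,   # d(a*b)/da = b
                             lambda g: g * self.data))   # d(a*b)/db = a

    def sum(self):
        return Var(self.data.sum(),
                   parents=(self,),
                   grad_fns=(lambda g: g * np.ones_like(self.data),))

    def backward(self, grad=None):
        grad = np.ones_like(self.data) if grad is None else grad
        self.grad = self.grad + grad                 # accumulate
        for parent, fn in zip(self._parents, self._grad_fns):
            parent.backward(fn(grad))                # propagate upstream

x = Var([1.0, 2.0, 3.0])
y = (x * x).sum()
y.backward()
print(x.grad)  # [2. 4. 6.]
```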

Functional API

import grilly.functional as F

F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.flash_attention2(q, k, v)
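Since the package layout says `functional/` mirrors `torch.nn.functional`, the first three ops have well-known reference semantics, sketched here in numpy (illustrative; `flash_attention2` computes the same softmax(QKᵀ/√d)V result but tiled for memory efficiency):

```python
import numpy as np

def linear(x, weight, bias):
    # torch convention: weight has shape (out_features, in_features)
    return x @ weight.T + bias

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, dim=-1):
    z = x - x.max(axis=dim, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=dim, keepdims=True)

x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
p = softmax(x)
```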

Architecture

Python (VulkanTensor) → C++ Bridge (grilly_core) → Vulkan Compute Shaders
  nn/ modules            pybind11 bindings           190 SPIR-V shaders
  functional/ ops        dual-validity GPU/CPU        AMD / NVIDIA / Intel
  optim/                 zero CPU↔GPU ping-pong       No CUDA needed

Package layout:

grilly/
├── backend/        # Vulkan GPU dispatch (core, compute, pipelines, autograd)
├── cpp/            # C++ pybind11 extension — grilly_core native ops
├── nn/             # nn.Module layers, SNN framework, multimodal fusion, autograd
├── functional/     # Stateless F.* API (mirrors torch.nn.functional)
├── optim/          # Optimizers and LR schedulers
├── utils/          # DataLoader, VulkanTensor, HuggingFaceBridge, checkpointing
├── shaders/        # 190 GLSL compute shaders + compiled SPIR-V
├── experimental/   # VSA, MoE routing, temporal reasoning, cognitive controller
└── tests/          # 1,820 tests

What's New in 0.5.0 "GPU-First"

  • C++ Tensor with dual-validity tracking — data stays GPU-resident between ops; no CPU ping-pong
  • Flash Attention 3 with subgroup acceleration
  • HYLAAttention (softmax-free), FNetMixing, SympFormerBlock
  • TAPPA q-similarity for adaptive KV cache eviction
  • HDC packed ops — 32x memory compression + block-code circular convolution
  • Sanger GHA for neurogenesis
  • DisARM gradient estimator
  • JIT compilation framework (@grilly.jit)
  • Automatic Mixed Precision (autocast + GradScaler)
  • ProjectionHeads for structured embeddings
  • StreamingPipeline for batched embed + upload
  • bindings.cpp refactored into 11 focused files
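The 32x figure for packed HDC ops follows from storing one binary hypervector component per bit rather than per float32 (32 bits). A numpy sketch of the packing and a Hamming-based similarity on packed words (illustrative only, not grilly's API):

```python
import numpy as np

def pack(hv):
    # Map a ±1 hypervector to {0,1} bits, 8 components per byte.
    return np.packbits((hv > 0).astype(np.uint8))

def similarity(pa, pb, dim):
    # Similarity in [-1, 1] computed directly on the packed form:
    # XOR marks mismatching components, popcount tallies them.
    mismatches = np.unpackbits(np.bitwise_xor(pa, pb)).sum()
    return 1.0 - 2.0 * mismatches / dim

rng = np.random.default_rng(0)
dim = 1024                      # multiple of 8 so packing has no padding
a = rng.choice([-1, 1], size=dim)
pa = pack(a)
# 1 bit per component vs 4 bytes for float32: a 32x reduction.
assert pa.nbytes * 32 == a.astype(np.float32).nbytes
```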

Features

Layers

Category        Modules
Linear          Linear, Embedding, Dropout
Convolution     Conv1d, Conv2d
Recurrent       LSTM, LSTMCell, GRU, GRUCell
Normalization   LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d
Activations     ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish
Attention       FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE
LoRA            LoRALinear, LoRAAttention, LoRAModel
Pooling         MaxPool2d, AvgPool2d, AdaptiveMaxPool2d
Loss            MSELoss, CrossEntropyLoss, BCELoss
Containers      Sequential, Residual

Spiking Neural Networks

  • Neuron models: IFNode, LIFNode, ParametricLIFNode
  • Surrogate gradients: ATan, Sigmoid, FastSigmoid
  • Temporal containers: SeqToANNContainer, MultiStepContainer
  • ANN-to-SNN conversion: Converter, VoltageScaler
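The leaky integrate-and-fire dynamics behind LIFNode can be sketched in a few lines of numpy (illustrative; grilly's surrogate gradients such as ATan replace the hard threshold during backprop):

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    v = v + (x - v) / tau                        # leak toward the input
    spike = (v >= v_threshold).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)          # hard reset after a spike
    return spike, v

v = np.zeros(1, dtype=np.float32)
spikes = []
for t in range(10):
    s, v = lif_step(v, np.array([1.5], dtype=np.float32))
    spikes.append(float(s[0]))
print(spikes)  # constant drive -> regular spiking: [0.0, 1.0, 0.0, 1.0, ...]
```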

Optimizers

AdamW, Adam, SGD, NLMS, NaturalGradient, AutoHypergradientAdamW (OSGM-style auto LR), plus schedulers: StepLR, CosineAnnealingLR, ReduceLROnPlateau.
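A single AdamW update, written out in numpy to show the decoupled weight decay that distinguishes it from Adam (an illustrative sketch with textbook defaults, not grilly's optimizer code):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, wd=1e-2):
    m = betas[0] * m + (1 - betas[0]) * g        # first-moment estimate
    v = betas[1] * v + (1 - betas[1]) * g * g    # second-moment estimate
    m_hat = m / (1 - betas[0] ** t)              # bias correction
    v_hat = v / (1 - betas[1] ** t)
    p = p - lr * wd * p                          # decoupled weight decay
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive gradient step
    return p, m, v

p = np.ones(3)
g = np.array([0.1, -0.2, 0.3])
m = np.zeros(3)
v = np.zeros(3)
p, m, v = adamw_step(p, g, m, v, t=1)
```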


Ecosystem

Package         Description
optimum-grilly  HuggingFace Optimum backend — from_pretrained → Vulkan inference
CubeMind        Neuro-vector-symbolic reasoning powered by grilly 0.5.0

Testing

uv run pytest tests/ -v                          # all tests (requires Vulkan)
uv run pytest tests/ -m "not gpu" -v             # CPU-only
uv run pytest tests/ --cov=. --cov-report=term   # with coverage

Environment Variables

Variable          Description                            Default
VK_GPU_INDEX      Select GPU by index                    0
GRILLY_DEBUG      Enable debug logging (1 = on)          off
ALLOW_CPU_VULKAN  Allow Mesa llvmpipe software Vulkan    off
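These can also be set from Python; presumably they are read when the Vulkan backend initializes, so set them before importing grilly (variable names from the table above):

```python
import os

# Pick the second enumerated GPU and turn on debug logging.
# Do this before `import grilly` so the backend sees the values.
os.environ["VK_GPU_INDEX"] = "1"
os.environ["GRILLY_DEBUG"] = "1"
```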

Contributing

  1. Fork the repo and create a feature branch
  2. Add tests for new features
  3. Run ruff check . and uv run pytest tests/ -v
  4. Submit a pull request

License

MIT License — see LICENSE for details.
