# Grilly

*Deep learning, well done.*
GPU-accelerated neural network framework using Vulkan compute shaders. PyTorch-like API that runs on any GPU — AMD, NVIDIA, Intel — no CUDA dependency. 190 GLSL compute shaders compiled to SPIR-V, dispatched through a native C++ layer.
> **Alpha software.** APIs may change between minor versions.
## Installation

```bash
pip install grilly
```

For GPU acceleration (requires the Vulkan SDK and a C++ toolchain):

```bash
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd .  # Windows
# cp build/grilly_core.*.so .         # Linux
```
Pre-built C++ extension (Windows x64 only): download `grilly_core.cp312-win_amd64.pyd` from the latest release and place it in your grilly install directory:

```bash
# Find where grilly is installed
python -c "import grilly; print(grilly.__file__)"
# Copy the .pyd to that directory
cp grilly_core.cp312-win_amd64.pyd /path/to/grilly/
```
Without the C++ extension, grilly remains fully functional via pure Python + numpy fallbacks — just without GPU acceleration.
See INSTALL.md for full setup, Ubuntu instructions, and troubleshooting.
## Requirements

| | Minimum | Recommended |
|---|---|---|
| Python | 3.12+ | 3.12 |
| GPU VRAM | 8 GB | 12 GB+ |
| System RAM | 32 GB | 64 GB |
| Vulkan | 1.1+ | Latest drivers |
Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).
## Quick Start

```python
import numpy as np
from grilly import nn
from grilly.optim import AdamW

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = np.random.randn(32, 784).astype(np.float32)
targets = np.random.randint(0, 10, (32,))

logits = model(x)
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)
model.zero_grad()
model.backward(grad)
optimizer.step()
```
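For intuition, the gradient that a softmax cross-entropy loss propagates back to the logits is the standard `probs - one_hot(targets)` expression, averaged over the batch. A plain-numpy sketch of that math (an illustration of the formula, not grilly's internal code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy_grad(logits, targets):
    # d(mean NLL)/d(logits) for softmax cross-entropy:
    # softmax(logits) - one_hot(targets), averaged over the batch.
    n, _ = logits.shape
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(n), targets] = 1.0
    return (probs - one_hot) / n

logits = np.random.randn(32, 10).astype(np.float32)
targets = np.random.randint(0, 10, (32,))
grad = cross_entropy_grad(logits, targets)
print(grad.shape)  # (32, 10)
```

Because each row of `probs` and of the one-hot matrix sums to 1, every row of the gradient sums to zero — a quick sanity check for any cross-entropy backward pass.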
## Autograd

```python
from grilly.nn import Variable, tensor

x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)  # [2.0, 4.0, 6.0]
```
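The analytic gradient of `y = sum(x * x)` is `2x`, so the expected output above can be verified independently of any autograd engine with a central finite difference (plain numpy, no grilly dependency):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    # Central finite difference: (f(x+eps) - f(x-eps)) / (2*eps)
    # per coordinate. A generic check, independent of autograd.
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
g = numerical_grad(lambda v: (v * v).sum(), x)
print(g)  # ≈ [2.0, 4.0, 6.0], matching x.grad above
```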
## Functional API

```python
import grilly.functional as F

F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.flash_attention2(q, k, v)
```
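As a point of reference, flash attention computes the same result as naive scaled dot-product attention, just with tiled, memory-efficient kernels. A minimal numpy reference (illustrative only — not grilly's shader code) useful for checking `F.flash_attention2` outputs:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_reference(q, k, v):
    # Naive O(n^2) scaled dot-product attention:
    # softmax(q @ k^T / sqrt(d)) @ v. Flash attention produces
    # the same values without materializing the full score matrix.
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.swapaxes(-1, -2) * scale
    return softmax(scores) @ v

q = k = v = np.random.randn(2, 4, 8).astype(np.float32)
out = attention_reference(q, k, v)
print(out.shape)  # (2, 4, 8)
```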
## Architecture

```
Python (VulkanTensor)  →  C++ Bridge (grilly_core)   →  Vulkan Compute Shaders
nn/ modules               pybind11 bindings              190 SPIR-V shaders
functional/ ops           dual-validity GPU/CPU          AMD / NVIDIA / Intel
optim/                    zero CPU↔GPU ping-pong         No CUDA needed
```
Package layout:

```
grilly/
├── backend/       # Vulkan GPU dispatch (core, compute, pipelines, autograd)
├── cpp/           # C++ pybind11 extension — grilly_core native ops
├── nn/            # nn.Module layers, SNN framework, multimodal fusion, autograd
├── functional/    # Stateless F.* API (mirrors torch.nn.functional)
├── optim/         # Optimizers and LR schedulers
├── utils/         # DataLoader, VulkanTensor, HuggingFaceBridge, checkpointing
├── shaders/       # 190 GLSL compute shaders + compiled SPIR-V
├── experimental/  # VSA, MoE routing, temporal reasoning, cognitive controller
└── tests/         # 1,820 tests
```
## What's New in 0.5.0 "GPU-First"

- C++ Tensor with dual-validity tracking — data stays GPU-resident between ops; no CPU ping-pong
- Flash Attention 3 with subgroup acceleration
- HYLAAttention (softmax-free), FNetMixing, SympFormerBlock
- TAPPA q-similarity for adaptive KV cache eviction
- HDC packed ops — 32x memory compression + block-code circular convolution
- Sanger GHA for neurogenesis
- DisARM gradient estimator
- JIT compilation framework (`@grilly.jit`)
- Automatic Mixed Precision (`autocast` + `GradScaler`)
- ProjectionHeads for structured embeddings
- StreamingPipeline for batched embed + upload
- `bindings.cpp` refactored into 11 focused files
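The "32x memory compression" for HDC packed ops comes from storing each binary hypervector component as one bit instead of one float32. A minimal numpy sketch of the packing idea (illustrative only, not grilly's shader implementation):

```python
import numpy as np

D = 1024  # hypervector dimensionality (example value)
a = np.random.randint(0, 2, D, dtype=np.uint8)
b = np.random.randint(0, 2, D, dtype=np.uint8)

# Pack 8 {0,1} components per byte: D/8 bytes vs 4*D bytes as float32.
pa, pb = np.packbits(a), np.packbits(b)
assert pa.nbytes * 32 == a.astype(np.float32).nbytes  # 32x smaller

# Hamming distance on packed vectors: XOR, then count set bits.
hamming = np.unpackbits(pa ^ pb).sum()
assert hamming == np.count_nonzero(a != b)
```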
## Features

### Layers
| Category | Modules |
|---|---|
| Linear | Linear, Embedding, Dropout |
| Convolution | Conv1d, Conv2d |
| Recurrent | LSTM, LSTMCell, GRU, GRUCell |
| Normalization | LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d |
| Activations | ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish |
| Attention | FlashAttention2/3, HYLAAttention, MultiheadAttention, RoPE |
| LoRA | LoRALinear, LoRAAttention, LoRAModel |
| Pooling | MaxPool2d, AvgPool2d, AdaptiveMaxPool2d |
| Loss | MSELoss, CrossEntropyLoss, BCELoss |
| Containers | Sequential, Residual |
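Of the normalization layers above, RMSNorm is the least standard; it scales by the root-mean-square instead of centering and dividing by the standard deviation as LayerNorm does. A plain-numpy reference (an illustration of the formula, not grilly's implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: x / sqrt(mean(x^2) + eps) * weight.
    # No mean subtraction and no bias, unlike LayerNorm.
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.randn(2, 8).astype(np.float32)
y = rms_norm(x, np.ones(8, dtype=np.float32))
print(y.shape)  # (2, 8)
```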
### Spiking Neural Networks

- Neuron models: `IFNode`, `LIFNode`, `ParametricLIFNode`
- Surrogate gradients: `ATan`, `Sigmoid`, `FastSigmoid`
- Temporal containers: `SeqToANNContainer`, `MultiStepContainer`
- ANN-to-SNN conversion: `Converter`, `VoltageScaler`
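For readers new to SNNs, a leaky integrate-and-fire neuron integrates input into a decaying membrane potential and emits a spike (then resets) when it crosses a threshold. A conceptual numpy sketch of one time step (parameter names chosen for illustration; not grilly's `LIFNode` code):

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    # Leak toward v_reset and integrate the input current x.
    v = v + (x - (v - v_reset)) / tau
    # Fire where the threshold is crossed, then hard-reset.
    spike = (v >= v_threshold).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)
    return spike, v

v = np.zeros(4, dtype=np.float32)
spikes = []
for t in range(5):  # constant supra-threshold input drives periodic firing
    s, v = lif_step(v, np.full(4, 1.5, dtype=np.float32))
    spikes.append(s)
print(np.stack(spikes).shape)  # (5, 4): time steps x neurons
```

The hard threshold has zero gradient almost everywhere, which is exactly why training uses surrogate gradients such as `ATan` in place of its derivative.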
### Optimizers

`AdamW`, `Adam`, `SGD`, `NLMS`, `NaturalGradient`, `AutoHypergradientAdamW` (OSGM-style auto LR), plus schedulers: `StepLR`, `CosineAnnealingLR`, `ReduceLROnPlateau`.
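For reference, the AdamW update rule (decoupled weight decay, per Loshchilov & Hutter) can be sketched in a few lines of numpy — a generic illustration of the algorithm, not grilly's optimizer code:

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    # Decoupled weight decay: applied directly to the parameters,
    # not folded into the gradient as in plain Adam + L2.
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

p = np.ones(3, dtype=np.float32)
g = np.array([0.1, -0.2, 0.3], dtype=np.float32)
p, m, v = adamw_step(p, g, np.zeros(3), np.zeros(3), t=1)
```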
## Ecosystem

| Package | Description |
|---|---|
| optimum-grilly | HuggingFace Optimum backend — `from_pretrained` → Vulkan inference |
| CubeMind | Neuro-vector-symbolic reasoning powered by grilly 0.5.0 |
## Testing

```bash
uv run pytest tests/ -v                         # all tests (requires Vulkan)
uv run pytest tests/ -m "not gpu" -v            # CPU-only
uv run pytest tests/ --cov=. --cov-report=term  # with coverage
```
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `VK_GPU_INDEX` | Select GPU by index | `0` |
| `GRILLY_DEBUG` | Enable debug logging (`1` = on) | off |
| `ALLOW_CPU_VULKAN` | Allow Mesa llvmpipe software Vulkan | off |
## Contributing

- Fork the repo and create a feature branch
- Add tests for new features
- Run `ruff check .` and `uv run pytest tests/ -v`
- Submit a pull request
## License

MIT License — see LICENSE for details.