GPU-accelerated neural network operations using Vulkan compute shaders
Grilly
Deep learning, well done.
GPU-accelerated neural network framework built on Vulkan compute shaders. Runs on any GPU — AMD, NVIDIA, Intel — no CUDA required. Provides a PyTorch-like nn.Module API backed by 161 SPIR-V shaders and a native C++ dispatch layer.
Alpha software. APIs may change between minor versions. We welcome early adopters and feedback.
Howto Guides: howtos/ (self-contained HTML tutorials)
Quick Start
import numpy as np
from grilly import nn
# Define a model — same patterns as PyTorch
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
# Forward pass
x = np.random.randn(32, 784).astype(np.float32)
logits = model(x)
print(logits.shape) # (32, 10)
# Loss + backward + optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = nn.optim.AdamW(model.parameters(), lr=1e-3)
targets = np.random.randint(0, 10, (32,))
loss = loss_fn(logits, targets)
grad = loss_fn.backward(np.ones_like(loss), logits, targets)
model.zero_grad()
model.backward(grad)
optimizer.step()
Autograd
from grilly import nn
x = nn.Variable(nn.randn(32, 128), requires_grad=True)
layer = nn.Linear(128, 10)
logits = x @ nn.Variable(layer.weight.T) + nn.Variable(layer.bias)
loss = logits.sum()
loss.backward()
print(x.grad.shape) # (32, 128)
Installation
From PyPI
pip install grilly
From Source (with C++ backend)
The C++ backend (grilly_core) is required — it provides the native Vulkan dispatch layer for all GPU operations.
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
pip install -e ".[dev]"
# Build the C++ backend
cmake -B build -DPYBIND11_FINDPYTHON=ON
cmake --build build --config Release
cp build/Release/grilly_core.*.pyd . # Windows
# cp build/grilly_core.*.so . # Linux
Verify:
python -c "import grilly_core; print('C++ backend OK')"
python -c "import grilly; b = grilly.Compute(); print('GPU:', b.device_name)"
See INSTALL.md for full setup (Vulkan SDK, Ubuntu, CI environments, troubleshooting).
Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.12+ | 3.12 |
| GPU VRAM | 8 GB | 12 GB+ |
| System RAM | 32 GB | 64 GB |
| Vulkan | 1.2+ drivers | Latest drivers |
Supported GPUs: AMD (RX 5000+), NVIDIA (GTX 1060+), Intel (Arc A-series).
Features
PyTorch-like nn.Module API
Standard layers with GPU-accelerated forward and backward passes:
| Category | Modules |
|---|---|
| Linear | Linear, Embedding, Dropout |
| Convolution | Conv1d, Conv2d |
| Recurrent | LSTM, LSTMCell, GRU, GRUCell |
| Pooling | MaxPool2d, AvgPool2d, AdaptiveMaxPool2d, AdaptiveAvgPool2d |
| Normalization | LayerNorm, RMSNorm, BatchNorm1d, BatchNorm2d |
| Activations | ReLU, GELU, SiLU, SwiGLU, GCU, RoSwish, Softmax, Softplus |
| Attention | MultiheadAttention, FlashAttention2, RoPE |
| Loss | MSELoss, CrossEntropyLoss, BCELoss |
| Containers | Sequential, Residual |
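To make the normalization row concrete, here is a NumPy sketch of what an RMSNorm layer computes; the function name and epsilon default are illustrative, not Grilly's API:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize by the root mean square over the last axis, then scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

x = np.random.randn(4, 128).astype(np.float32)
w = np.ones(128, dtype=np.float32)
y = rms_norm(x, w)
print(y.shape)  # (4, 128)
```

After normalization each row has unit root mean square (up to eps), which is what distinguishes RMSNorm from LayerNorm: no mean subtraction, no bias.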
Spiking Neural Networks
Full SNN framework with surrogate gradient training:
- Neuron models: IFNode, LIFNode, ParametricLIFNode
- Surrogate gradients: ATan, Sigmoid, FastSigmoid
- Temporal containers: SeqToANNContainer, MultiStepContainer
- Normalization: BatchNormThroughTime, TemporalEffectiveBatchNorm, NeuNorm
- Synapses: STPSynapse, DualTimescaleSynapse, SynapseFilter
- Attention: SpikingSelfAttention, TemporalWiseAttention, QKAttention
- ANN-to-SNN conversion: Converter, VoltageScaler
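The dynamics these modules are named after can be sketched in plain NumPy: a leaky integrate-and-fire (LIF) update with a hard threshold on the forward pass, and an ATan-style surrogate derivative for the backward pass. Constants and function names here are illustrative; consult the grilly.nn module for the actual API:

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_th=1.0, v_reset=0.0):
    # Leaky integration toward the input, then hard threshold and reset.
    v = v + (x - v) / tau
    spike = (v >= v_th).astype(np.float32)
    v = np.where(spike > 0, v_reset, v)
    return spike, v

def atan_surrogate(v, v_th=1.0, alpha=2.0):
    # Smooth stand-in for d(spike)/dv used during surrogate-gradient training.
    u = np.pi * alpha * (v - v_th) / 2.0
    return alpha / (2.0 * (1.0 + u * u))

v = np.zeros(8, dtype=np.float32)
total = 0.0
for t in range(5):
    spike, v = lif_step(v, np.full(8, 1.5, dtype=np.float32))
    total += spike.sum()
print(total)  # with constant input 1.5, each neuron fires on steps 1 and 3 -> 16.0
```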
Multimodal Fusion
- PerceiverIO — Modality-agnostic input compression
- PerceiverResampler — Flamingo-style visual token resampling
- FlamingoFusion — Cross-attention VLM fusion
- CrossModalAttentionFusion — Bidirectional cross-modal attention
- ImageBindFusion — Joint embedding with contrastive loss
- BottleneckFusion — Multimodal Bottleneck Transformer
- VisionLanguageModel — Complete VLM with visual conditioning
Transformer Components
- Flash Attention 2 (tiled, O(seq) memory)
- Rotary Position Embeddings (RoPE)
- LoRA fine-tuning (LoRALinear, LoRAAttention, LoRAModel)
- Transformer encoder/decoder layers
- Fused operations: SwiGLU FFN, RMSNorm+Linear, QKV projection
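The rotary-embedding idea behind RoPE can be sketched in NumPy. This sketch uses the half-split pairing convention; Grilly's shader may pair dimensions differently, so treat it as math, not API:

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq, dim), dim even. Rotate pairs (x1[i], x2[i]) by a
    # position-dependent angle so dot products encode relative position.
    seq, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)          # (half,)
    angles = np.arange(seq)[:, None] * inv_freq[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

x = np.random.randn(16, 64).astype(np.float32)
y = rope(x)
# A rotation preserves the norm of every position vector.
print(np.allclose(np.linalg.norm(y, axis=-1), np.linalg.norm(x, axis=-1), atol=1e-4))
```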
Inference Optimizations
- Fused RMSNorm shader (Llama, Gemma)
- Grouped Query Attention (GQA) decode against KV-cache
- INT8 GEMM (weight-only, FP32 accumulation)
- 4-bit block quantization (per-block scale + zero-point)
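The per-block scale + zero-point scheme mentioned above can be sketched in NumPy; the block size, asymmetric range mapping, and function names here are illustrative assumptions, not Grilly's exact kernel:

```python
import numpy as np

def quantize_4bit(w, block=32):
    # Per-block asymmetric quantization to 4-bit codes in [0, 15].
    w = w.reshape(-1, block)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    scale = np.where(scale == 0, 1.0, scale)  # guard constant blocks
    q = np.clip(np.round((w - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize_4bit(q, scale, zero):
    return q.astype(np.float32) * scale + zero

w = np.random.randn(4, 64).astype(np.float32)
q, s, z = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, z).reshape(w.shape)
# Round-to-nearest bounds the per-element error by half a scale step.
print(np.max(np.abs(w - w_hat)) <= s.max() / 2 + 1e-6)
```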
Optimizers
AdamW, Adam, SGD, NLMS, NaturalGradient, AutoHypergradientAdamW (OSGM-style auto LR tuning), plus LR schedulers (StepLR, CosineAnnealingLR, ReduceLROnPlateau).
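For reference, a single AdamW update (decoupled weight decay, bias-corrected moments) fits in a few lines of NumPy; hyperparameter defaults here are the common ones and not necessarily Grilly's:

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    # Decoupled weight decay: shrink the parameter directly, not the gradient.
    p = p * (1 - lr * wd)
    m = b1 * m + (1 - b1) * g            # first moment
    v = b2 * v + (1 - b2) * g * g        # second moment
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v

p = np.ones(4, dtype=np.float32)
g = np.full(4, 0.5, dtype=np.float32)
m = np.zeros_like(p)
v = np.zeros_like(p)
p, m, v = adamw_step(p, g, m, v, t=1)
print(p)
```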
Functional API
Stateless functions mirroring torch.nn.functional:
import grilly.functional as F
F.linear(x, weight, bias)
F.relu(x)
F.softmax(x, dim=-1)
F.cross_entropy(logits, targets)
F.flash_attention2(q, k, v)
Autograd
Full computation graph with automatic differentiation:
from grilly.nn import Variable, no_grad, tensor
x = Variable(tensor([1.0, 2.0, 3.0]), requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad) # [2.0, 4.0, 6.0]
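A handy way to sanity-check analytic gradients like the one above is a central-difference numerical gradient; this check is pure NumPy and independent of Grilly's autograd:

```python
import numpy as np

def numerical_grad(f, x, eps=1e-4):
    # Central differences: (f(x + eps) - f(x - eps)) / (2 * eps) per element.
    g = np.zeros_like(x)
    for i in range(x.size):
        xp = x.copy(); xp.flat[i] += eps
        xm = x.copy(); xm.flat[i] -= eps
        g.flat[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
g = numerical_grad(lambda z: (z * z).sum(), x)
# d/dx sum(x^2) = 2x, matching the autograd result [2.0, 4.0, 6.0]
print(np.allclose(g, 2 * x, atol=1e-3))  # True
```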
C++ Backend (grilly_core)
The native C++ extension (grilly_core) wraps all Vulkan compute dispatch via pybind11. It provides 16 operation modules:
| Op | Description |
|---|---|
| linear | Dense matrix multiply (GEMM) |
| conv | 2D convolution (im2col + GEMM) |
| activations | ReLU, GELU, SiLU, Tanh |
| layernorm | Layer normalization |
| rmsnorm | Root mean square normalization |
| batchnorm | Batch normalization (2D) |
| attention | Flash Attention 2 |
| attention_ops | RoPE, KV-cache ops |
| embedding | Token + position embeddings |
| pooling | MaxPool2d, AvgPool2d |
| loss | Cross-entropy, MSE, BCE |
| snn | LIF/IF neuron step kernels |
| optimizer | Adam, AdamW, SGD step kernels |
| learning | STDP, Hebbian, EWC |
| kv_cache | Paged KV-cache management |
| swizzle | Memory layout transforms |
Build instructions: see INSTALL.md.
Ecosystem
| Package | Description |
|---|---|
| optimum-grilly | HuggingFace Optimum backend — from_pretrained → Vulkan inference (Llama, Mistral, BERT, GPT-2) |
pip install grilly optimum-grilly
Examples
See examples/ for runnable scripts:
- hello_grilly.py — Autograd forward + backward
- train_mlp.py — Full training loop with AdamW and cross-entropy
- benchmark_gemm.py — GPU vs CPU GEMM throughput
- classifier.py — Simple classifier example
- 13 experimental examples (VSA, MoE, capsules, cognitive control, and more)
Architecture
grilly/
├── backend/ # Vulkan GPU dispatch (core.py, compute.py, pipelines.py, autograd_core.py)
├── cpp/ # C++ pybind11 extension (grilly_core) — 16 native ops
├── nn/ # PyTorch-like nn.Module layers, SNN framework, multimodal fusion
├── functional/ # Stateless F.* API (mirrors torch.nn.functional)
├── optim/ # Optimizers (AdamW, Adam, SGD, NLMS, NaturalGradient, Hypergradient)
├── utils/ # DataLoader, Dataset, HuggingFaceBridge, VulkanTensor, checkpointing
├── shaders/ # 161 GLSL compute shaders
│ └── spv/ # Compiled SPIR-V bytecode
├── experimental/ # Unstable: VSA, MoE routing, temporal reasoning, cognitive controller
├── howtos/ # 8 self-contained HTML tutorials
├── examples/ # Runnable example scripts
└── tests/ # Test suite (1000+ tests)
Design Principles
- Pure Vulkan — no CUDA, no vendor lock-in
- Hardware-agnostic — AMD, NVIDIA, Intel on the same codebase
- C++ dispatch layer — pybind11 extension for low-overhead GPU calls
- Zero-copy GPU memory — VulkanTensor keeps data GPU-resident between ops
- All data is np.float32 — numpy arrays in, numpy arrays out
Environment Variables
| Variable | Description | Default |
|---|---|---|
| VK_GPU_INDEX | Select GPU by index (multi-GPU systems) | 0 |
| GRILLY_DEBUG | Enable debug logging (1 = on) | off |
| ALLOW_CPU_VULKAN | Allow Mesa llvmpipe software Vulkan (CI) | off |
Testing
# All tests (requires Vulkan)
uv run pytest tests/ -v
# CPU-only (no GPU required)
uv run pytest tests/ -m "not gpu" -v
# With coverage
uv run pytest tests/ --cov=. --cov-report=term
# Single test
pytest tests/test_snn.py -k "test_lif"
CI/CD
- CI (on push/PR): Lint (ruff, black), test (CPU-only on Mesa llvmpipe), build
- CD (on GitHub Release): Build and publish to PyPI via Trusted Publishing (OIDC, no API tokens)
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new features
- Run ruff check . and pytest tests/ -v
- Submit a pull request
License
MIT License — see LICENSE for details.
Download files
File details
Details for the file grilly-0.4.6.tar.gz.
File metadata
- Download URL: grilly-0.4.6.tar.gz
- Size: 7.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5cb69945b2627c3ab145ecb8f01ad57827eb21451704cad57676b6e6644a735 |
| MD5 | 4cee9582dd0403cba92b86617e17f66b |
| BLAKE2b-256 | 23b394eb6de6abc18df4ad728256cacbc501eed13735cd2ffa3a4d0c3873f120 |
Provenance
The following attestation bundles were made for grilly-0.4.6.tar.gz:
Publisher: publish.yml on Grillcheese-AI/grilly
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grilly-0.4.6.tar.gz
- Subject digest: a5cb69945b2627c3ab145ecb8f01ad57827eb21451704cad57676b6e6644a735
- Sigstore transparency entry: 1108312237
- Permalink: Grillcheese-AI/grilly@c6b96631b4d8f1e544c2fe04249d7d8303364859
- Branch / Tag: refs/tags/0.4.6
- Owner: https://github.com/Grillcheese-AI
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c6b96631b4d8f1e544c2fe04249d7d8303364859
- Trigger Event: release
File details
Details for the file grilly-0.4.6-py3-none-any.whl.
File metadata
- Download URL: grilly-0.4.6-py3-none-any.whl
- Size: 1.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63163ce3b78e5a3fa2275af20bee128ad2ec43d21e716a33322b935d66eeef8f |
| MD5 | 9b91b852cacf7f31828f3e3cefe6fb63 |
| BLAKE2b-256 | 150f0daa48805de03cb9790bdd55404b883b49ae9d9849e1e4a528b01a4b423d |
|
Provenance
The following attestation bundles were made for grilly-0.4.6-py3-none-any.whl:
Publisher: publish.yml on Grillcheese-AI/grilly
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grilly-0.4.6-py3-none-any.whl
- Subject digest: 63163ce3b78e5a3fa2275af20bee128ad2ec43d21e716a33322b935d66eeef8f
- Sigstore transparency entry: 1108312251
- Permalink: Grillcheese-AI/grilly@c6b96631b4d8f1e544c2fe04249d7d8303364859
- Branch / Tag: refs/tags/0.4.6
- Owner: https://github.com/Grillcheese-AI
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c6b96631b4d8f1e544c2fe04249d7d8303364859
- Trigger Event: release