GPU-accelerated neural network operations using Vulkan compute shaders
Grilly
Deep learning, well done.
Alpha software. Not production-ready. APIs may change. We welcome early adopters and feedback.
GPU-accelerated neural network framework using Vulkan compute shaders. No CUDA required. Supports AMD, NVIDIA, and Intel GPUs.
Documentation: https://grilly.readthedocs.io/
Release Status
- Current release line: v0.3.5
- Package name: grilly
- Python support: >=3.12
- Release channel: PyPI
Versioning is automated via setuptools-scm from git tags (e.g. v0.3.1 → 0.3.1).
Features
Neural Network Operations
- Feedforward Networks: Linear layers, activations (ReLU, GELU, SiLU, SoftMax, SwiGLU, RoSwish, GCU)
- Convolutional Networks: Conv2D, MaxPool2D, AvgPool2D, BatchNorm2D (forward and backward)
- Recurrent Networks: LSTM cells
- Attention Mechanisms: Flash Attention 2, multi-head attention, RoPE, prosody modulation
- Normalization: LayerNorm, RMSNorm, BatchNorm
- Activations: GELU, SiLU, ReLU, SoftMax, SoftPlus, SwiGLU, GEGLU, ReGLU, RoSwish, GCU
- Fused Operations: Linear+activation fusion, QKV projection, layer normalization+linear
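Short NumPy references for two of the listed operations can help you check GPU outputs on small inputs. The sketch makes two assumptions. First, RMSNorm normalizes over the last axis and applies a per-feature scale. Second, swiglu takes an already-projected input and gates one half of the last axis with SiLU of the other half (the Megatron-style convention). grilly's kernels may use a different epsilon or half ordering.

```python
import numpy as np


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis: x / sqrt(mean(x**2) + eps), scaled per feature."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (Swish-1): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu(x: np.ndarray) -> np.ndarray:
    """Split the last axis in half and return silu(first_half) * second_half.

    The half ordering is an assumption; grilly's shader may gate the other half.
    """
    a, b = np.split(x, 2, axis=-1)
    return silu(a) * b


x = np.random.default_rng(0).standard_normal((32, 128)).astype(np.float32)
y = rms_norm(x, np.ones(128, dtype=np.float32))
print(y.shape, swiglu(x).shape)  # (32, 128) (32, 64)
```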
Spiking Neural Networks
- Neuron Models: LIF (Leaky Integrate-and-Fire), GIF (Generalized Integrate-and-Fire)
- Learning: STDP (Spike-Timing-Dependent Plasticity), Hebbian learning
- Synaptic Dynamics: Forward propagation, STDP traces, weight updates
- Bridges: Continuous-to-spike, spike-to-continuous conversion
- Operations: SNN matmul, softmax, readout, expert readout
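The LIF and STDP bullets refer to standard models. The NumPy sketch below is a CPU reference for sanity checks. It assumes three things: forward-Euler integration, a fixed refractory period with reset to v_reset, and pair-based STDP with exponentially decaying traces. The argument names follow the Quick Start's lif_step call. The reset value, refractory handling, and STDP constants are our assumptions, not grilly's shader defaults.

```python
import numpy as np


def lif_step(input_current, membrane, refractory, dt=1.0, tau_mem=20.0,
             v_thresh=1.0, v_reset=0.0, t_ref=2.0):
    """One forward-Euler step of a leaky integrate-and-fire population.

    dt, tau_mem and t_ref must share one time unit. Refractory neurons are
    held at v_reset and ignore their input.
    """
    active = refractory <= 0.0
    # dV/dt = (-V + I) / tau_mem, applied only to non-refractory neurons
    membrane = np.where(active, membrane + (dt / tau_mem) * (-membrane + input_current), v_reset)
    spikes = (membrane >= v_thresh).astype(np.float32)
    membrane = np.where(spikes > 0, v_reset, membrane)
    refractory = np.where(spikes > 0, t_ref, np.maximum(refractory - dt, 0.0))
    return membrane, refractory, spikes


def stdp_update(weights, pre_spikes, post_spikes, pre_trace, post_trace,
                dt=1.0, tau_plus=20.0, tau_minus=20.0, a_plus=0.01, a_minus=0.012):
    """Pair-based STDP with exponential traces; weights has shape (n_post, n_pre)."""
    pre_trace = pre_trace * np.exp(-dt / tau_plus) + pre_spikes
    post_trace = post_trace * np.exp(-dt / tau_minus) + post_spikes
    # Potentiate on post spikes (using recent pre activity), depress on pre spikes.
    dw = a_plus * np.outer(post_spikes, pre_trace) - a_minus * np.outer(post_trace, pre_spikes)
    return weights + dw, pre_trace, post_trace


# LIF: constant input currents below and above threshold
current = np.array([0.0, 0.5, 1.5, 3.0], dtype=np.float32)
membrane = np.zeros(4, dtype=np.float32)
refractory = np.zeros(4, dtype=np.float32)
spike_counts = np.zeros(4)
for _ in range(200):
    membrane, refractory, spikes = lif_step(current, membrane, refractory)
    spike_counts += spikes
print(spike_counts)  # sub-threshold neurons never fire; stronger input fires more often


def run_pair(pre_time, post_time, steps=20):
    """Final weight of one synapse after a single pre spike and a single post spike."""
    w = np.zeros((1, 1))
    pre_tr, post_tr = np.zeros(1), np.zeros(1)
    for t in range(steps):
        pre = np.array([1.0 if t == pre_time else 0.0])
        post = np.array([1.0 if t == post_time else 0.0])
        w, pre_tr, post_tr = stdp_update(w, pre, post, pre_tr, post_tr)
    return w[0, 0]


print(run_pair(0, 5) > 0, run_pair(5, 0) < 0)  # True True
```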
Memory & Retrieval
- Memory Operations: Read, write, context aggregation
- Memory Injection: Concatenation, gating, residual connections
- Capsule Networks: Capsule projection, dentate gyrus sparse expansion
- FAISS Integration: Distance computation, top-k selection, IVF filtering, quantization, k-means
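Exact (brute-force) nearest-neighbour search reduces to two steps: compute a distance matrix, then select the top k per row. The NumPy sketch below is a CPU reference for results from the FAISS operations. It assumes squared L2 distance. grilly's compute_distances may use another metric or return unsquared distances.

```python
import numpy as np


def l2_distances(queries: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Squared L2 distances, shape (n_queries, n_database), via ||q||^2 - 2 q.d + ||d||^2."""
    q_sq = np.sum(queries * queries, axis=1, keepdims=True)
    d_sq = np.sum(database * database, axis=1)
    dist = q_sq - 2.0 * queries @ database.T + d_sq
    return np.maximum(dist, 0.0)  # clamp small negatives caused by rounding


def topk_smallest(distances: np.ndarray, k: int):
    """Return (values, indices) of the k smallest entries per row, sorted ascending."""
    idx = np.argpartition(distances, k - 1, axis=1)[:, :k]  # k smallest, unordered
    vals = np.take_along_axis(distances, idx, axis=1)
    order = np.argsort(vals, axis=1)
    return np.take_along_axis(vals, order, axis=1), np.take_along_axis(idx, order, axis=1)


rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 384)).astype(np.float32)
query = database[42:43] + 0.01 * rng.standard_normal((1, 384)).astype(np.float32)
d = l2_distances(query, database)
top_d, top_i = topk_smallest(d, k=10)
print(top_i[0, 0])  # 42: the perturbed copy's source row is the nearest neighbour
```

argpartition avoids a full sort of all 10,000 distances. Only the k survivors are sorted.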
Learning Algorithms
- Optimization: Adam, natural gradients, Fisher information matrix
- Continual Learning: EWC (Elastic Weight Consolidation), Fisher penalties
- Adaptive Filtering: NLMS (Normalized Least Mean Squares), ensemble, prediction
- Regularization: Dropout, whitening transforms
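NLMS and the EWC penalty are short enough to state directly. The NumPy sketch below implements their textbook forms. It assumes a single-output linear filter and a diagonal Fisher approximation. grilly's backend.learning functions may take different arguments.

```python
import numpy as np


def nlms_update(w, x, target, mu=0.5, eps=1e-8):
    """One NLMS step: the LMS correction normalised by the input energy ||x||^2."""
    error = target - w @ x
    w = w + mu * error * x / (eps + x @ x)
    return w, error


def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """EWC quadratic penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * np.sum(fisher_diag * (params - anchor_params) ** 2)


# Identify an unknown 3-tap linear system from noise-free samples.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])
w = np.zeros(3)
for _ in range(500):
    x = rng.standard_normal(3)
    w, err = nlms_update(w, x, true_w @ x)
print(np.round(w, 3))  # converges to true_w in this noise-free setting
```

The normalisation by ||x||^2 makes the step size insensitive to input scale. That property makes NLMS easier to tune than plain LMS.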
Specialized Operations
- Place & Time Cells: Spatial encoding, temporal encoding, theta-gamma oscillations
- FFT: Bit-reversal, butterfly operations, magnitude, power spectrum
- Domain Adaptation: Domain classification, routing, expert combination
- Embeddings: Lookup, position encoding, attention, FFN, pooling, normalization
- Loss Functions: Cross-entropy, BCE, contrastive loss
- Semantic Encoding: Affect MLP, affective processing
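The FFT bullet names the two classic pieces of an iterative radix-2 Cooley-Tukey FFT: a bit-reversal permutation, followed by log2(n) stages of butterflies. The NumPy sketch below implements that textbook algorithm and checks it against np.fft.fft. It illustrates the technique only; it does not reproduce grilly's shader layout.

```python
import numpy as np


def bit_reverse_permutation(n: int) -> np.ndarray:
    """Indices 0..n-1 with their log2(n)-bit binary representations reversed."""
    bits = n.bit_length() - 1
    return np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)])


def fft_radix2(x: np.ndarray) -> np.ndarray:
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = np.asarray(x, dtype=np.complex128)[bit_reverse_permutation(n)]
    size = 2
    while size <= n:
        half = size // 2
        twiddle = np.exp(-2j * np.pi * np.arange(half) / size)
        for start in range(0, n, size):
            even = a[start:start + half].copy()  # copy: the slice is overwritten below
            odd = a[start + half:start + size] * twiddle
            a[start:start + half] = even + odd  # butterfly: top output
            a[start + half:start + size] = even - odd  # butterfly: bottom output
        size *= 2
    return a


x = np.random.default_rng(0).standard_normal(64)
spectrum = fft_radix2(x)
magnitude = np.abs(spectrum)
power = magnitude ** 2  # power spectrum
print(np.allclose(spectrum, np.fft.fft(x)))  # True
print(bit_reverse_permutation(8))  # [0 4 2 6 1 5 3 7]
```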
Transformer Support
- Architecture-Specific Optimizations: BERT, GPT, T5, RoBERTa, DistilBERT, MPNet, XLM-RoBERTa, ALBERT
- HuggingFace Bridge: Load pre-trained models without PyTorch runtime
- Model Components: Multi-head attention, positional encoding, layer normalization
- Fine-Tuning: LoRA (Low-Rank Adaptation), gradient checkpointing
LoRA Fine-Tuning
- Parameter-efficient fine-tuning for transformers
- Backward pass support for LoRA layers
- Memory-efficient training on 12GB VRAM
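LoRA keeps the pretrained weight frozen and trains a low-rank update. The NumPy sketch below implements one LoRA-wrapped linear layer and its backward pass, using the standard formulation: zero-initialised B and scaling alpha/r. The class name LoRALinear and the initialisation constants are illustrative assumptions, not grilly's API.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a rank-r update: y = x W^T + b + (alpha/r) x A^T B^T."""

    def __init__(self, weight, bias, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight, self.bias = weight, bias  # frozen pretrained parameters
        self.A = rng.standard_normal((rank, in_features)) * 0.01  # trainable, small random init
        self.B = np.zeros((out_features, rank))  # trainable, zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.weight.T + self.bias
        return base + self.scale * (x @ self.A.T) @ self.B.T

    def backward(self, x, grad_out):
        """Gradients w.r.t. A, B and x; W and b stay frozen."""
        h = x @ self.A.T  # (batch, rank)
        grad_B = self.scale * grad_out.T @ h  # (out, rank)
        grad_A = self.scale * (grad_out @ self.B).T @ x  # (rank, in)
        grad_x = grad_out @ self.weight + self.scale * (grad_out @ self.B) @ self.A
        return grad_A, grad_B, grad_x


rng = np.random.default_rng(1)
W, b = rng.standard_normal((16, 32)), np.zeros(16)
layer = LoRALinear(W, b, rank=4)
x = rng.standard_normal((5, 32))
print(np.allclose(layer.forward(x), x @ W.T + b))  # True: B starts at zero
```

The memory savings come from the parameter count. For a 768x768 layer with r = 8, only 8 * (768 + 768) = 12,288 values are trainable, versus 589,824 in the full weight.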
Installation
From PyPI
pip install grilly
From Source
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
make install
# Or with development dependencies
make install-dev
# Or manually
pip install -e .
Requirements
- Python >= 3.12
- Vulkan drivers
- NumPy
- Supported GPUs: AMD (tested on RX 6750 XT), NVIDIA, Intel Arc
Quick Start
import grilly
import numpy as np
# Initialize compute backend
backend = grilly.Compute()
# Spiking neural network example
input_current = np.random.randn(1000).astype(np.float32)
membrane = np.zeros(1000, dtype=np.float32)
refractory = np.zeros(1000, dtype=np.float32)
membrane, refractory, spikes = backend.snn.lif_step(
input_current, membrane, refractory,
dt=0.001, tau_mem=20.0, v_thresh=1.0
)
# Feedforward network example
x = np.random.randn(32, 384).astype(np.float32)
weight = np.random.randn(384, 128).astype(np.float32)
bias = np.zeros(128, dtype=np.float32)
output = backend.fnn.linear(x, weight, bias)
activated = backend.fnn.swiglu(output)
# Flash Attention 2
q = np.random.randn(32, 8, 64, 64).astype(np.float32) # (batch, heads, seq, dim)
k = np.random.randn(32, 8, 64, 64).astype(np.float32)
v = np.random.randn(32, 8, 64, 64).astype(np.float32)
attention_out = backend.attention.flash_attention2(q, k, v)
# FAISS similarity search
query = np.random.randn(1, 384).astype(np.float32)
database = np.random.randn(10000, 384).astype(np.float32)
distances = backend.faiss.compute_distances(query, database)
top_k_distances, top_k_indices = backend.faiss.topk(distances, k=10)
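To sanity-check the attention output on small inputs, you can compare flash_attention2 against a plain NumPy reference for scaled dot-product attention. The sketch below assumes the kernel computes standard non-causal softmax(QK^T / sqrt(d))V. Check the API docs for its actual masking and scaling defaults. The comparison tolerance in the comment is a guess for float32.

```python
import numpy as np


def attention_reference(q, k, v, causal=False):
    """softmax(q k^T / sqrt(d)) v over the last two axes; inputs are (batch, heads, seq, dim)."""
    d = q.shape[-1]
    scores = (q @ np.swapaxes(k, -1, -2)) / np.sqrt(d)
    if causal:
        seq_q, seq_k = scores.shape[-2:]
        mask = np.triu(np.ones((seq_q, seq_k), dtype=bool), k=1)  # hide future positions
        scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract the row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 16, 8)).astype(np.float32) for _ in range(3))
out = attention_reference(q, k, v)
# With grilly installed, compare against the GPU kernel:
# np.testing.assert_allclose(backend.attention.flash_attention2(q, k, v), out, atol=1e-3)
```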
API Reference
Core Interfaces
- grilly.Compute() - Main compute backend (alias for VulkanCompute)
- grilly.SNNCompute() - High-level spiking neural network interface
- grilly.Learning() - Learning algorithms (EWC, NLMS, etc.)
Backend Namespaces
- backend.snn.* - Spiking neural network operations
- backend.fnn.* - Feedforward network operations
- backend.attention.* - Attention mechanisms
- backend.memory.* - Memory operations
- backend.faiss.* - Vector similarity search
- backend.learning.* - Learning algorithms
- backend.cells.* - Place and time cells
Shader Statistics
- Total GLSL shaders: 154
- Compiled SPIR-V shaders: 154
- Categories: 12+ operation types
Compiling Shaders
Shaders are pre-compiled and included. To recompile:
# Compile all shaders (cross-platform)
make compile-shaders
# Verify compilation
make verify-shaders
# Or manually:
# Windows: .\scripts\compile_all_shaders.ps1
# Linux/Mac: ./compile_shaders.sh
# Single shader
glslc -fshader-stage=compute shader.glsl -o spv/shader.spv  # .glsl does not imply a stage, so name it
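Where make is unavailable, a small Python loop over glslc does the same job. This sketch assumes a flat shaders/*.glsl layout with output in shaders/spv/, matching the project structure. It passes -fshader-stage=compute because glslc cannot infer a stage from the .glsl extension. The make compile-shaders target remains the supported route, and the function names here are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def glslc_command(src: Path, out_dir: Path) -> list[str]:
    """Build a glslc command that compiles one .glsl compute shader to SPIR-V."""
    return ["glslc", "-fshader-stage=compute", str(src), "-o", str(out_dir / (src.stem + ".spv"))]


def compile_all(shader_dir: Path = Path("shaders")) -> int:
    """Compile every shaders/*.glsl into shaders/spv/ and return the number of failures."""
    if shutil.which("glslc") is None:
        raise RuntimeError("glslc not found; install the Vulkan SDK or shaderc")
    out_dir = shader_dir / "spv"
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = 0
    for src in sorted(shader_dir.glob("*.glsl")):
        result = subprocess.run(glslc_command(src, out_dir), capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            print(f"FAILED {src.name}:\n{result.stderr}")
    return failures
```

Call compile_all() from the repository root. A non-zero return value means at least one shader failed, and glslc's error text is printed for each failure.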
GPU Selection
# Set GPU index (if multiple GPUs)
export VK_GPU_INDEX=0
# Enable debug logging
export GRILLY_DEBUG=1
# Allow CPU fallback
export ALLOW_CPU_VULKAN=1
Testing
# All tests (requires Vulkan)
make test
# CPU-only tests (no GPU required - for CI)
make test-cpu
# GPU tests only
make test-gpu
# With coverage report
make test-coverage
# Or use pytest directly
pytest tests/ -v # all tests
pytest tests/ -m "not gpu" -v # CPU-only
pytest tests/ -m "gpu" -v # GPU-only
Architecture
Grilly uses Vulkan compute shaders for cross-platform GPU acceleration. Each operation is implemented as a GLSL compute shader compiled to SPIR-V bytecode.
Design Principles
- Pure Vulkan backend (no CUDA dependency)
- Hardware-agnostic (AMD, NVIDIA, Intel)
- Zero-copy GPU memory operations
- Minimal CPU-GPU transfers
- CPU fallback for unsupported operations
Performance
Tested on AMD RX 6750 XT (12GB VRAM):
- LIF neuron simulation: 1M neurons at >1000 FPS
- Flash Attention 2: batch 32, 8 heads, sequence length 512: ~50 ms
- FAISS top-k: k=10 over 10K vectors of dimension 384: ~5 ms
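Timings like these depend on warm-up and on how they are measured. The generic harness below discards warm-up calls and reports the median of the remaining runs. It assumes the timed call returns host-side results, as the NumPy-returning Quick Start calls appear to, so the GPU round-trip is included. The NumPy workload is only a CPU comparison point, not a grilly measurement.

```python
import statistics
import time
from collections.abc import Callable

import numpy as np


def benchmark(fn: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock time of fn() in milliseconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # early calls may include pipeline creation, shader loading, or caching
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 384)).astype(np.float32)
query = rng.standard_normal((1, 384)).astype(np.float32)
# CPU baseline for the FAISS row above; swap in the grilly calls to time the GPU path.
ms = benchmark(lambda: np.argpartition(np.sum((database - query) ** 2, axis=1), 10)[:10])
print(f"NumPy top-k baseline: {ms:.2f} ms")
```

The median is less sensitive than the mean to one-off stalls such as driver work or OS scheduling.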
Built for GrillCheese AI
Grilly powers GrillCheese AI, a neuromorphic language system that replaces pure transformer stacks with brain-inspired modules — hippocampal memory, thalamic routing, amygdala affect, and Oja-rule online plasticity — all running on Vulkan compute. The research explores four hypotheses:
- H1 (Architecture): Modular neuromorphic design can match transformers while enabling episodic memory, continual learning, and affect-driven routing.
- H2 (Efficiency): Vulkan-accelerated SSM training can reach >10,000 tok/s on a single consumer GPU — no CUDA or cloud required.
- H3 (Memory): Capsule encoding (768D to 32D) with dentate gyrus sparse expansion preserves information for hippocampal retrieval via Matryoshka representation learning.
- H4 (Plasticity): Online Oja-rule weight updates enable continual adaptation without catastrophic forgetting.
Grilly v1.0 will ship alongside the GrillCheese AI public release.
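H4 relies on Oja's rule. Its textbook single-neuron form is Δw = η y (x − y w), with y = w·x. The Hebbian term y x is balanced by a decay term that keeps ||w|| near 1. The NumPy sketch below shows the standard rule converging to the leading principal direction of synthetic data. How GrillCheese AI applies the rule inside its modules is not described here.

```python
import numpy as np


def oja_step(w: np.ndarray, x: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One Oja update: Hebbian term y*x minus the decay y^2*w."""
    y = w @ x
    return w + lr * y * (x - y * w)


rng = np.random.default_rng(0)
# 2-D data whose largest variance lies along (1, 1)/sqrt(2)
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
data = rng.multivariate_normal(np.zeros(2), cov, size=5000)
w = rng.standard_normal(2)
for x in data:
    w = oja_step(w, x, lr=0.005)
principal = np.linalg.eigh(cov)[1][:, -1]  # eigenvector of the largest eigenvalue
print(np.linalg.norm(w), abs(w @ principal))  # both close to 1
```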
Examples
A minimal forward + backward pass:
import grilly.nn as nn
layer = nn.Linear(128, 10)
x = nn.randn(32, 128, requires_grad=True)
logits = x @ nn.Variable(layer.weight.T) + nn.Variable(layer.bias)
loss = logits.sum()
loss.backward()
print(x.grad.shape) # (32, 128)
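Because loss = logits.sum(), you can derive the expected gradient by hand. dloss/dlogits is all ones, so dloss/dx = ones @ W, which means every row of x.grad equals the column sums of layer.weight. The NumPy sketch below performs that check. It assumes layer.weight is stored as (out_features, in_features), which the .T in the example implies.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 128)).astype(np.float32)  # stands in for layer.weight (out, in)
b = np.zeros(10, dtype=np.float32)
x = rng.standard_normal((32, 128)).astype(np.float32)

logits = x @ W.T + b  # (32, 10)
grad_logits = np.ones_like(logits)  # d(sum)/d(logits) is 1 everywhere
grad_x = grad_logits @ W  # chain rule through x @ W.T, shape (32, 128)
print(grad_x.shape)  # (32, 128)
assert np.allclose(grad_x, np.broadcast_to(W.sum(axis=0), x.shape), atol=1e-5)
```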
See examples/ for more:
- hello_grilly.py - Autograd forward + backward
- train_mlp.py - Full training loop with AdamW and cross-entropy
- benchmark_gemm.py - GPU vs CPU GEMM throughput table
- 14 experimental examples (VSA, MoE, capsules, cognitive control, and more)
Development
Quick Start
# Clone and setup
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
# Install with dev dependencies
make install-dev
# Run tests
make test
# Format code
make format
# Run linters
make lint
# Build package
make build
Project Structure
grilly/
├── .github/workflows/ # CI (lint, test, build) and CD (PyPI publish)
├── backend/ # Vulkan backend implementation
├── mcp-servers/ # MCP servers for AI coders
│ ├── grilly/ # TypeScript MCP server (grilly_docs, grilly_example, etc.)
│ └── elephant-coder/ # Codebase memory (Python)
├── nn/ # High-level neural network modules
├── shaders/ # GLSL compute shaders
│ └── spv/ # Compiled SPIR-V bytecode
├── tests/ # Test suite
├── utils/ # HuggingFace bridge, utilities
└── Makefile # Build automation
MCP Server for AI Coders
The grilly MCP server (mcp-servers/grilly/) helps AI assistants use Grilly:
- grilly_docs - API docs (overview, quickstart, snn, fnn, attention, faiss)
- grilly_example - Example code snippets
- grilly_list_ops - List backend operations
- grilly_run_python - Execute Python snippets
cd mcp-servers/grilly && npm install && npm run build
Makefile Commands
Run make help to see all available commands:
- make install - Install package
- make test - Run tests
- make compile-shaders - Compile shaders
- make build - Build distribution
- make format - Format code
- make lint - Run linters
- make clean - Clean build artifacts
CI/CD
- CI (on push/PR): Lint (ruff), test (CPU-only), build
- CD (on release): Build, publish to PyPI via Trusted Publishing
Releases are published automatically when you create a GitHub Release with a tag (e.g. v0.3.1). No API token needed — uses PyPI Trusted Publishing (OIDC).
One-time setup: Trusted Publisher on PyPI
- Go to pypi.org/manage/projects → Manage → Publishing
- Add a GitHub publisher:
  - Owner: grillcheese-ai
  - Repository: grilly
  - Workflow name: publish.yml
Manual publish (local)
make build
twine upload dist/*
# Requires PyPI API token (create at pypi.org/manage/account/token/)
For Test PyPI: twine upload --repository testpypi dist/*
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new features
- Run make check to verify
- Submit a pull request
Roadmap and Community
Open an issue. Tell us what to implement or optimize.
Current priorities:
- Training throughput (GEMM tiling, fused backward shaders)
- Backward pass coverage for all operations
- INT8/INT4 quantization kernels
- Documentation and tutorials
License
MIT License - see LICENSE file for details.
Download files
File details
Details for the file grilly-0.3.7.tar.gz.
File metadata
- Download URL: grilly-0.3.7.tar.gz
- Size: 7.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 87cafb715a498c2326a91b8f4de9b8fc2598080a6c598b971d7073873cbb654e |
| MD5 | bd43408b6d39617652ed2ee25e438299 |
| BLAKE2b-256 | d35695392a9da8676d7a45d492ddd23ea3af065873529646515ffbff47d2880f |
Provenance
The following attestation bundles were made for grilly-0.3.7.tar.gz:
Publisher: publish.yml on Grillcheese-AI/grilly
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grilly-0.3.7.tar.gz
- Subject digest: 87cafb715a498c2326a91b8f4de9b8fc2598080a6c598b971d7073873cbb654e
- Sigstore transparency entry: 974680205
- Permalink: Grillcheese-AI/grilly@6eed15e81bd661eb7815f8e8cb4b211f88f6cdf3
- Branch / Tag: refs/tags/v0.3.7
- Owner: https://github.com/Grillcheese-AI
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6eed15e81bd661eb7815f8e8cb4b211f88f6cdf3
- Trigger Event: release
File details
Details for the file grilly-0.3.7-py3-none-any.whl.
File metadata
- Download URL: grilly-0.3.7-py3-none-any.whl
- Size: 1.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3410c1d601be625674cf9b091dd2a1ebf75104ca38b21202e6708172ac2011b4 |
| MD5 | dfc592fd76d6048e98cd58b2ee979ac0 |
| BLAKE2b-256 | 785c4cf616549a264bd58182cf3f88a339addf244d31e61f847d57018ab94ebd |
Provenance
The following attestation bundles were made for grilly-0.3.7-py3-none-any.whl:
Publisher: publish.yml on Grillcheese-AI/grilly
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grilly-0.3.7-py3-none-any.whl
- Subject digest: 3410c1d601be625674cf9b091dd2a1ebf75104ca38b21202e6708172ac2011b4
- Sigstore transparency entry: 974680216
- Permalink: Grillcheese-AI/grilly@6eed15e81bd661eb7815f8e8cb4b211f88f6cdf3
- Branch / Tag: refs/tags/v0.3.7
- Owner: https://github.com/Grillcheese-AI
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6eed15e81bd661eb7815f8e8cb4b211f88f6cdf3
- Trigger Event: release