GPU-accelerated neural network operations using Vulkan compute shaders

Grilly

GPU-accelerated neural network framework using Vulkan compute shaders. Supports AMD, NVIDIA, and Intel GPUs.

Documentation: https://grilly.readthedocs.io/

Release Status

  • Current release line: v0.3.1
  • Package name: grilly
  • Python support: >=3.12
  • Release channel: PyPI

Versioning is automated via setuptools-scm from git tags (e.g. tag v0.3.1 produces version 0.3.1).
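
Because the version is derived from the git tag rather than a hard-coded string, one way to confirm what is actually installed is to ask the package metadata. The following is a minimal sketch using only the standard library; the helper name installed_version is illustrative and not part of Grilly.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "grilly") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string recorded at build time, or None.
    print(installed_version())
```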

Features

Neural Network Operations

  • Feedforward Networks: Linear layers, activations (ReLU, GELU, SiLU, SoftMax, SwiGLU, RoSwish, GCU)
  • Convolutional Networks: Conv2D, MaxPool2D, AvgPool2D, BatchNorm2D (forward and backward)
  • Recurrent Networks: LSTM cells
  • Attention Mechanisms: Flash Attention 2, multi-head attention, RoPE, prosody modulation
  • Normalization: LayerNorm, RMSNorm, BatchNorm
  • Activations: GELU, SiLU, ReLU, SoftMax, SoftPlus, SwiGLU, GEGLU, ReGLU, RoSwish, GCU
  • Fused Operations: Linear+activation fusion, QKV projection, layer normalization+linear
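
For checking GPU results, a plain NumPy reference of two of the listed operations can be handy. This is a sketch under assumptions: it uses the common definitions of SwiGLU (split the last axis in half, SiLU-gate one half with the other) and RMSNorm. Which half Grilly treats as the gate, and its epsilon value, are not stated here, so the function names and defaults below are illustrative only.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_reference(x: np.ndarray) -> np.ndarray:
    """Split the last axis in two halves and return silu(gate) * value.

    Assumption: the first half is the gate. The output has half the
    input width.
    """
    gate, value = np.split(x, 2, axis=-1)
    return silu(gate) * value


def rms_norm_reference(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Scale each row by the reciprocal of its root-mean-square."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

Comparing such a reference against the GPU output with np.allclose and a loose tolerance is usually enough to catch layout or convention mismatches.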

Spiking Neural Networks

  • Neuron Models: LIF (Leaky Integrate-and-Fire), GIF (Generalized Integrate-and-Fire)
  • Learning: STDP (Spike-Timing-Dependent Plasticity), Hebbian learning
  • Synaptic Dynamics: Forward propagation, STDP traces, weight updates
  • Bridges: Continuous-to-spike, spike-to-continuous conversion
  • Operations: SNN matmul, softmax, readout, expert readout
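
To make the LIF model concrete, here is a CPU sketch of one leaky integrate-and-fire step in NumPy. It is not Grilly's shader: the reset value v_reset, the refractory duration t_refrac, and the exact update rule are assumptions chosen for illustration, while dt, tau_mem, and v_thresh mirror the parameter names used in the Quick Start.

```python
import numpy as np


def lif_step_reference(input_current, membrane, refractory,
                       dt=1.0, tau_mem=20.0, v_thresh=1.0,
                       v_reset=0.0, t_refrac=2.0):
    """One Euler step of a leaky integrate-and-fire population.

    v_reset and t_refrac are hypothetical parameters for this sketch.
    Returns (membrane, refractory, spikes), matching the Quick Start order.
    """
    # Neurons still in their refractory period do not integrate input.
    active = refractory <= 0.0
    dv = (dt / tau_mem) * (input_current - membrane)
    membrane = np.where(active, membrane + dv, membrane)

    # Active neurons at or above threshold emit a spike and are reset.
    spikes = active & (membrane >= v_thresh)
    membrane = np.where(spikes, v_reset, membrane)

    # Spiking neurons start a new refractory period; others count down.
    refractory = np.where(spikes, t_refrac, np.maximum(refractory - dt, 0.0))
    return (membrane.astype(np.float32),
            refractory.astype(np.float32),
            spikes.astype(np.float32))
```

Running the same inputs through this function and through the GPU path is a quick sanity check, keeping in mind that the two may legitimately differ if the shader uses other dynamics.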

Memory & Retrieval

  • Memory Operations: Read, write, context aggregation
  • Memory Injection: Concatenation, gating, residual connections
  • Capsule Networks: Capsule projection, dentate gyrus sparse expansion
  • FAISS Integration: Distance computation, top-k selection, IVF filtering, quantization, k-means
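
The distance computation and top-k selection steps can be sketched on the CPU as follows. This assumes squared L2 distance and "smallest distance first" ordering; the metric Grilly actually uses is not stated here, and the function names are illustrative.

```python
import numpy as np


def l2_distances_reference(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Squared L2 distances, shape (num_queries, num_vectors).

    Uses ||q - d||^2 = ||q||^2 + ||d||^2 - 2 q.d to avoid a 3-D temporary.
    """
    q2 = np.sum(query ** 2, axis=1, keepdims=True)
    d2 = np.sum(database ** 2, axis=1)
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(q2 + d2 - 2.0 * (query @ database.T), 0.0)


def topk_reference(distances: np.ndarray, k: int):
    """Return the k smallest distances per row and their indices, sorted ascending."""
    # argpartition finds the k smallest in O(n); only those k are then sorted.
    idx = np.argpartition(distances, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(distances, idx, axis=1)
    order = np.argsort(part, axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    return np.take_along_axis(distances, idx, axis=1), idx
```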

Learning Algorithms

  • Optimization: Adam, natural gradients, Fisher information matrix
  • Continual Learning: EWC (Elastic Weight Consolidation), Fisher penalties
  • Adaptive Filtering: NLMS (Normalized Least Mean Squares), ensemble, prediction
  • Regularization: Dropout, whitening transforms
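
Two of the algorithms above have compact textbook forms that are easy to state in NumPy. The sketch below uses the standard EWC penalty with a diagonal Fisher matrix, (lambda / 2) * sum(F * (theta - theta_star)^2), and the standard NLMS weight update. Function names and default hyperparameters are assumptions for illustration, not Grilly's API.

```python
import numpy as np


def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0) -> float:
    """Quadratic penalty that keeps params close to anchor_params,
    weighted per parameter by the diagonal Fisher information."""
    return 0.5 * lam * float(np.sum(fisher_diag * (params - anchor_params) ** 2))


def nlms_update(weights, x, target, mu=0.5, eps=1e-8):
    """One NLMS step: the LMS update normalized by the input energy.

    Returns the updated weights and the prediction error before the update.
    """
    error = target - float(weights @ x)
    weights = weights + mu * error * x / (eps + float(x @ x))
    return weights, error
```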

Specialized Operations

  • Place & Time Cells: Spatial encoding, temporal encoding, theta-gamma oscillations
  • FFT: Bit-reversal, butterfly operations, magnitude, power spectrum
  • Domain Adaptation: Domain classification, routing, expert combination
  • Embeddings: Lookup, position encoding, attention, FFN, pooling, normalization
  • Loss Functions: Cross-entropy, BCE, contrastive loss
  • Semantic Encoding: Affect MLP, affective processing
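
As an illustration of the bit-reversal step listed under FFT, the permutation used by radix-2 FFTs can be computed as below. This is a generic CPU sketch of the technique, not the shader's code; the function name is illustrative.

```python
def bit_reverse_indices(n: int) -> list:
    """Return the bit-reversal permutation for a power-of-two length n."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    bits = n.bit_length() - 1
    out = []
    for i in range(n):
        r = 0
        # Mirror each set bit of i around the centre of the bit field.
        for b in range(bits):
            if i & (1 << b):
                r |= 1 << (bits - 1 - b)
        out.append(r)
    return out
```

Reordering the input with this permutation lets the butterfly stages run in place over contiguous blocks.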

Transformer Support

  • Architecture-Specific Optimizations: BERT, GPT, T5, RoBERTa, DistilBERT, MPNet, XLM-RoBERTa, ALBERT
  • HuggingFace Bridge: Load pre-trained models without PyTorch runtime
  • Model Components: Multi-head attention, positional encoding, layer normalization
  • Fine-Tuning: LoRA (Low-Rank Adaptation), gradient checkpointing

LoRA Fine-Tuning

  • Parameter-efficient fine-tuning for transformers
  • Backward pass support for LoRA layers
  • Memory-efficient training on 12GB VRAM
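
The idea behind LoRA's parameter efficiency can be shown in a few lines of NumPy. This sketch assumes the (in_features, out_features) weight layout used in the Quick Start and the usual alpha / rank scaling; it is not Grilly's LoRA layer, and all names are illustrative.

```python
import numpy as np


def lora_linear(x, weight, lora_a, lora_b, alpha=16.0):
    """Frozen base projection plus a trainable low-rank update.

    weight: (in_features, out_features), frozen
    lora_a: (in_features, rank), trainable
    lora_b: (rank, out_features), trainable, typically initialized to zero
    """
    rank = lora_a.shape[1]
    return x @ weight + (alpha / rank) * ((x @ lora_a) @ lora_b)


def trainable_fraction(in_features: int, out_features: int, rank: int) -> float:
    """Ratio of LoRA parameters to full-weight parameters."""
    return rank * (in_features + out_features) / (in_features * out_features)
```

With lora_b initialized to zero the layer reproduces the base model exactly at the start of fine-tuning, and only the two small matrices need gradients and optimizer state.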

Installation

From PyPI

pip install grilly

From Source

git clone https://github.com/grillcheese-ai/grilly.git
cd grilly
make install

# Or with development dependencies
make install-dev

# Or manually
pip install -e .

Requirements

  • Python >= 3.12
  • Vulkan drivers
  • NumPy
  • Supported GPUs: AMD (tested on RX 6750 XT), NVIDIA, Intel Arc

Quick Start

import grilly
import numpy as np

# Initialize compute backend
backend = grilly.Compute()

# Spiking neural network example
input_current = np.random.randn(1000).astype(np.float32)
membrane = np.zeros(1000, dtype=np.float32)
refractory = np.zeros(1000, dtype=np.float32)

membrane, refractory, spikes = backend.snn.lif_step(
    input_current, membrane, refractory,
    dt=0.001, tau_mem=20.0, v_thresh=1.0
)

# Feedforward network example
x = np.random.randn(32, 384).astype(np.float32)
weight = np.random.randn(384, 128).astype(np.float32)
bias = np.zeros(128, dtype=np.float32)

output = backend.fnn.linear(x, weight, bias)
activated = backend.fnn.swiglu(output)

# Flash Attention 2
q = np.random.randn(32, 8, 64, 64).astype(np.float32)  # (batch, heads, seq, dim)
k = np.random.randn(32, 8, 64, 64).astype(np.float32)
v = np.random.randn(32, 8, 64, 64).astype(np.float32)

attention_out = backend.attention.flash_attention2(q, k, v)

# FAISS similarity search
query = np.random.randn(1, 384).astype(np.float32)
database = np.random.randn(10000, 384).astype(np.float32)

distances = backend.faiss.compute_distances(query, database)
top_k_distances, top_k_indices = backend.faiss.topk(distances, k=10)

API Reference

Core Interfaces

  • grilly.Compute() - Main compute backend (alias for VulkanCompute)
  • grilly.SNNCompute() - High-level spiking neural network interface
  • grilly.Learning() - Learning algorithms (EWC, NLMS, etc.)

Backend Namespaces

  • backend.snn.* - Spiking neural network operations
  • backend.fnn.* - Feedforward network operations
  • backend.attention.* - Attention mechanisms
  • backend.memory.* - Memory operations
  • backend.faiss.* - Vector similarity search
  • backend.learning.* - Learning algorithms
  • backend.cells.* - Place and time cells

Shader Statistics

  • Total GLSL shaders: 137
  • Compiled SPIR-V shaders: 138
  • Categories: 12+ operation types

Compiling Shaders

Shaders are pre-compiled and included. To recompile:

# Compile all shaders (cross-platform)
make compile-shaders

# Verify compilation
make verify-shaders

# Or manually:
# Windows: .\scripts\compile_all_shaders.ps1
# Linux/Mac: ./compile_shaders.sh

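# Single shader (glslc cannot infer the stage from a .glsl extension,
# so pass it explicitly unless the file sets #pragma shader_stage)
glslc -fshader-stage=compute shader.glsl -o spv/shader.spv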
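
After recompiling, a quick independent check is to confirm that each output file starts with the SPIR-V magic number 0x07230203 and has a complete 5-word header. The sketch below is not what make verify-shaders does (its behavior is not described here); the helper names and the shaders/spv default path are taken from the project layout or are illustrative.

```python
import struct
from pathlib import Path

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module
HEADER_BYTES = 20         # magic, version, generator, bound, schema


def is_spirv(data: bytes) -> bool:
    """True if data looks like a SPIR-V module in either byte order."""
    if len(data) < HEADER_BYTES or len(data) % 4 != 0:
        return False
    (little,) = struct.unpack_from("<I", data)
    (big,) = struct.unpack_from(">I", data)
    return SPIRV_MAGIC in (little, big)


def find_invalid_shaders(spv_dir: str = "shaders/spv") -> list:
    """Return the .spv files under spv_dir that fail the header check."""
    return sorted(p for p in Path(spv_dir).glob("*.spv")
                  if not is_spirv(p.read_bytes()))
```

This only validates the header; running spirv-val from the Vulkan SDK gives a full validation.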

GPU Selection

# Set GPU index (if multiple GPUs)
export VK_GPU_INDEX=0

# Enable debug logging
export GRILLY_DEBUG=1

# Allow CPU fallback
export ALLOW_CPU_VULKAN=1
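
The same variables can be set from Python, for example in a notebook or test fixture. When Grilly reads them is not stated here, so this sketch assumes the safe ordering: set them before importing grilly or creating the backend. The helper name configure_grilly_env is illustrative.

```python
import os


def configure_grilly_env(gpu_index: int = 0, debug: bool = False,
                         allow_cpu: bool = False, environ=None):
    """Write the GPU selection variables into environ (os.environ by default)."""
    env = os.environ if environ is None else environ
    env["VK_GPU_INDEX"] = str(gpu_index)
    if debug:
        env["GRILLY_DEBUG"] = "1"
    if allow_cpu:
        env["ALLOW_CPU_VULKAN"] = "1"
    return env


# Typical use (assumption: call this before `import grilly`):
# configure_grilly_env(gpu_index=1, debug=True)
# import grilly
# backend = grilly.Compute()
```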

Testing

# All tests (requires Vulkan)
make test

# CPU-only tests (no GPU required - for CI)
make test-cpu

# GPU tests only
make test-gpu

# With coverage report
make test-coverage

# Or use pytest directly
pytest tests/ -v                    # all tests
pytest tests/ -m "not gpu" -v       # CPU-only
pytest tests/ -m "gpu" -v          # GPU-only

Architecture

Grilly uses Vulkan compute shaders for cross-platform GPU acceleration. Each operation is implemented as a GLSL compute shader compiled to SPIR-V bytecode.

Design Principles

  • Pure Vulkan backend (no CUDA dependency)
  • Hardware-agnostic (AMD, NVIDIA, Intel)
  • Zero-copy GPU memory operations
  • Minimal CPU-GPU transfers
  • CPU fallback for unsupported operations

Performance

Tested on AMD RX 6750 XT (12GB VRAM):

  • LIF neuron simulation: 1M neurons at >1000 FPS
  • Flash Attention 2: batch 32, 8 heads, sequence length 512 in ~50 ms
  • FAISS top-k: 10K vectors, 384 dimensions, k=10 in ~5 ms
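
To reproduce figures like these on other hardware, a small timing harness with warm-up runs is usually sufficient. This is a generic sketch, not the script used to produce the numbers above; the benchmark helper is hypothetical, and it assumes each call returns only after its result is available on the host.

```python
import statistics
import time


def benchmark(fn, *args, warmup: int = 3, repeats: int = 20, **kwargs) -> dict:
    """Time fn(*args, **kwargs) and report wall-clock statistics in milliseconds."""
    # Warm-up runs absorb one-time costs such as pipeline creation and caching.
    for _ in range(warmup):
        fn(*args, **kwargs)

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "repeats": repeats,
    }
```

For example, benchmark(backend.attention.flash_attention2, q, k, v) with the arrays from the Quick Start would time the attention call; the median is generally more stable than the mean when other processes share the GPU.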

Examples

See the examples/ directory for detailed usage:

  • Transformer fine-tuning with LoRA
  • Spiking neural network training
  • FAISS similarity search
  • Continual learning with EWC

Development

Quick Start

# Clone and setup
git clone https://github.com/grillcheese-ai/grilly.git
cd grilly

# Install with dev dependencies
make install-dev

# Run tests
make test

# Format code
make format

# Run linters
make lint

# Build package
make build

Project Structure

grilly/
├── .github/workflows/  # CI (lint, test, build) and CD (PyPI publish)
├── backend/            # Vulkan backend implementation
├── mcp-servers/        # MCP servers for AI coders
│   ├── grilly/         # TypeScript MCP server (grilly_docs, grilly_example, etc.)
│   └── elephant-coder/ # Codebase memory (Python)
├── nn/                 # High-level neural network modules
├── shaders/            # GLSL compute shaders
│   └── spv/            # Compiled SPIR-V bytecode
├── tests/              # Test suite
├── utils/              # HuggingFace bridge, utilities
└── Makefile            # Build automation

MCP Server for AI Coders

The grilly MCP server (mcp-servers/grilly/) helps AI assistants use Grilly:

  • grilly_docs — API docs (overview, quickstart, snn, fnn, attention, faiss)
  • grilly_example — Example code snippets
  • grilly_list_ops — List backend operations
  • grilly_run_python — Execute Python snippets

Build the server with:

cd mcp-servers/grilly && npm install && npm run build

Makefile Commands

Run make help to see all available commands:

  • make install - Install package
  • make test - Run tests
  • make compile-shaders - Compile shaders
  • make build - Build distribution
  • make format - Format code
  • make lint - Run linters
  • make clean - Clean build artifacts

CI/CD

  • CI (on push/PR): Lint (ruff), test (CPU-only), build
  • CD (on release): Build, publish to PyPI via Trusted Publishing

Releases are published automatically when you create a GitHub Release with a tag (e.g. v0.3.1). No API token is needed; the workflow uses PyPI Trusted Publishing (OIDC).

One-time setup: Trusted Publisher on PyPI

  1. Go to pypi.org/manage/projects, then Manage, then Publishing
  2. Add a GitHub publisher:
    • Owner: grillcheese-ai
    • Repository: grilly
    • Workflow name: publish.yml

Manual publish (local)

make build
twine upload dist/*
# Requires PyPI API token (create at pypi.org/manage/account/token/)

For Test PyPI: twine upload --repository testpypi dist/*

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Run make check to verify
  5. Submit a pull request

License

MIT License - see LICENSE file for details.
