
Pythonic LLM inference on legacy GPUs using Vulkan


VulkanIlm 🚀🔥

GPU-Accelerated Local LLMs for Everyone (Vulkan + Ilm — "knowledge")

VulkanIlm is a Python-first wrapper and CLI around llama.cpp's Vulkan backend that brings fast local LLM inference to AMD, Intel, and NVIDIA GPUs — no CUDA required. Built for developers with legacy or non-NVIDIA hardware.


TL;DR

  • What: Python library + CLI to run LLMs locally using Vulkan GPU acceleration.
  • Why: Most acceleration tooling targets CUDA/NVIDIA; VulkanIlm extends GPU acceleration to AMD and Intel users.
  • Quick result: In the benchmarks below, a small model ran 33× faster on an older iGPU, and legacy and mid-range discrete GPUs saw ~4–5× speedups vs CPU.

Key features

  • 🚀 Significant speedups vs CPU on legacy GPUs and iGPUs
  • 🎮 Broad GPU support: AMD, Intel, NVIDIA (via Vulkan)
  • 🐍 Python-first API + easy CLI tools
  • ⚡ Auto detection + GPU-specific optimizations
  • 📦 Auto build/install of llama.cpp Vulkan backend
  • 🔄 Real-time streaming token generation
  • ✅ Reproducible benchmark scripts in benchmarks/

Benchmarks (summary)

Benchmarks measured with Gemma-3n-E4B-it (6.9B) unless noted. Results depend on model quantization, GPU drivers, OS, and system load.

Hardware (OS) | Model | CPU time | Vulkan (GPU) time | Speedup
Dell E7250 (i7-5600U, integrated GPU), Fedora 42 Workstation | TinyLLaMA-1.1B-Chat (Q4_K_M) | 121 s | 3 s | 33×
AMD RX 580 8GB, Ubuntu 22.04.5 LTS (Jammy) | Gemma-3n-E4B-it (6.9B) | 188.47 s | 44.74 s | 4.21×
Intel Arc A770 | Gemma-3n-E4B-it (6.9B) | ~120 s | ~25 s | ~4.8×
AMD RX 6600 | Gemma-3n-E4B-it (6.9B) | ~90 s | ~18 s | ~5.0×
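
The Speedup column is CPU time divided by Vulkan time. The sketch below recomputes it for three of the rows above; the `speedup` helper is illustrative and not part of VulkanIlm. The Dell E7250 row is left out because its times appear to be rounded to whole seconds, so recomputing from them would not reproduce the reported 33×.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# (CPU time, Vulkan time) in seconds, copied from the table above.
rows = {
    "AMD RX 580": (188.47, 44.74),
    "Intel Arc A770": (120.0, 25.0),
    "AMD RX 6600": (90.0, 18.0),
}

for name, (cpu, gpu) in rows.items():
    print(f"{name}: {speedup(cpu, gpu):.2f}x")
```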

iGPU notes

  • The Dell E7250 iGPU result shows older integrated GPUs can be very effective for smaller LLMs when using Vulkan.
  • Smaller models and appropriate quantizations are more iGPU-friendly. Driver/version differences significantly affect results.

Other tested (functional) models

  • DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit — runs (not benchmarked).
  • LLaMA 3.1 8B — runs (not benchmarked).

ROCm / AMD notes

  • ROCm is not officially supported for gfx803 (RX 580).
  • Some community members try ROCm 5/6 workarounds on RX 580, but they are unstable/unsupported.
  • VulkanIlm offers a Vulkan-based path that avoids ROCm on legacy AMD cards.

Install

Quick start

git clone https://github.com/Talnz007/VulkanIlm.git
cd VulkanIlm
pip install -e .

Prerequisites

  • Python 3.9+
  • Vulkan-capable GPU (AMD RX 400+, Intel integrated/Xe/Arc, NVIDIA GTX 900+)
  • Vulkan drivers installed and working

Install Vulkan tools (if needed)

Ubuntu / Debian:

sudo apt update
sudo apt install vulkan-tools libvulkan-dev

Fedora / RHEL:

sudo dnf install vulkan-tools vulkan-devel

Verify:

vulkaninfo
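
The same check can be scripted, for example before launching a long job. This is a minimal sketch, assuming only that a working setup means `vulkaninfo` is on PATH and exits with status 0; the `vulkan_available` function is hypothetical and not part of VulkanIlm.

```python
import shutil
import subprocess


def vulkan_available(tool: str = "vulkaninfo") -> bool:
    """Return True if `tool` is on PATH and exits successfully."""
    path = shutil.which(tool)
    if path is None:
        return False
    try:
        # Output is captured and discarded; only the exit status matters here.
        result = subprocess.run([path], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Vulkan OK" if vulkan_available() else "Vulkan tools missing or failing")
```

A False result does not say why the check failed; running `vulkaninfo` by hand shows the actual error.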

Usage

CLI examples

# Auto-install llama.cpp with Vulkan support
vulkanilm install

# Check your GPU setup
vulkanilm vulkan-info

# Search and download models (if supported)
vulkanilm search "llama"
vulkanilm download microsoft/DialoGPT-medium

# Generate text
vulkanilm ask path/to/model.gguf --prompt "Explain quantum computing"

# Stream tokens in real-time
vulkanilm stream path/to/model.gguf "Tell me a story about AI"

# Run a benchmark
vulkanilm benchmark path/to/model.gguf --prompt "Benchmark prompt" --repeat 3

Python API (example)

from vulkan_ilm import Llama

# Load model (auto GPU optimization)
llm = Llama("path/to/model.gguf", gpu_layers=16)

# Synchronous generation
response = llm.ask("Explain the term 'ilm' in AI context.")
print(response)

# Streaming generation
for token in llm.stream_ask_real("Tell me about Vulkan API"):
    print(token, end='', flush=True)
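
To get a rough throughput figure from a streaming call, the token iterator can be wrapped in a small timer. The sketch below assumes the stream yields string chunks, as the `print(token, ...)` loop above suggests. `collect_stream` and `fake_stream` are hypothetical helpers; `fake_stream` stands in for `llm.stream_ask_real(...)` so the example runs without a model.

```python
import time
from typing import Iterable, Iterator, Tuple


def collect_stream(tokens: Iterable[str]) -> Tuple[str, int, float]:
    """Consume a token stream; return (text, token_count, elapsed_seconds)."""
    start = time.perf_counter()
    pieces = []
    for token in tokens:
        pieces.append(token)
    elapsed = time.perf_counter() - start
    return "".join(pieces), len(pieces), elapsed


def fake_stream() -> Iterator[str]:
    # Stand-in for llm.stream_ask_real(...); replace with the real call.
    yield from ["Vulkan ", "is ", "a ", "graphics ", "API."]


text, count, seconds = collect_stream(fake_stream())
print(f"{count} chunks in {seconds:.4f} s")
```

Chunks yielded by a stream are not necessarily single tokens, so treat the count as an approximation.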

Reproduce benchmarks (quick checklist)

  1. Use the exact GGUF model file and quantization referenced in benchmarks/.
  2. Use the benchmark script in benchmarks/run_benchmark.sh.
  3. Record: driver version, OS version, CPU frequency governor, and system load.
  4. Run benchmarks multiple times (cold and warm cache) and average results.
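
Steps 3 and 4 can be partly automated. The sketch below records the OS, the CPU frequency governor, and the 1-minute load average, and averages a list of run times. It assumes a Linux host exposing the governor under /sys. The function names are hypothetical, and the driver version still has to be noted by hand (for example from `vulkaninfo` output).

```python
import os
import platform
import statistics
from pathlib import Path
from typing import Dict, List, Optional

GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def environment_snapshot() -> Dict[str, Optional[object]]:
    """Collect OS, CPU governor and load average for a benchmark log."""
    try:
        governor: Optional[str] = GOVERNOR_PATH.read_text().strip()
    except OSError:
        governor = None  # not Linux, or cpufreq not exposed
    try:
        load_1min: Optional[float] = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1min = None  # getloadavg is unavailable on some platforms
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_governor": governor,
        "load_1min": load_1min,
    }


def summarize(run_seconds: List[float]) -> Dict[str, float]:
    """Return mean and sample standard deviation of the run times."""
    stdev = statistics.stdev(run_seconds) if len(run_seconds) > 1 else 0.0
    return {"mean_s": statistics.mean(run_seconds), "stdev_s": stdev}


print(environment_snapshot())
```

Keeping cold-cache and warm-cache runs in separate lists before calling `summarize` avoids blending the two regimes into one average.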

Troubleshooting (Linux)

vulkanilm: command not found

  • Activate venv and reinstall:
python3 -m venv venv
source venv/bin/activate
pip install -e .
  • Or run via Poetry:
poetry run vulkanilm install

Could NOT find Vulkan (missing: glslc)

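  • Install glslc (the shader compiler from shaderc, also bundled with the Vulkan SDK; it is not part of vulkan-tools):
# Fedora
sudo dnf install glslc

# Ubuntu/Debian (on releases that package glslc; otherwise install the LunarG Vulkan SDK)
sudo apt install glslc

Verify: glslc --version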

Could NOT find CURL

  • Install libcurl dev:
# Fedora
sudo dnf install libcurl-devel

# Ubuntu/Debian
sudo apt install libcurl4-openssl-dev
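
Both build errors above come down to a missing tool, so a quick pre-flight check before `vulkanilm install` can save a failed build. This is a sketch under assumptions: it treats the presence of `curl-config` on PATH as a proxy for the libcurl development package, which is not how CMake itself detects CURL. `missing_build_tools` is a hypothetical helper, not part of VulkanIlm.

```python
import shutil
from typing import Callable, Dict, Optional

# Tool name -> hint, based on the troubleshooting entries above.
BUILD_TOOLS: Dict[str, str] = {
    "glslc": "install glslc (shaderc / Vulkan SDK)",
    "curl-config": "install the libcurl development package",
}


def missing_build_tools(
    tools: Dict[str, str] = BUILD_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, str]:
    """Return {tool: hint} for every tool that is not found on PATH."""
    return {name: hint for name, hint in tools.items() if which(name) is None}


for name, hint in missing_build_tools().items():
    print(f"missing {name}: {hint}")
```

The `which` parameter exists only so the lookup can be swapped out in tests.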

Project structure

VulkanIlm/
├── vulkan_ilm/
│   ├── cli.py
│   ├── llama.py
│   ├── vulkan/
│   │   └── detector.py
│   ├── benchmark.py
│   ├── installer.py
│   └── streaming.py
├── benchmarks/             # benchmark scripts & data
├── pyproject.toml
└── README.md

Contributing

We welcome contributions! Useful areas:

  • GPU testing across drivers & OSes
  • Additional model formats & quant recipes
  • Memory & perf optimizations
  • Docs, reproducible benchmarks, and examples

See CONTRIBUTING.md for details. Look for good-first-issue tags.


The story behind the name

Ilm (علم) = knowledge / wisdom. Combined with Vulkan — “knowledge on fire”: making fast local AI accessible to everyone, regardless of GPU brand or budget. 🔥


License

MIT — see LICENSE for details.


Built with passion by @Talnz007 — bringing fast, local AI to legacy GPUs everywhere.
