Pythonic LLM inference on legacy GPUs using Vulkan: GPU-accelerated local AI for AMD, Intel, and NVIDIA without CUDA.

VulkanIlm 🚀🔥

GPU-Accelerated Local LLMs for Everyone (Vulkan + Ilm, "knowledge")

VulkanIlm is a Python-first wrapper and CLI around llama.cpp's Vulkan backend that brings fast local LLM inference to AMD, Intel, and NVIDIA GPUs, with no CUDA required. Built for developers with legacy or non-NVIDIA hardware.


TL;DR

  • What: Python library + CLI to run LLMs locally using Vulkan GPU acceleration.
  • Why: Most acceleration tooling targets CUDA/NVIDIA; VulkanIlm opens GPU acceleration to AMD and Intel users.
  • Quick result: In the benchmarks below, a 1.1B model ran 33× faster on an older iGPU than on CPU; mid-range and legacy discrete GPUs get ~4–6× speedups vs CPU.

Key features

  • 🚀 Significant speedups vs CPU on legacy GPUs and iGPUs
  • 🎮 Broad GPU support: AMD, Intel, NVIDIA (via Vulkan)
  • 🐍 Python-first API + easy CLI tools
  • ⚡ Auto detection + GPU-specific optimizations
  • 📦 Auto build/install of llama.cpp Vulkan backend
  • 🔄 Real-time streaming token generation
  • ✅ Reproducible benchmark scripts in benchmarks/

Benchmarks (summary)

Benchmarks measured with Gemma-3n-E4B-it (6.9B) unless noted. Results depend on model quantization, GPU drivers, OS, and system load.

| Hardware (OS) | Model | CPU time | Vulkan (GPU) time | Speedup |
|---|---|---|---|---|
| Dell E7250 (i7-5600U, integrated GPU), Fedora 42 Workstation | TinyLLaMA-1.1B-Chat (Q4_K_M) | 121 s | 3 s | 33× |
| AMD RX 580 8GB, Ubuntu 22.04.5 LTS (Jammy) | Gemma-3n-E4B-it (6.9B) | 188.47 s | 44.74 s | 4.21× |
| Intel Arc A770 | Gemma-3n-E4B-it (6.9B) | ~120 s | ~25 s | ~4.8× |
| AMD RX 6600 | Gemma-3n-E4B-it (6.9B) | ~90 s | ~18 s | ~5.0× |
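
The Speedup column is CPU time divided by Vulkan time. As a quick sanity check, the sketch below recomputes it for the Gemma rows of the table. The `speedup` helper is purely illustrative and is not part of VulkanIlm.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# (CPU seconds, Vulkan seconds) copied from the table above.
# The Arc A770 and RX 6600 figures are approximate ("~") in the source.
rows = {
    "AMD RX 580 8GB": (188.47, 44.74),
    "Intel Arc A770": (120.0, 25.0),
    "AMD RX 6600": (90.0, 18.0),
}

for gpu, (cpu_s, gpu_s) in rows.items():
    print(f"{gpu}: {speedup(cpu_s, gpu_s):.2f}x")
```

The same helper can be applied to your own timings when comparing against these numbers.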

iGPU notes

  • The Dell E7250 iGPU result shows older integrated GPUs can be very effective for smaller LLMs when using Vulkan.
  • Smaller models and appropriate quantizations are more iGPU-friendly. Driver/version differences significantly affect results.

Other tested (functional) models

  • DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit: runs (not benchmarked).
  • LLaMA 3.1 8B: runs (not benchmarked).

ROCm / AMD notes

  • ROCm is not officially supported for gfx803 (RX 580).
  • Some community members try ROCm 5/6 workarounds on RX 580, but they are unstable/unsupported.
  • VulkanIlm offers a Vulkan-based path that avoids ROCm on legacy AMD cards.

Install

Quick start

git clone https://github.com/Talnz007/VulkanIlm.git
cd VulkanIlm
pip install -e .

Prerequisites

  • Python 3.9+
  • Vulkan-capable GPU (AMD RX 400+, Intel Arc/Xe, NVIDIA GTX 900+)
  • Vulkan drivers installed and working

Install Vulkan tools (if needed)

Ubuntu / Debian:

sudo apt update
sudo apt install vulkan-tools libvulkan-dev

Fedora / RHEL:

sudo dnf install vulkan-tools vulkan-devel

Verify:

vulkaninfo
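
If you want to run the same check from a script (for example before starting a long job), a minimal sketch follows. It only assumes that `vulkaninfo` exits with a non-zero status when no working Vulkan driver is found. The `vulkan_ready` helper name is hypothetical and is not part of VulkanIlm.

```python
import shutil
import subprocess


def vulkan_ready(command: str = "vulkaninfo", timeout: float = 30.0) -> tuple[bool, str]:
    """Check that `command` is on PATH and exits successfully."""
    path = shutil.which(command)
    if path is None:
        return False, f"{command} not found on PATH; install vulkan-tools"
    try:
        result = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{command} could not be run: {exc}"
    if result.returncode != 0:
        return False, f"{command} exited with code {result.returncode}"
    return True, f"{command} ran successfully"


ok, message = vulkan_ready()
print(message)
```

A successful exit only shows that the Vulkan loader and a driver respond; it does not guarantee that a given model fits in GPU memory.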

Usage

CLI examples

# Auto-install llama.cpp with Vulkan support
vulkanilm install

# Check your GPU setup
vulkanilm vulkan-info

# Search and download models (if supported)
vulkanilm search "llama"
vulkanilm download microsoft/DialoGPT-medium

# Generate text
vulkanilm ask path/to/model.gguf --prompt "Explain quantum computing"

# Stream tokens in real-time
vulkanilm stream path/to/model.gguf "Tell me a story about AI"

# Run a benchmark
vulkanilm benchmark path/to/model.gguf --prompt "Benchmark prompt" --repeat 3

Python API (example)

from vulkan_ilm import Llama

# Load model (auto GPU optimization)
llm = Llama("path/to/model.gguf", gpu_layers=16)

# Synchronous generation
response = llm.ask("Explain the term 'ilm' in AI context.")
print(response)

# Streaming generation
for token in llm.stream_ask_real("Tell me about Vulkan API"):
    print(token, end='', flush=True)
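
To get a rough throughput figure from the streaming call, you can time how long the iterator takes to drain. The sketch below assumes `stream_ask_real` yields text chunks (as the loop above suggests) and uses a stand-in generator so it runs without a model. `timed_stream` and `fake_stream` are hypothetical names, not part of the VulkanIlm API.

```python
import time
from typing import Iterable, Tuple


def timed_stream(tokens: Iterable[str]) -> Tuple[str, int, float]:
    """Consume a token iterator; return (text, chunk_count, elapsed_seconds)."""
    start = time.perf_counter()
    pieces = []
    for token in tokens:
        pieces.append(token)
    elapsed = time.perf_counter() - start
    return "".join(pieces), len(pieces), elapsed


def fake_stream():
    """Stand-in for llm.stream_ask_real(...) so the sketch runs without a model."""
    for word in ["Vulkan ", "is ", "a ", "graphics ", "API."]:
        yield word


text, count, seconds = timed_stream(fake_stream())
print(text)
print(f"{count} chunks in {seconds:.4f} s")
```

In real use, pass `llm.stream_ask_real("...")` instead of `fake_stream()`. Note that a streamed chunk may not correspond to exactly one model token, so treat chunks per second as an approximation.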

Reproduce benchmarks (quick checklist)

  1. Use the exact model file and quantization referenced in benchmarks/ (GGUF + quantization).
  2. Use the benchmark script in benchmarks/run_benchmark.sh.
  3. Record: driver version, OS version, CPU frequency governor, and system load.
  4. Run benchmarks multiple times (cold and warm cache) and average results.
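
Steps 3 and 4 can be partly automated. The sketch below is one possible way to capture the OS version, CPU frequency governor, and system load on Linux, and to average repeated timings. It is not the project's benchmark script; the function names and the sysfs path for `cpu0` are assumptions, and the GPU driver version still has to be copied from `vulkaninfo` output by hand.

```python
import os
import platform
import statistics
from pathlib import Path

# Typical Linux location of the governor for the first CPU core (assumption).
GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def describe_environment() -> dict:
    """Collect the OS, CPU governor, and load details from step 3."""
    try:
        governor = GOVERNOR_PATH.read_text().strip()
    except OSError:
        governor = "unknown"
    try:
        load_1min = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1min = None  # not available on this platform
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_governor": governor,
        "load_1min": load_1min,
    }


def summarize(times_s: list) -> dict:
    """Average repeated timings (step 4) and report their spread."""
    return {
        "runs": len(times_s),
        "mean_s": statistics.mean(times_s),
        "stdev_s": statistics.stdev(times_s) if len(times_s) > 1 else 0.0,
    }


print(describe_environment())
print(summarize([10.0, 11.0, 12.0]))  # made-up timings, not measured results
```

Storing this dictionary next to each result makes it easier to explain differences between runs on different machines.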

Troubleshooting (Linux)

vulkanilm: command not found

  • Activate venv and reinstall:
python3 -m venv venv
source venv/bin/activate
pip install -e .
  • Or run via Poetry:
poetry run vulkanilm install

Could NOT find Vulkan (missing: glslc)

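  • Install glslc (the shader compiler shipped with the Vulkan SDK; vulkan-tools alone does not provide it):
# Fedora
sudo dnf install glslc

# Ubuntu/Debian (if your release has no glslc package, install the LunarG Vulkan SDK instead)
sudo apt install glslc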

Verify: glslc --version

Could NOT find CURL

  • Install libcurl dev:
# Fedora
sudo dnf install libcurl-devel

# Ubuntu/Debian
sudo apt install libcurl4-openssl-dev

Project structure

VulkanIlm/
├── vulkan_ilm/
│   ├── cli.py
│   ├── llama.py
│   ├── vulkan/
│   │   └── detector.py
│   ├── benchmark.py
│   ├── installer.py
│   └── streaming.py
├── benchmarks/             # benchmark scripts & data
├── pyproject.toml
└── README.md

Contributing

We welcome contributions! Useful areas:

  • GPU testing across drivers & OSes
  • Additional model formats & quant recipes
  • Memory & perf optimizations
  • Docs, reproducible benchmarks, and examples

See CONTRIBUTING.md for details. Look for good-first-issue tags.


The story behind the name

Ilm (علم) = knowledge / wisdom. Combined with Vulkan, it becomes “knowledge on fire”: making fast local AI accessible to everyone, regardless of GPU brand or budget. 🔥


License

MIT. See LICENSE for details.



Built with passion by @Talnz007, bringing fast, local AI to legacy GPUs everywhere.


Download files

Source Distribution

vulkan_ilm-0.1.1.tar.gz (24.1 kB view details)

Uploaded Source

Built Distribution

vulkan_ilm-0.1.1-py3-none-any.whl (27.3 kB view details)

Uploaded Python 3

File details

Details for the file vulkan_ilm-0.1.1.tar.gz.

File metadata

  • Download URL: vulkan_ilm-0.1.1.tar.gz
  • Upload date:
  • Size: 24.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.13.7 Linux/6.16.8-200.fc42.x86_64

File hashes

Hashes for vulkan_ilm-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d781f6e4a8214b9c15b87f428ebc90393b2501a4c6b23ac5cc25327967d4e781
MD5 572fcf89fdc05eb0f0e6c9099a91ebbb
BLAKE2b-256 e53ff5abf2e413fc5a39beadac7b373f5bf556c0b726215ff8c25e7fa29bf208

File details

Details for the file vulkan_ilm-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: vulkan_ilm-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 27.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.13.7 Linux/6.16.8-200.fc42.x86_64

File hashes

Hashes for vulkan_ilm-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 010b0e25f160b123f69b88d68bcf3fd3960536947e954f57a0ec556d515582e6
MD5 dc8ed9bdb0c1e9db979fab10948529c3
BLAKE2b-256 342dc2c171f22022aafc275fdab09625262d76a0f7d91e2cd4a1497ef6c465f5
