Pythonic LLM inference on legacy GPUs using Vulkan: GPU-accelerated local AI for AMD, Intel, and NVIDIA without CUDA.
VulkanIlm
GPU-Accelerated Local LLMs for Everyone (Vulkan + Ilm, "knowledge")
VulkanIlm is a Python-first wrapper and CLI around llama.cpp's Vulkan backend that brings fast local LLM inference to AMD, Intel, and NVIDIA GPUs, with no CUDA required. Built for developers with legacy or non-NVIDIA hardware.
TL;DR
- What: Python library + CLI to run LLMs locally using Vulkan GPU acceleration.
- Why: Most acceleration tooling targets CUDA/NVIDIA; VulkanIlm opens GPU acceleration to AMD and Intel users.
- Quick result: In the benchmarks below, a small model ran ~33× faster on an iGPU than on CPU; mid-range and larger discrete GPUs get ~4-6× speedups vs CPU.
Key features
- Significant speedups vs CPU on legacy GPUs and iGPUs
- Broad GPU support: AMD, Intel, NVIDIA (via Vulkan)
- Python-first API + easy CLI tools
- Auto detection + GPU-specific optimizations
- Auto build/install of the llama.cpp Vulkan backend
- Real-time streaming token generation
- Reproducible benchmark scripts in benchmarks/
Benchmarks (summary)
Benchmarks measured with Gemma-3n-E4B-it (6.9B) unless noted. Results depend on model quantization, GPU drivers, OS, and system load.
| Hardware (OS) | Model | CPU time | Vulkan (GPU) time | Speedup |
|---|---|---|---|---|
| Dell E7250 (i7-5600U, integrated GPU), Fedora 42 Workstation | TinyLLaMA-1.1B-Chat (Q4_K_M) | 121 s | 3 s | 33× |
| AMD RX 580 8GB, Ubuntu 22.04.5 LTS (Jammy) | Gemma-3n-E4B-it (6.9B) | 188.47 s | 44.74 s | 4.21× |
| Intel Arc A770 | Gemma-3n-E4B-it (6.9B) | ~120 s | ~25 s | ~4.8× |
| AMD RX 6600 | Gemma-3n-E4B-it (6.9B) | ~90 s | ~18 s | ~5.0× |
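The Speedup column reads as CPU time divided by Vulkan (GPU) time. As a minimal sketch of that arithmetic, the snippet below recomputes the ratio for three rows of the table; the `speedup` helper is a hypothetical name used for illustration and is not part of VulkanIlm.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Timings copied from the table above: (CPU seconds, Vulkan seconds).
rows = {
    "AMD RX 580": (188.47, 44.74),
    "Intel Arc A770": (120.0, 25.0),
    "AMD RX 6600": (90.0, 18.0),
}

for name, (cpu_s, gpu_s) in rows.items():
    print(f"{name}: {speedup(cpu_s, gpu_s):.2f}x")
```

The same helper can be applied to your own timings to compare them against the table.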
iGPU notes
- The Dell E7250 iGPU result shows older integrated GPUs can be very effective for smaller LLMs when using Vulkan.
- Smaller models and appropriate quantizations are more iGPU-friendly. Driver/version differences significantly affect results.
Other tested (functional) models
- DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit: runs (not benchmarked).
- LLaMA 3.1 8B: runs (not benchmarked).
ROCm / AMD notes
- ROCm is not officially supported for gfx803 (RX 580).
- Some community members try ROCm 5/6 workarounds on RX 580, but they are unstable/unsupported.
- VulkanIlm offers a Vulkan-based path that avoids ROCm on legacy AMD cards.
Install
Quick start
git clone https://github.com/Talnz007/VulkanIlm.git
cd VulkanIlm
pip install -e .
Prerequisites
- Python 3.9+
- Vulkan-capable GPU (AMD RX 400+, Intel Arc/Xe, NVIDIA GTX 900+)
- Vulkan drivers installed and working
Install Vulkan tools (if needed)
Ubuntu / Debian:
sudo apt update
sudo apt install vulkan-tools libvulkan-dev
Fedora / RHEL:
sudo dnf install vulkan-tools vulkan-devel
Verify:
vulkaninfo
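The same check can be scripted. The sketch below runs `vulkaninfo` and pulls out the device names, assuming the tool prints lines of the form `deviceName = ...` (common in its text output, but treat that format as an assumption). The function names are hypothetical and not part of VulkanIlm.

```python
import re
import shutil
import subprocess


def parse_device_names(vulkaninfo_output: str) -> list[str]:
    """Extract unique GPU names from vulkaninfo text output, preserving order."""
    names = re.findall(
        r"^\s*deviceName\s*=\s*(.+?)\s*$", vulkaninfo_output, re.MULTILINE
    )
    return list(dict.fromkeys(names))


def list_vulkan_devices(tool: str = "vulkaninfo") -> list[str]:
    """Run vulkaninfo and return device names; empty list if it is missing or fails."""
    if shutil.which(tool) is None:
        return []
    try:
        result = subprocess.run([tool], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_device_names(result.stdout)


if __name__ == "__main__":
    devices = list_vulkan_devices()
    print(devices if devices else "No Vulkan devices found (is vulkaninfo installed?)")
```

An empty list usually means the tool is missing or the driver is not working; it does not by itself say which.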
Usage
CLI examples
# Auto-install llama.cpp with Vulkan support
vulkanilm install
# Check your GPU setup
vulkanilm vulkan-info
# Search and download models (if supported)
vulkanilm search "llama"
vulkanilm download microsoft/DialoGPT-medium
# Generate text
vulkanilm ask path/to/model.gguf --prompt "Explain quantum computing"
# Stream tokens in real-time
vulkanilm stream path/to/model.gguf "Tell me a story about AI"
# Run a benchmark
vulkanilm benchmark path/to/model.gguf --prompt "Benchmark prompt" --repeat 3
Python API (example)
from vulkan_ilm import Llama
# Load model (auto GPU optimization)
llm = Llama("path/to/model.gguf", gpu_layers=16)
# Synchronous generation
response = llm.ask("Explain the term 'ilm' in AI context.")
print(response)
# Streaming generation
for token in llm.stream_ask_real("Tell me about Vulkan API"):
    print(token, end='', flush=True)
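If you also want the full text and a rough throughput figure from a streamed response, a small wrapper is enough. This is a sketch that assumes `stream_ask_real` yields text chunks as in the loop above; `collect_stream` and `fake_stream` are hypothetical names, and the fake generator only stands in for a real model so the snippet runs on its own.

```python
import time
from typing import Iterable, Tuple


def collect_stream(tokens: Iterable[str], echo: bool = True) -> Tuple[str, float]:
    """Consume a token stream, optionally echoing it; return (text, chunks per second)."""
    start = time.perf_counter()
    pieces = []
    for token in tokens:
        if echo:
            print(token, end="", flush=True)
        pieces.append(token)
    elapsed = time.perf_counter() - start
    if echo:
        print()
    rate = len(pieces) / elapsed if elapsed > 0 else 0.0
    return "".join(pieces), rate


# Stand-in generator so the sketch runs without a model. With VulkanIlm you would
# pass llm.stream_ask_real("...") instead (assumption: it yields text chunks).
def fake_stream():
    for chunk in ["Vulkan ", "is ", "a ", "graphics ", "API."]:
        yield chunk


text, rate = collect_stream(fake_stream(), echo=False)
print(text)
```

The rate counts yielded chunks, which may not map one-to-one to model tokens.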
Reproduce benchmarks (quick checklist)
- Use the exact model file & quantization referenced in /benchmarks (GGUF + quantization).
- Use the benchmark script in benchmarks/run_benchmark.sh.
- Record: driver version, OS version, CPU frequency governor, and system load.
- Run benchmarks multiple times (cold and warm cache) and average results.
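The recording and averaging steps in the checklist above can be sketched in a few lines. This is an illustration rather than the project's benchmark script: the helper names are hypothetical, the governor path is the usual Linux sysfs location, the GPU driver version is not captured, and the timings are made-up placeholders, not measured results.

```python
import os
import platform
import statistics
from pathlib import Path


def read_governor(cpu: int = 0) -> str:
    """Return the Linux CPU frequency governor, or 'unknown' if it cannot be read."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor")
    try:
        return path.read_text().strip()
    except OSError:
        return "unknown"


def environment_record() -> dict:
    """Collect OS-level details from the checklist (driver version not included)."""
    try:
        load_1min = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1min = None  # not available on this platform
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_governor": read_governor(),
        "load_1min": load_1min,
    }


def summarize(times_s: list[float]) -> dict:
    """Average repeated timings and report the spread."""
    return {
        "runs": len(times_s),
        "mean_s": statistics.mean(times_s),
        "stdev_s": statistics.stdev(times_s) if len(times_s) > 1 else 0.0,
        "min_s": min(times_s),
        "max_s": max(times_s),
    }


# Hypothetical timings in seconds, for illustration only (not measured results).
example_times = [45.0, 44.0, 46.0]
print(environment_record())
print(summarize(example_times))
```

Keeping the environment record next to each set of timings makes it easier to explain differences between runs on different drivers or load conditions.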
Troubleshooting (Linux)
vulkanilm: command not found
- Activate venv and reinstall:
python3 -m venv venv
source venv/bin/activate
pip install -e .
- Or run via Poetry:
poetry run vulkanilm install
Could NOT find Vulkan (missing: glslc)
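- Install glslc (the shader compiler shipped with the Vulkan SDK; vulkan-tools alone does not provide it):
# Fedora
sudo dnf install glslc
# Ubuntu/Debian (if the glslc package is unavailable on your release, install the LunarG Vulkan SDK instead)
sudo apt install glslc
Verify:
glslc --version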
Could NOT find CURL
- Install libcurl dev:
# Fedora
sudo dnf install libcurl-devel
# Ubuntu/Debian
sudo apt install libcurl4-openssl-dev
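Both errors above come from missing build prerequisites, so it can help to check for them before running the installer. The sketch below looks for a few executables on PATH. Which tools VulkanIlm's build needs is an assumption here: `glslc` and `cmake` are inferred from the CMake-style error messages, and `curl-config` is used as a proxy for the libcurl development package. `missing_tools` is a hypothetical helper name.

```python
import shutil

# Executables assumed to indicate that each prerequisite is present:
# glslc ships with the Vulkan SDK / shaderc, curl-config with the libcurl dev package.
REQUIRED_TOOLS = {
    "glslc": "Vulkan shader compiler",
    "curl-config": "libcurl development files",
    "cmake": "build system used by llama.cpp",
}


def missing_tools(tools: dict[str, str]) -> dict[str, str]:
    """Return the subset of tools that are not found on PATH."""
    return {name: why for name, why in tools.items() if shutil.which(name) is None}


for name, why in missing_tools(REQUIRED_TOOLS).items():
    print(f"missing: {name} ({why})")
```

No output means every listed executable was found; it does not guarantee that the corresponding headers or libraries are usable.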
Project structure
VulkanIlm/
├── vulkan_ilm/
│   ├── cli.py
│   ├── llama.py
│   ├── vulkan/
│   │   └── detector.py
│   ├── benchmark.py
│   ├── installer.py
│   └── streaming.py
├── benchmarks/          # benchmark scripts & data
├── pyproject.toml
└── README.md
Contributing
We welcome contributions! Useful areas:
- GPU testing across drivers & OSes
- Additional model formats & quant recipes
- Memory & perf optimizations
- Docs, reproducible benchmarks, and examples
See CONTRIBUTING.md for details. Look for good-first-issue tags.
The story behind the name
Ilm (علم) = knowledge / wisdom. Combined with Vulkan: "knowledge on fire", making fast local AI accessible to everyone, regardless of GPU brand or budget.
License
MIT. See LICENSE for details.
Links & support
- Repo: https://github.com/Talnz007/VulkanIlm
- Issues: Report bugs or request features on GitHub
- Discussions: Community Q&A
Built with passion by @Talnz007, bringing fast, local AI to legacy GPUs everywhere.