Pythonic LLM inference on legacy GPUs using Vulkan

VulkanIlm 🚀🔥

GPU-Accelerated Local LLMs for Everyone (Vulkan + Ilm — "knowledge")

VulkanIlm is a Python-first wrapper and CLI around llama.cpp's Vulkan backend that brings fast local LLM inference to AMD, Intel, and NVIDIA GPUs — no CUDA required. Built for developers with legacy or non-NVIDIA hardware.

TL;DR

What: Python library + CLI to run LLMs locally using Vulkan GPU acceleration.
Why: Most acceleration tooling targets CUDA/NVIDIA — VulkanIlm opens up AMD & Intel users.
Quick result: A small model (TinyLLaMA-1.1B) ran about 33× faster than CPU on an older Intel iGPU; Gemma-3n-E4B-it ran roughly 4–5× faster than CPU on discrete AMD and Intel GPUs.

Key features

🚀 Significant speedups vs CPU on legacy GPUs and iGPUs
🎮 Broad GPU support: AMD, Intel, NVIDIA (via Vulkan)
🐍 Python-first API + easy CLI tools
⚡ Auto detection + GPU-specific optimizations
📦 Auto build/install of llama.cpp Vulkan backend
🔄 Real-time streaming token generation
✅ Reproducible benchmark scripts in benchmarks/

Benchmarks (summary)

Benchmarks measured with Gemma-3n-E4B-it (6.9B) unless noted. Results depend on model quantization, GPU drivers, OS, and system load.

Hardware (OS)	Model	CPU time	Vulkan (GPU) time	Speedup
Dell E7250 (i7-5600U, integrated GPU) — Fedora 42 Workstation	TinyLLaMA-1.1B-Chat (Q4_K_M)	121 s	3 s	33×
AMD RX 580 8GB — Ubuntu 22.04.5 LTS (Jammy)	Gemma-3n-E4B-it (6.9B)	188.47 s	44.74 s	4.21×
Intel Arc A770	Gemma-3n-E4B-it (6.9B)	~120 s	~25 s	~4.8×
AMD RX 6600	Gemma-3n-E4B-it (6.9B)	~90 s	~18 s	~5.0×
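
The Speedup column is the CPU time divided by the Vulkan time. A minimal sketch that recomputes it for the Gemma rows above; the `speedup` helper and the row layout are illustrative and not part of VulkanIlm:

```python
# Recompute the Speedup column: speedup = CPU time / Vulkan (GPU) time.
# The helper name and data layout are illustrative, not part of VulkanIlm.

def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run was than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds

# (hardware, CPU seconds, Vulkan seconds) copied from the table above.
rows = [
    ("AMD RX 580 8GB", 188.47, 44.74),
    ("Intel Arc A770", 120.0, 25.0),
    ("AMD RX 6600", 90.0, 18.0),
]

for hardware, cpu_s, gpu_s in rows:
    print(f"{hardware}: {speedup(cpu_s, gpu_s):.2f}x")
```

The same helper can be applied to your own timings when comparing against these rows.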

iGPU notes

The Dell E7250 iGPU result shows older integrated GPUs can be very effective for smaller LLMs when using Vulkan.
Smaller models and appropriate quantizations are more iGPU-friendly. Driver/version differences significantly affect results.

Other tested (functional) models

DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit — runs (not benchmarked).
LLaMA 3.1 8B — runs (not benchmarked).

ROCm / AMD notes

ROCm is not officially supported for gfx803 (RX 580).
Some community members try ROCm 5/6 workarounds on RX 580, but they are unstable/unsupported.
VulkanIlm offers a Vulkan-based path that avoids ROCm on legacy AMD cards.

Install

Quick start

git clone https://github.com/Talnz007/VulkanIlm.git
cd VulkanIlm
pip install -e .

Prerequisites

Python 3.9+
Vulkan-capable GPU (AMD RX 400+, Intel Arc/Xe, NVIDIA GTX 900+; the benchmarks above also ran on the older integrated GPU of an i7-5600U)
Vulkan drivers installed and working

Install Vulkan tools (if needed)

Ubuntu / Debian:

sudo apt update
sudo apt install vulkan-tools libvulkan-dev

Fedora / RHEL:

sudo dnf install vulkan-tools vulkan-devel

Verify:

vulkaninfo
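
If you want to run the same check from a script, the sketch below looks for `vulkaninfo` on PATH and confirms it exits cleanly. The `check_vulkan` function is a hypothetical helper written for this example; it is not VulkanIlm's own detection code.

```python
import shutil
import subprocess

def check_vulkan(tool: str = "vulkaninfo", timeout: float = 30.0) -> tuple[bool, str]:
    """Return (ok, message) describing whether `tool` is installed and runs cleanly."""
    path = shutil.which(tool)
    if path is None:
        return False, f"{tool} not found on PATH; install the Vulkan tools package"
    try:
        # Capture output so the (long) vulkaninfo report is not echoed.
        result = subprocess.run([path], capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{tool} could not be run: {exc}"
    if result.returncode != 0:
        return False, f"{tool} exited with code {result.returncode}; check your Vulkan driver"
    return True, f"{tool} ran successfully ({path})"

if __name__ == "__main__":
    ok, message = check_vulkan()
    print(("OK: " if ok else "FAIL: ") + message)
```

A clean exit only shows that the Vulkan loader and at least one driver respond; it does not guarantee that a given model fits in GPU memory.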

Usage

CLI examples

# Auto-install llama.cpp with Vulkan support
vulkanilm install

# Check your GPU setup
vulkanilm vulkan-info

# Search and download models (if supported)
vulkanilm search "llama"
vulkanilm download microsoft/DialoGPT-medium

# Generate text
vulkanilm ask path/to/model.gguf --prompt "Explain quantum computing"

# Stream tokens in real-time
vulkanilm stream path/to/model.gguf "Tell me a story about AI"

# Run a benchmark
vulkanilm benchmark path/to/model.gguf --prompt "Benchmark prompt" --repeat 3
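
When driving the CLI from Python, passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes. This sketch only assembles the `benchmark` command shown above; `benchmark_command` is a hypothetical helper, and it assumes no flags beyond `--prompt` and `--repeat`.

```python
import shlex

def benchmark_command(model_path: str, prompt: str, repeat: int = 3) -> list[str]:
    """Build the argument list for `vulkanilm benchmark` as shown in the CLI examples."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return ["vulkanilm", "benchmark", model_path,
            "--prompt", prompt, "--repeat", str(repeat)]

cmd = benchmark_command("path/to/model.gguf", "Benchmark prompt")
# Print a copy-pasteable, correctly quoted version of the command.
print(shlex.join(cmd))
# To actually run it (requires vulkanilm to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```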

Python API (example)

from vulkan_ilm import Llama

# Load model (auto GPU optimization)
llm = Llama("path/to/model.gguf", gpu_layers=16)

# Synchronous generation
response = llm.ask("Explain the term 'ilm' in AI context.")
print(response)

# Streaming generation
for token in llm.stream_ask_real("Tell me about Vulkan API"):
    print(token, end='', flush=True)
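
To get a rough throughput figure from a streaming call, you can wrap any iterator of text chunks in a timer. The sketch below is an assumption-laden illustration: `measure_stream` is a hypothetical helper, the `fake_stream` generator stands in for `llm.stream_ask_real(...)` so the example runs without a model, and it counts yielded chunks, which may not correspond one-to-one with model tokens.

```python
import time
from typing import Iterable

def measure_stream(tokens: Iterable[str]) -> dict:
    """Consume a stream of text chunks and report count, timing, and throughput."""
    start = time.perf_counter()
    pieces = []
    first_token_s = None
    for token in tokens:
        if first_token_s is None:
            # Latency until the first chunk arrives.
            first_token_s = time.perf_counter() - start
        pieces.append(token)
    elapsed = time.perf_counter() - start
    return {
        "text": "".join(pieces),
        "tokens": len(pieces),
        "seconds": elapsed,
        "first_token_seconds": first_token_s,
        "tokens_per_second": len(pieces) / elapsed if elapsed > 0 else 0.0,
    }

def fake_stream():
    """Stand-in for a real model stream; yields a few chunks with a small delay."""
    for piece in ["Vulkan ", "is ", "a ", "graphics ", "API."]:
        time.sleep(0.01)
        yield piece

stats = measure_stream(fake_stream())
print(stats["text"])
print(f'{stats["tokens"]} chunks in {stats["seconds"]:.3f} s')
```

With a loaded model, replace `fake_stream()` with the streaming call from the example above.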

Reproduce benchmarks (quick checklist)

Use the exact model file & quantization referenced in /benchmarks (GGUF + quantization).
Use the benchmark script in benchmarks/run_benchmark.sh.
Record: driver version, OS version, CPU frequency governor, and system load.
Run benchmarks multiple times (cold and warm cache) and average results.
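
The averaging and record-keeping steps in the checklist can be sketched as follows. This is not the project's `benchmarks/run_benchmark.sh`; the helper names are hypothetical and the timings are made-up placeholders that show the shape of the data.

```python
import platform
import statistics

def summarize_runs(seconds: list[float]) -> dict:
    """Summarize repeated wall-clock timings (in seconds) for one configuration."""
    if not seconds:
        raise ValueError("need at least one timing")
    return {
        "runs": len(seconds),
        "mean_s": statistics.mean(seconds),
        # Sample standard deviation needs at least two runs.
        "stdev_s": statistics.stdev(seconds) if len(seconds) > 1 else 0.0,
        "min_s": min(seconds),
        "max_s": max(seconds),
    }

def environment_info() -> dict:
    """Collect the parts of the environment that Python can report by itself."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }

# Placeholder timings, NOT measured results: keep cold and warm runs separate.
cold_runs = [50.0]
warm_runs = [44.0, 45.0, 46.0]

print("environment:", environment_info())
print("cold:", summarize_runs(cold_runs))
print("warm:", summarize_runs(warm_runs))
```

GPU driver version, CPU frequency governor, and system load are not captured here and still need to be recorded by hand or with platform-specific tools.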

Troubleshooting (Linux)

`vulkanilm: command not found`

Activate venv and reinstall:

python3 -m venv venv
source venv/bin/activate
pip install -e .

Or run via Poetry:

poetry run vulkanilm install

`Could NOT find Vulkan (missing: glslc)`

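Install glslc, the shader compiler the llama.cpp Vulkan build needs. It ships with the Vulkan SDK and as a distro package; vulkan-tools only provides utilities such as vulkaninfo:

# Fedora
sudo dnf install glslc

# Ubuntu/Debian (recent releases package glslc; on older ones install the LunarG Vulkan SDK)
sudo apt install glslc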

Verify: glslc --version

`Could NOT find CURL`

Install libcurl dev:

# Fedora
sudo dnf install libcurl-devel

# Ubuntu/Debian
sudo apt install libcurl4-openssl-dev

Project structure

VulkanIlm/
├── vulkan_ilm/
│   ├── cli.py
│   ├── llama.py
│   ├── vulkan/
│   │   └── detector.py
│   ├── benchmark.py
│   ├── installer.py
│   └── streaming.py
├── benchmarks/             # benchmark scripts & data
├── pyproject.toml
└── README.md

Contributing

We welcome contributions! Useful areas:

GPU testing across drivers & OSes
Additional model formats & quant recipes
Memory & perf optimizations
Docs, reproducible benchmarks, and examples

See CONTRIBUTING.md for details. Look for good-first-issue tags.

The story behind the name

Ilm (علم) = knowledge / wisdom. Combined with Vulkan — “knowledge on fire”: making fast local AI accessible to everyone, regardless of GPU brand or budget. 🔥

License

MIT — see LICENSE for details.

Links & support

Repo: https://github.com/Talnz007/VulkanIlm
Issues: Report bugs or request features on GitHub
Discussions: Community Q&A

Built with passion by @Talnz007 — bringing fast, local AI to legacy GPUs everywhere.

Release history

0.1.1: Oct 14, 2025
0.1.0 (this version): Oct 13, 2025

File details

Details for the file vulkan_ilm-0.1.0.tar.gz.

File metadata

Download URL: vulkan_ilm-0.1.0.tar.gz
Upload date: Oct 13, 2025
Size: 23.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.13.7 Linux/6.16.8-200.fc42.x86_64

File hashes

Hashes for vulkan_ilm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`73604c02cfe95232deaa63d9ab4c01bc2b5307d6eaa1ed77efff284e0679dd8b`
MD5	`27c748766821c5861ed2b86b718d9062`
BLAKE2b-256	`ed043bb8b0c2a97137c97213e8cb06ed41943ba72c0905fde5e3a9c7019bccfb`

File details

Details for the file vulkan_ilm-0.1.0-py3-none-any.whl.

File metadata

Download URL: vulkan_ilm-0.1.0-py3-none-any.whl
Upload date: Oct 13, 2025
Size: 27.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.13.7 Linux/6.16.8-200.fc42.x86_64

File hashes

Hashes for vulkan_ilm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`22afab24393766a23d820d5a88549258dbd3c2f20d15b93946d0c49998d47ea8`
MD5	`7f1137348cfbbb4f873d17b7b43bebba`
BLAKE2b-256	`78a68da6345a45f7c5559c3422728c7032b740245ca921ec952731687bc2ce73`
