LLM Turbo-Optimizer CLI. detect hardware, parse GGUF models, generate optimized llama-server commands.
clanker
generates llama-server commands so you don't have to guess how many layers fit on your gpu. reads the gguf header, does the math, prints the command.
i got tired of trial-and-error vram tuning for moe models on consumer gpus.
Setup
requires python 3.10+ and pip.
pip install clanker-gguf
or install from source:
pip install git+https://github.com/Stavros-alt/clanker.git
pip install "git+https://github.com/Stavros-alt/clanker.git#egg=clanker-gguf[all]"
or clone and install locally:
git clone https://github.com/Stavros-alt/clanker.git
cd clanker
pip install -e ".[all]"
you also need:
- nvidia-smi on PATH (cpu-only mode works without it)
- llama-server from ik_llama.cpp (build it with clanker build, or install manually)
- windows? good luck. this probably works on wsl.
Usage
clanker run <model>
clanker run ~/models/qwen.gguf # direct path
clanker run qwen # fuzzy cache search
clanker run qwen.gguf --preset speed # apply a preset
clanker run qwen.gguf --context 65536 # override context
clanker run qwen.gguf --execute # run it
defaults to 8K context, q8_0 KV cache, no flash attention.
pass --preset big-brain for the optimized 128k config.
clanker download <repo>
clanker download unsloth/Qwen3.6-35B-A3B-GGUF # scan only
clanker download unsloth/Qwen3.6-35B-A3B-GGUF --execute # download
clanker download unsloth/Qwen3.6-35B-A3B-GGUF -c 32768 # budget for 32k ctx
picks the highest-quality quant that fits your ram, with preference for
models whose backbone fits on gpu. --fit handles the offloading at
runtime.
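a rough sketch of that selection rule, not clanker's actual code. the `Quant` fields, the numeric quality rank, and treating "backbone fits on gpu" as the primary sort key are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quant:
    # all fields are hypothetical; clanker's real data model may differ
    name: str
    quality: int      # higher means better quality (assumed ranking)
    total_mb: int     # full file size
    backbone_mb: int  # non-expert weights that should live on the gpu


def pick_quant(quants: list[Quant], ram_mb: int, vram_mb: int) -> Optional[Quant]:
    """Pick the best quant that fits in ram, preferring a gpu-resident backbone."""
    # hard constraint: the whole file has to fit in system ram
    fitting = [q for q in quants if q.total_mb <= ram_mb]
    if not fitting:
        return None
    # assumption: backbone-on-gpu beats raw quality, quality breaks ties
    return max(fitting, key=lambda q: (q.backbone_mb <= vram_mb, q.quality))


candidates = [
    Quant("Q8_0", quality=3, total_mb=40000, backbone_mb=9000),
    Quant("Q4_K_M", quality=2, total_mb=20000, backbone_mb=5000),
    Quant("Q2_K", quality=1, total_mb=12000, backbone_mb=3000),
]
```

with 64 GB of ram and 8 GB of vram, this sketch skips Q8_0 (its backbone does not fit on the gpu) and settles on Q4_K_M.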
Other commands
clanker discover # scan hf cache for gguf models
clanker discover --json # json output
clanker ls # alias for discover
clanker search # open hf search with hardware-tuned bounds
clanker presets # list all presets with their settings
clanker build # clone and compile ik_llama.cpp
Presets
all presets enable mtp speculative decoding when the model supports it.
| Name | Context | KV Cache | Notes |
|---|---|---|---|
| big-brain | 128K | k=q8_0, v=q5_1 | mlock, no-mmap, flash-attn |
| speed | 32K | k=q6_0, v=q5_0 | mlock, flash-attn |
| infinite | 512K | k=q5_0, v=q4_1 | no-mmap, flash-attn |
| coding | 64K | k=q8_0, v=q5_1 | mlock, flash-attn, temp=0.2 |
kv cache quants are from KLD benchmarks (anbeeld). mixed k/v pairs outperform symmetric ones on the pareto frontier.
How it works
- reads your gpu vram via nvidia-smi
- detects physical cpu cores (uses cores - 1 for thread count)
- parses the gguf header for architecture, layers, experts, quant type
- figures out layer split, kv cache size, fit-margin based on available vram
- spits out a llama-server command with thread tuning, ubatch tuning, and mlock
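the first three steps above can be sketched roughly like this. this is a minimal illustration, not clanker's implementation: the function names are made up, the exact nvidia-smi query is an assumption, and the header reader only covers the fixed-size GGUF preamble (magic, version, tensor count, metadata count) for GGUF v2 or later.

```python
import struct
import subprocess


def parse_vram_mib(csv_text: str) -> list[tuple[int, int]]:
    """Parse (total, free) MiB pairs, one gpu per line, from nvidia-smi csv output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, free = (int(field.strip()) for field in line.split(","))
        gpus.append((total, free))
    return gpus


def query_vram_mib() -> list[tuple[int, int]]:
    """Ask nvidia-smi for vram; an empty list means cpu-only mode."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.total,memory.free",  # assumed query, not confirmed
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_vram_mib(out)


def thread_count(physical_cores: int) -> int:
    """cores - 1, but never below 1."""
    return max(1, physical_cores - 1)


def read_gguf_preamble(data: bytes) -> dict:
    """Decode the fixed 24-byte GGUF preamble (little-endian, v2+ layout)."""
    magic, version, tensor_count, kv_count = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}
```

architecture, layer count, and expert count live in the metadata key/value section that follows the preamble, which this sketch does not parse.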
Notes
- --fit --fit-margin N means "keep N MB of vram free". base is 1664 for ik_llama, 4608 for mainline. mtp adds 2048 mb overhead.
- thread count is physical_cores - 1 (min 1). keeps one core free for gpu scheduling and os tasks.
- -ub 2048 sets the micro-batch size for max memory-bandwidth utilization during prompt prefill.
- mtp flags are only added when the model file has "mtp" in its name.
- the downloader tries llama-cli -hf first, falls back to huggingface_hub. if llama-cli downloads then ooms, clanker detects the existing file and skips re-download.
- quantization detection from gguf metadata is unreliable (returns int enums). filenames are more accurate, so clanker parses the filename.
- combined expert tensors (qwen3, deepseek) are handled by dividing total size by expert_count from metadata.
- --backend llama switches to mainline llama.cpp flags (spec-type, spec-draft-n-max, etc). default is ik_llama.
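a few of the notes above, written out as a sketch. the numbers come from the notes; everything else is an assumption: the function names, the case-insensitive "mtp" check, adding the mtp overhead on top of the base margin, and the filename regex (which only knows common quant tags) are illustrative, not clanker's real code.

```python
import re
from typing import Optional

# base fit-margin in MB per backend, from the notes above
BASE_MARGIN_MB = {"ik_llama": 1664, "llama": 4608}
MTP_OVERHEAD_MB = 2048

# assumed pattern for common quant tags: Q4_K_M, Q8_0, IQ4_XS, BF16, F16, F32
QUANT_RE = re.compile(
    r"(?<![A-Za-z0-9])(IQ\d_[A-Z0-9]+|Q\d_K(?:_[A-Z]+)?|Q\d_\d|BF16|F16|F32)(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def has_mtp(filename: str) -> bool:
    """mtp flags only apply when the file name contains 'mtp' (case-insensitive here)."""
    return "mtp" in filename.lower()


def fit_margin_mb(filename: str, backend: str = "ik_llama") -> int:
    """vram to keep free: backend base, plus mtp overhead when applicable."""
    margin = BASE_MARGIN_MB[backend]
    if has_mtp(filename):
        margin += MTP_OVERHEAD_MB
    return margin


def quant_from_filename(filename: str) -> Optional[str]:
    """Pull the quant tag out of the file name instead of trusting gguf metadata."""
    match = QUANT_RE.search(filename)
    return match.group(1).upper() if match else None


def per_expert_bytes(combined_tensor_bytes: int, expert_count: int) -> int:
    """Combined expert tensors: split the total size evenly across experts."""
    return combined_tensor_bytes // expert_count
```

for example, under these assumptions a file with "mtp" in its name gets a margin of 1664 + 2048 = 3712 MB on ik_llama and 4608 + 2048 = 6656 MB on mainline.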
File details
Details for the file clanker_gguf-1.0.4.tar.gz.
File metadata
- Download URL: clanker_gguf-1.0.4.tar.gz
- Upload date:
- Size: 27.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbc5336fe5b687905ad2b1018917620dec579f182c4d5fd809f75d4d5a2c6752 |
| MD5 | db6c0d4d7f02d41cc3516f2ce9917b1f |
| BLAKE2b-256 | 1ac836fce056e3f645e94de9b6c547e91d3166f01fcb10ade9d506fd998761b7 |
Provenance
The following attestation bundles were made for clanker_gguf-1.0.4.tar.gz:
Publisher: publish.yml on Stavros-alt/clanker
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clanker_gguf-1.0.4.tar.gz
- Subject digest: fbc5336fe5b687905ad2b1018917620dec579f182c4d5fd809f75d4d5a2c6752
- Sigstore transparency entry: 1728878506
- Sigstore integration time:
- Permalink: Stavros-alt/clanker@fac3014866fcf31bde1e7ffc18ed35989ed3522a
- Branch / Tag: refs/tags/v1.0.4
- Owner: https://github.com/Stavros-alt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fac3014866fcf31bde1e7ffc18ed35989ed3522a
- Trigger Event: push
File details
Details for the file clanker_gguf-1.0.4-py3-none-any.whl.
File metadata
- Download URL: clanker_gguf-1.0.4-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3f52d7f2cf6eab72ef3ab3dd641109b4349a7d4a8188f1a1ee6db672dc16807 |
| MD5 | 5f20138460892664e8133a7543ef0dba |
| BLAKE2b-256 | 6bc99c481a647268731b91d8b8bda595a3e9777592b4daea8671b81161ec5d75 |
Provenance
The following attestation bundles were made for clanker_gguf-1.0.4-py3-none-any.whl:
Publisher: publish.yml on Stavros-alt/clanker
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clanker_gguf-1.0.4-py3-none-any.whl
- Subject digest: e3f52d7f2cf6eab72ef3ab3dd641109b4349a7d4a8188f1a1ee6db672dc16807
- Sigstore transparency entry: 1728878672
- Sigstore integration time:
- Permalink: Stavros-alt/clanker@fac3014866fcf31bde1e7ffc18ed35989ed3522a
- Branch / Tag: refs/tags/v1.0.4
- Owner: https://github.com/Stavros-alt
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fac3014866fcf31bde1e7ffc18ed35989ed3522a
- Trigger Event: push