LLM Turbo-Optimizer CLI. detect hardware, parse GGUF models, generate optimized llama-server commands.

clanker

generates llama-server commands so you don't have to guess how many layers fit on your gpu. reads the gguf header, does the math, prints the command.

i got tired of trial-and-error vram tuning for moe models on consumer gpus.

Setup

requires python 3.10+ and pip.

pip install clanker-gguf

add extras as needed:

pip install "clanker-gguf[gguf]"     # gguf header parsing
pip install "clanker-gguf[hf]"       # huggingface download support
pip install "clanker-gguf[nvidia]"   # nvidia-ml-py fallback
pip install "clanker-gguf[all]"      # everything above

or install from source:

pip install git+https://github.com/Stavros-alt/clanker.git
pip install "git+https://github.com/Stavros-alt/clanker.git#egg=clanker-gguf[all]"

or clone and install locally:

git clone https://github.com/Stavros-alt/clanker.git
cd clanker
pip install -e ".[all]"

you also need:

  • nvidia-smi on PATH (cpu-only mode works without it)
  • llama-server from ik_llama.cpp (build it with clanker build, or install manually)
  • windows? good luck. this probably works on wsl.

Usage

clanker run <model>

clanker run ~/models/qwen.gguf          # direct path
clanker run qwen                         # fuzzy cache search
clanker run qwen.gguf --preset speed     # apply a preset
clanker run qwen.gguf --context 65536    # override context
clanker run qwen.gguf --execute          # run it

defaults to 8K context, q8_0 KV cache, no flash attention. pass --preset big-brain for the optimized 128K config.

clanker download <repo>

clanker download unsloth/Qwen3.6-35B-A3B-GGUF            # scan only
clanker download unsloth/Qwen3.6-35B-A3B-GGUF --execute   # download
clanker download unsloth/Qwen3.6-35B-A3B-GGUF -c 32768    # budget for 32k ctx

picks the highest-quality quant that fits your ram, with preference for models whose backbone fits on gpu. --fit handles the offloading at runtime.
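a rough sketch of that selection rule, not clanker's actual code. the `Candidate` fields, the `pick_quant` name, and the idea that a bigger file means a higher-quality quant are all assumptions made for illustration. the sizes are made up.

```python
from dataclasses import dataclass

GB = 10**9


@dataclass(frozen=True)
class Candidate:
    name: str
    total_bytes: int      # size of the whole gguf file
    backbone_bytes: int   # hypothetical: everything except the expert tensors


def pick_quant(candidates, ram_bytes, vram_bytes):
    """Return the preferred candidate, or None if nothing fits in ram."""
    fitting = [c for c in candidates if c.total_bytes <= ram_bytes]
    if not fitting:
        return None
    # sort key: backbone-fits-on-gpu first, then the bigger file
    # (assumption: bigger file = higher-quality quant)
    return max(fitting, key=lambda c: (c.backbone_bytes <= vram_bytes, c.total_bytes))


# made-up sizes, for illustration only
candidates = [
    Candidate("Q8_0", 37 * GB, 5 * GB),
    Candidate("Q4_K_M", 21 * GB, 3 * GB),
    Candidate("Q2_K", 12 * GB, 2 * GB),
]
best = pick_quant(candidates, ram_bytes=32 * GB, vram_bytes=8 * GB)
print(best.name if best else "nothing fits")
```

with 32 GB of ram the Q8_0 file is out, and both remaining backbones fit in 8 GB of vram, so the larger one wins.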

Other commands

clanker discover         # scan hf cache for gguf models
clanker discover --json  # json output
clanker ls               # alias for discover
clanker search           # open hf search with hardware-tuned bounds
clanker presets          # list all presets with their settings
clanker build            # clone and compile ik_llama.cpp

Presets

all presets enable mtp speculative decoding when the model supports it.

| Name | Context | KV Cache | Notes |
| --- | --- | --- | --- |
| big-brain | 128K | k=q8_0, v=q5_1 | mlock, no-mmap, flash-attn |
| speed | 32K | k=q6_0, v=q5_0 | mlock, flash-attn |
| infinite | 512K | k=q5_0, v=q4_1 | no-mmap, flash-attn |
| coding | 64K | k=q8_0, v=q5_1 | mlock, flash-attn, temp=0.2 |

kv cache quants are from KLD benchmarks (anbeeld). mixed k/v pairs outperform symmetric ones on the pareto frontier.
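to get a feel for what a mixed k/v pair costs in memory, here is a back-of-the-envelope estimate. it is a sketch under assumptions, not clanker's own math: the model dimensions are hypothetical, every layer is assumed to carry a full attention kv cache, and the block sizes are the usual ggml layouts (32 elements per block). the q6_0 entry is my assumption about ik_llama's block layout.

```python
# bytes per block of 32 elements for each cache type
BLOCK_BYTES = {
    "f16": 64,
    "q8_0": 34,
    "q6_0": 26,  # assumption: ik_llama block layout
    "q5_1": 24,
    "q5_0": 22,
    "q4_1": 20,
    "q4_0": 18,
}
BLOCK_ELEMS = 32


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, k_type, v_type):
    """Estimate kv cache size in bytes for a plain attention stack."""
    # element count of the K cache (the V cache has the same count)
    elems = n_layers * n_kv_heads * head_dim * n_ctx
    k_bytes = elems * BLOCK_BYTES[k_type] // BLOCK_ELEMS
    v_bytes = elems * BLOCK_BYTES[v_type] // BLOCK_ELEMS
    return k_bytes + v_bytes


# hypothetical model: 48 layers, 4 kv heads, head_dim 128, 128K context
mixed = kv_cache_bytes(48, 4, 128, 131072, "q8_0", "q5_1")
full = kv_cache_bytes(48, 4, 128, 131072, "f16", "f16")
print(f"q8_0/q5_1: {mixed / 2**30:.2f} GiB, f16/f16: {full / 2**30:.2f} GiB")
```

for these made-up dimensions the mixed pair comes to a bit under half the size of an f16 cache.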

How it works

  1. reads your gpu vram via nvidia-smi
  2. detects physical cpu cores (uses cores - 1 for thread count)
  3. parses the gguf header for architecture, layers, experts, quant type
  4. figures out layer split, kv cache size, fit-margin based on available vram
  5. spits out a llama-server command with thread tuning, ubatch tuning, and mlock
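steps 1 and 2 can be sketched roughly like this. this is not clanker's actual code: the function names are made up, and the physical core count is passed in as a plain number because detecting it portably needs a third-party package. the nvidia-smi query flags are real, and with `nounits` the memory values come back in MiB.

```python
import subprocess


def parse_nvidia_smi(csv_text: str) -> list[dict[str, int]]:
    """Parse 'total, free' rows (MiB, one line per gpu) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, free = (int(field.strip()) for field in line.split(","))
        gpus.append({"total_mib": total, "free_mib": free})
    return gpus


def query_vram() -> list[dict[str, int]]:
    """Ask nvidia-smi for per-gpu memory; returns [] when it is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.total,memory.free",
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []  # no nvidia-smi on PATH (or it failed): treat as cpu-only
    return parse_nvidia_smi(out)


def thread_count(physical_cores: int) -> int:
    """physical cores minus one, never below 1."""
    return max(1, physical_cores - 1)


sample = "24576, 23010\n8192, 8000\n"
print(parse_nvidia_smi(sample))
print(thread_count(8))
```

keeping the parser separate from the subprocess call makes it easy to test on a machine with no gpu.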

Notes

  • --fit --fit-margin N means "keep N MB of vram free". the base margin is 1664 MB for ik_llama and 4608 MB for mainline llama.cpp. mtp adds another 2048 MB on top.
  • thread count is physical_cores - 1 (min 1). keeps one core free for gpu scheduling and os tasks.
  • -ub 2048 sets the micro-batch size for max memory-bandwidth utilization during prompt prefill.
  • mtp flags are only added when the model file has "mtp" in its name.
  • the downloader tries llama-cli -hf first and falls back to huggingface_hub. if llama-cli finishes the download and then runs out of memory, clanker detects the existing file and skips the re-download.
  • quantization detection from gguf metadata is unreliable (returns int enums). filenames are more accurate, so clanker parses the filename.
  • combined expert tensors (qwen3, deepseek) are handled by dividing total size by expert_count from metadata.
  • --backend llama switches to mainline llama.cpp flags (spec-type, spec-draft-n-max, etc). default is ik_llama.
