LLM Turbo-Optimizer CLI. detect hardware, parse GGUF models, generate optimized llama-server commands.

These details have been verified by PyPI

Maintainers

Stavros-alt

Project description

clanker

generates llama-server commands so you don't have to guess how many layers fit on your gpu. reads the gguf header, does the math, prints the command.

i got tired of trial-and-error vram tuning for moe models on consumer gpus.

Setup

requires python 3.10+ and pip.

pip install clanker-gguf

or install from source:

pip install git+https://github.com/Stavros-alt/clanker.git
pip install "git+https://github.com/Stavros-alt/clanker.git#egg=clanker-gguf[all]"

or clone and install locally:

git clone https://github.com/Stavros-alt/clanker.git
cd clanker
pip install -e ".[all]"

you also need:

- `nvidia-smi` on PATH (cpu-only mode works without it)
- `llama-server` from ik_llama.cpp (build it with `clanker build`, or install manually)
- windows? good luck. this probably works on wsl.
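
to check up front whether `nvidia-smi` is reachable and what it reports, something like the sketch below works. this is not clanker's own code: the function names are made up for illustration, and the empty-list fallback is just one way to model cpu-only mode. the `--query-gpu` and `--format` options are standard `nvidia-smi` flags.

```python
import shutil
import subprocess


def parse_smi_csv(text):
    """Parse 'total, free' lines (MiB) as printed with csv,noheader,nounits."""
    gpus = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            gpus.append((int(parts[0]), int(parts[1])))
    return gpus


def query_vram_mib(smi_path=None):
    """Return [(total_mib, free_mib), ...] per gpu, or [] when no gpu is usable."""
    smi = smi_path or shutil.which("nvidia-smi")
    if smi is None:
        return []  # not on PATH: treat as cpu-only
    try:
        result = subprocess.run(
            [smi, "--query-gpu=memory.total,memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return []  # binary missing, failed, or timed out
    return parse_smi_csv(result.stdout)


if __name__ == "__main__":
    print(query_vram_mib() or "no nvidia gpu detected, cpu-only")
```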

Usage

`clanker run <model>`

clanker run ~/models/qwen.gguf          # direct path
clanker run qwen                         # fuzzy cache search
clanker run qwen.gguf --preset speed     # apply a preset
clanker run qwen.gguf --context 65536    # override context
clanker run qwen.gguf --execute          # run it

defaults to 8K context, q8_0 KV cache, no flash attention. pass --preset big-brain for the optimized 128k config.

`clanker download <repo>`

clanker download unsloth/Qwen3.6-35B-A3B-GGUF            # scan only
clanker download unsloth/Qwen3.6-35B-A3B-GGUF --execute   # download
clanker download unsloth/Qwen3.6-35B-A3B-GGUF -c 32768    # budget for 32k ctx

picks the highest-quality quant that fits your ram, with preference for models whose backbone fits on gpu. --fit handles the offloading at runtime.

quality warnings: if the best available quant is below IQ4_XS (the kneedle knee point from KLD benchmarks), clanker aborts and tells you to use a smaller model or a different repo. accuracy drops off a cliff below IQ4_XS (78.4% top-1 @ 16.4 gb).
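
a minimal sketch of the selection rule described above: take the best quant that fits the ram budget, and abort if that is still below IQ4_XS. the quality ordering and the file sizes in the usage note are illustrative assumptions, not clanker's real tables, and the gpu-backbone preference is left out.

```python
# illustrative ordering, lowest to highest quality (assumption, not clanker's table)
QUALITY_ORDER = [
    "IQ2_XXS", "IQ2_M", "Q2_K", "IQ3_XXS", "IQ3_M", "Q3_K_M",
    "IQ4_XS", "Q4_K_S", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0",
]
QUALITY_FLOOR = "IQ4_XS"


class QuantTooLow(Exception):
    """Raised when nothing at or above the quality floor fits."""


def pick_quant(candidates, ram_budget_gb):
    """candidates maps quant name -> file size in GB; returns the chosen name."""
    rank = {name: i for i, name in enumerate(QUALITY_ORDER)}
    fitting = [q for q, size in candidates.items()
               if q in rank and size <= ram_budget_gb]
    if not fitting:
        raise QuantTooLow("nothing fits in the ram budget")
    best = max(fitting, key=rank.__getitem__)
    if rank[best] < rank[QUALITY_FLOOR]:
        raise QuantTooLow(
            f"best fitting quant {best} is below {QUALITY_FLOOR}; "
            "use a smaller model or a different repo"
        )
    return best
```

with hypothetical sizes `{"Q8_0": 36.9, "Q4_K_M": 21.0, "IQ4_XS": 16.4, "Q2_K": 12.0}`, a 24 gb budget picks `Q4_K_M`, a 17 gb budget picks `IQ4_XS`, and a 13 gb budget raises `QuantTooLow`.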

unsloth _XL warning: some unsloth _XL quants contain f16 tensors that crash ik_llama.cpp. clanker warns and lets you filter out the risky quant to pick the next best.

Other commands

clanker discover         # scan hf cache for gguf models
clanker discover --json  # json output
clanker ls               # alias for discover
clanker search           # open hf search with hardware-tuned bounds
clanker presets          # list all presets with their settings
clanker build            # clone and compile ik_llama.cpp

clanker search caps results by vram budget at IQ4_XS quality. if your kv cache eats all the vram (128k ctx on 12gb cards), it warns that models will offload entirely to ram.
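
to see why a long context can eat a 12gb card, here is a rough kv cache size estimate. it is a sketch under assumptions: full attention on every layer (no sliding window or hybrid layers), and the model shape in the usage note is hypothetical. the per-block byte counts are the ggml block sizes for 32-element blocks; q6_0 is left out because its block size is not confirmed here.

```python
# bytes per 32-element block for common ggml cache types
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q5_1": 24, "q5_0": 22, "q4_1": 20, "q4_0": 18}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim,
                   type_k="q8_0", type_v="q8_0"):
    """Estimate K+V cache size in bytes for one sequence."""
    if head_dim % 32:
        raise ValueError("quantized cache types need head_dim divisible by 32")
    # number of elements in K; V has the same count
    elems = n_ctx * n_layers * n_kv_heads * head_dim
    return elems * (BLOCK_BYTES[type_k] + BLOCK_BYTES[type_v]) // 32


if __name__ == "__main__":
    # hypothetical model: 48 layers, 8 kv heads, head_dim 128, 128k context
    size = kv_cache_bytes(131072, 48, 8, 128, "q8_0", "q5_1")
    print(f"{size / 2**30:.3f} GiB")  # prints 10.875 GiB
```

for that hypothetical shape the cache alone is close to 11 GiB at k=q8_0, v=q5_1, which leaves almost nothing for weights on a 12gb card.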

Presets

all presets enable mtp speculative decoding when the model supports it.

Name	Context	KV Cache	Notes
`big-brain`	128K	k=q8_0, v=q5_1	mlock, no-mmap, flash-attn
`speed`	32K	k=q6_0, v=q5_0	mlock, flash-attn
`infinite`	512K	k=q5_0, v=q4_1	no-mmap, flash-attn
`coding`	64K	k=q8_0, v=q5_1	mlock, flash-attn, temp=0.2

kv cache quants are from KLD benchmarks (anbeeld). mixed k/v pairs outperform symmetric ones on the pareto frontier.
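
the preset table maps fairly directly onto llama-server flags. the sketch below is one way to express it, not clanker's actual code: the dict layout and function name are made up, "K" is taken as 1024 tokens, and the flag spellings (`-c`, `-ctk`, `-ctv`, `-fa`, `--mlock`, `--no-mmap`, `--temp`) follow llama.cpp's long-standing options, so check `llama-server --help` for your build. mtp flags are left out.

```python
# values copied from the preset table; structure is an illustrative assumption
PRESETS = {
    "big-brain": {"ctx": 128 * 1024, "k": "q8_0", "v": "q5_1",
                  "mlock": True, "no_mmap": True, "temp": None},
    "speed":     {"ctx": 32 * 1024, "k": "q6_0", "v": "q5_0",
                  "mlock": True, "no_mmap": False, "temp": None},
    "infinite":  {"ctx": 512 * 1024, "k": "q5_0", "v": "q4_1",
                  "mlock": False, "no_mmap": True, "temp": None},
    "coding":    {"ctx": 64 * 1024, "k": "q8_0", "v": "q5_1",
                  "mlock": True, "no_mmap": False, "temp": 0.2},
}


def preset_args(name):
    """Turn a preset into a list of llama-server arguments."""
    p = PRESETS[name]
    # every preset in the table enables flash attention
    args = ["-c", str(p["ctx"]), "-ctk", p["k"], "-ctv", p["v"], "-fa"]
    if p["mlock"]:
        args.append("--mlock")
    if p["no_mmap"]:
        args.append("--no-mmap")
    if p["temp"] is not None:
        args += ["--temp", str(p["temp"])]
    return args


if __name__ == "__main__":
    print(" ".join(["llama-server"] + preset_args("coding")))
```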

How it works

- reads your gpu vram via `nvidia-smi`
- detects physical cpu cores (uses cores - 1 for thread count)
- parses the gguf header for architecture, layers, experts, quant type
- figures out layer split, kv cache size, fit-margin based on available vram
- spits out a `llama-server` command with thread tuning, ubatch tuning, and mlock
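
for the header-parsing step, the fixed part of a gguf file is small: a 4-byte magic `GGUF`, a little-endian uint32 version, then uint64 tensor and metadata counts (version 2 and later). the sketch below reads only that; it is not clanker's parser, and getting architecture, layer count, or expert count means walking the metadata key/value pairs that follow, which is omitted here.

```python
import struct

HEADER_FORMAT = "<4sIQQ"  # magic, version, tensor_count, metadata_kv_count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(path):
    """Read the fixed-size gguf header and return its fields as a dict."""
    with open(path, "rb") as f:
        raw = f.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE:
        raise ValueError("file too short to be gguf")
    magic, version, tensor_count, kv_count = struct.unpack(HEADER_FORMAT, raw)
    if magic != b"GGUF":
        raise ValueError("not a gguf file (bad magic)")
    if version < 2:
        raise ValueError("gguf v1 uses 32-bit counts; not handled in this sketch")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```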

Notes

- `--fit --fit-margin N` means "keep N MB of vram free". base is 1664 for ik_llama, 4608 for mainline. mtp adds 2048 MB overhead.
- thread count is physical_cores - 1 (min 1). keeps one core free for gpu scheduling and os tasks.
- `-ub 2048` sets the micro-batch size for max memory-bandwidth utilization during prompt prefill.
- mtp flags are only added when the model file has "mtp" in its name.
- the downloader tries `llama-cli -hf` first, falls back to huggingface_hub. if llama-cli downloads then ooms, clanker detects the existing file and skips re-download.
- quantization detection from gguf metadata is unreliable (returns int enums). filenames are more accurate, so clanker parses the filename.
- combined expert tensors (qwen3, deepseek) are handled by dividing total size by expert_count from metadata.
- `--backend llama` switches to mainline llama.cpp flags (spec-type, spec-draft-n-max, etc). default is ik_llama.
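
the arithmetic in these notes fits in a few small functions. this is a sketch with made-up names, not clanker's source. the numbers come from the notes above; the filename regex and the case-insensitive "mtp" match are assumptions.

```python
import re

FIT_MARGIN_BASE_MB = {"ik_llama": 1664, "llama": 4608}  # from the notes
MTP_OVERHEAD_MB = 2048

# assumption: quant tags look like Q4_K_M, IQ4_XS, Q8_0, BF16, F16, F32
QUANT_RE = re.compile(
    r"(?<![A-Za-z0-9])(I?Q\d+_[A-Z0-9]+(?:_[A-Z0-9]+)*|BF16|F16|F32)",
    re.IGNORECASE,
)


def fit_margin_mb(backend="ik_llama", mtp=False):
    """MB of vram to keep free for --fit-margin."""
    return FIT_MARGIN_BASE_MB[backend] + (MTP_OVERHEAD_MB if mtp else 0)


def thread_count(physical_cores):
    """physical cores minus one, never below 1."""
    return max(1, physical_cores - 1)


def has_mtp(filename):
    """True when the file name contains 'mtp' (case-insensitive by assumption)."""
    return "mtp" in filename.lower()


def quant_from_filename(filename):
    """Return the quant tag found in a gguf file name, or None."""
    m = QUANT_RE.search(filename)
    return m.group(1).upper() if m else None


def per_expert_bytes(combined_tensor_bytes, expert_count):
    """Size of one expert inside a combined expert tensor."""
    return combined_tensor_bytes // expert_count
```

for example, `fit_margin_mb("llama", mtp=True)` gives 6656, and `quant_from_filename("Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf")` gives `"Q4_K_XL"`.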

Release history

- 1.2.0 (this version): Jun 7, 2026
- 1.1.0: Jun 6, 2026
- 1.0.4: Jun 5, 2026
- 1.0.1: Jun 5, 2026
- 1.0.0: Jun 5, 2026

Download files

Source Distribution

clanker_gguf-1.2.0.tar.gz (32.1 kB)

Uploaded Jun 7, 2026 Source

Built Distribution

clanker_gguf-1.2.0-py3-none-any.whl (28.2 kB)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file clanker_gguf-1.2.0.tar.gz.

File metadata

Download URL: clanker_gguf-1.2.0.tar.gz
Upload date: Jun 7, 2026
Size: 32.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clanker_gguf-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`7f3d65b9f9d7342c3616265b53f87e80da3b023f0ffc09b772612bcf3439f59d`
MD5	`3b5c27d56216e9c8c0fc129a6b796001`
BLAKE2b-256	`4224d61c24149370963a47d50937ce6c40109534c63fa3b9ec65cd9413265e52`
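
to check a downloaded sdist against the SHA256 listed above, a few lines of `hashlib` are enough. the function names are illustrative; the expected digest is the one from the table.

```python
import hashlib

# SHA256 for clanker_gguf-1.2.0.tar.gz, copied from the table above
EXPECTED_SHA256 = "7f3d65b9f9d7342c3616265b53f87e80da3b023f0ffc09b772612bcf3439f59d"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

usage: `verify("clanker_gguf-1.2.0.tar.gz")` should return `True` for an intact download of that file.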

Provenance

The following attestation bundles were made for clanker_gguf-1.2.0.tar.gz:

Publisher: publish.yml on Stavros-alt/clanker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clanker_gguf-1.2.0.tar.gz
- Subject digest: 7f3d65b9f9d7342c3616265b53f87e80da3b023f0ffc09b772612bcf3439f59d
- Sigstore transparency entry: 1751614823
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: Stavros-alt/clanker@3bfbe7c6b74446c0662b62a84bc28f5b21635880
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/Stavros-alt
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3bfbe7c6b74446c0662b62a84bc28f5b21635880
- Trigger Event: push

File details

Details for the file clanker_gguf-1.2.0-py3-none-any.whl.

File metadata

Download URL: clanker_gguf-1.2.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 28.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clanker_gguf-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8c4909bc078b22b9dd3f4f2a4f0569105e3328c4cf556e66e245c5a858c38a6f`
MD5	`37fd5f7a7e4efa05ac214d5c7f1aad25`
BLAKE2b-256	`c8893a5118ea16cc76b95ef57d28b6ccace079b5388ac7b3a8f1707d83a7505f`

Provenance

The following attestation bundles were made for clanker_gguf-1.2.0-py3-none-any.whl:

Publisher: publish.yml on Stavros-alt/clanker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clanker_gguf-1.2.0-py3-none-any.whl
- Subject digest: 8c4909bc078b22b9dd3f4f2a4f0569105e3328c4cf556e66e245c5a858c38a6f
- Sigstore transparency entry: 1751614978
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: Stavros-alt/clanker@3bfbe7c6b74446c0662b62a84bc28f5b21635880
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/Stavros-alt
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3bfbe7c6b74446c0662b62a84bc28f5b21635880
- Trigger Event: push
