Universal model quantization and format conversion CLI
castkit
castkit is a CLI tool for model quantization and format conversion across GGUF, MLX, GPTQ, AWQ, ONNX, EXL2, and EXL3 workflows. It also converts between quantized formats by automatically decasting (dequantizing) the source model to FP16 first.
Requirements
- Python 3.12+
- Apple Silicon Mac for MLX backend
- NVIDIA GPU + CUDA for GPTQ/AWQ convert
- NVIDIA GPU + CUDA for EXL2/EXL3 convert and measure
- ONNX Runtime for ONNX backend
- llama.cpp (llama-quantize, convert_hf_to_gguf.py) for GGUF convert
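As a rough preflight for the requirements above, the following Python sketch reports which prerequisites appear to be present. It is not part of castkit. The function name `check_environment` is hypothetical, and looking for `llama-quantize` and `nvidia-smi` on `PATH` is an assumption about how these tools are usually installed, not a description of how castkit locates them.

```python
import platform
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Report which prerequisites from the requirements list appear to be present."""
    return {
        # castkit requires Python 3.12 or newer.
        "python_3_12": sys.version_info >= (3, 12),
        # The MLX backend needs an Apple Silicon Mac (macOS on arm64).
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # Assumption: nvidia-smi on PATH is only a hint that an NVIDIA driver exists.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        # Assumption: llama.cpp's llama-quantize binary is expected on PATH.
        "llama_quantize": shutil.which("llama-quantize") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16} {'found' if ok else 'missing'}")
```

A missing entry only matters for the backends that need it; for example, `apple_silicon` is irrelevant when converting to GGUF only.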
Installation
Homebrew (macOS)
brew install schroneko/tap/castkit
pip / uv
uv tool install castkit          # core only
uv tool install "castkit[mlx]"   # MLX backend (Apple Silicon)
uv tool install "castkit[gguf]"  # GGUF backend (requires torch)
uv tool install "castkit[onnx]"  # ONNX backend
uv tool install "castkit[exl2]"  # EXL2 backend (CUDA required)
uv tool install "castkit[exl3]"  # EXL3 backend (CUDA required)
uv tool install "castkit[all]"   # all backends
Quick Start
# version
castkit --version
# convert to GGUF
castkit convert Qwen/Qwen3-0.6B -f gguf -q q4_k_m -o ./output/Qwen3-0.6B.gguf
# convert to MLX 4-bit
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 -o ./output/Qwen3-0.6B-mlx-4bit
# convert to ONNX
castkit convert Qwen/Qwen3-0.6B -f onnx -o ./output/Qwen3-0.6B-onnx
# convert to EXL2 5bpw
castkit convert Qwen/Qwen3-0.6B -f exl2 -q exl2-5.0 -o ./output/Qwen3-0.6B-exl2
# convert to EXL3 4bpw
castkit convert Qwen/Qwen3-0.6B -f exl3 -q exl3-4.0 -o ./output/Qwen3-0.6B-exl3
# decast (dequantize back to FP16 SafeTensors)
castkit decast ./output/Qwen3-0.6B.gguf -o ./output/Qwen3-0.6B-fp16
# cross-format conversion (GGUF -> GPTQ via automatic FP16 decast)
castkit convert ./output/Qwen3-0.6B.gguf -f gptq -b 4 -o ./output/Qwen3-0.6B-gptq
# model info
castkit info ./output/Qwen3-0.6B.gguf
# perplexity measurement
castkit measure ./output/Qwen3-0.6B.gguf --dataset wikitext-2 --max-samples 128
# importance matrix generation (GGUF)
castkit imatrix ./model -d calibration.txt -o ./output/model.imatrix
Recipes
Define reusable conversion presets in castkit.toml:
[recipes.gguf-standard]
format = "gguf"
quant = "q4_k_m"
imatrix = true
imatrix_data = "calibration.txt"
[recipes.mlx-4bit]
format = "mlx"
bits = 4
group_size = 64
castkit convert Qwen/Qwen3-0.6B --recipe gguf-standard
# batch: convert one model to multiple quants
for q in q4_k_m q5_k_m q6_k q8_0; do
  castkit convert Qwen/Qwen3-0.6B -f gguf -q "$q" -o "./output/Qwen3-0.6B-$q.gguf"
done
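The same batch can be driven from Python when the list of quants comes from a script. This is a minimal sketch that only builds the argument lists, using the `convert`, `-f`, `-q`, and `-o` options shown above. The helper name `build_convert_commands` and the output naming scheme are illustrative choices, and the `subprocess.run` call is left commented out so the example runs without castkit installed.

```python
import shlex


def build_convert_commands(
    model: str, quants: list[str], out_dir: str = "./output"
) -> list[list[str]]:
    """Build one `castkit convert` argument list per GGUF quant type."""
    # Use the part after the last "/" of the model ID as the file stem.
    stem = model.rsplit("/", 1)[-1]
    return [
        ["castkit", "convert", model, "-f", "gguf", "-q", quant,
         "-o", f"{out_dir}/{stem}-{quant}.gguf"]
        for quant in quants
    ]


commands = build_convert_commands(
    "Qwen/Qwen3-0.6B", ["q4_k_m", "q5_k_m", "q6_k", "q8_0"]
)
for argv in commands:
    print(shlex.join(argv))
    # To actually run the conversion (requires castkit on PATH):
    # import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual model paths.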
Upload to Hugging Face
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 --upload auto
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 --upload user/repo-name --public
Supported Formats
| Format | Convert | Decast | Info | Measure |
|---|---|---|---|---|
| GGUF | Yes | Yes | Yes | Yes |
| MLX | Yes | Yes | Yes | Yes |
| GPTQ | Yes (CUDA) | Yes | Yes | Yes |
| AWQ | Yes (CUDA) | Yes | Yes | Yes |
| ONNX | Yes | Yes | Yes | Yes |
| EXL2 | Yes (CUDA) | Yes | Yes | Yes (CUDA) |
| EXL3 | Yes (CUDA) | Yes | Yes | Yes (CUDA) |
| FP16 | Yes | Yes | Yes | Yes |
| BF16 | Yes | Yes | Yes | Yes |
| FP32 | Yes | Yes | Yes | Yes |
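For scripting around the table above, the CUDA requirements can be captured in a small lookup. This sketch transcribes only the "(CUDA)" cells of the table. The names `CUDA_REQUIRED` and `needs_cuda` are hypothetical and are not part of castkit.

```python
# Operations marked "(CUDA)" in the Supported Formats table, keyed by format.
CUDA_REQUIRED: dict[str, set[str]] = {
    "gptq": {"convert"},
    "awq": {"convert"},
    "exl2": {"convert", "measure"},
    "exl3": {"convert", "measure"},
}


def needs_cuda(fmt: str, operation: str) -> bool:
    """Return True if the table marks this format/operation pair as CUDA-only."""
    return operation.lower() in CUDA_REQUIRED.get(fmt.lower(), set())


for fmt, op in [("exl2", "measure"), ("gptq", "measure"), ("gguf", "convert")]:
    print(f"{fmt} {op}: CUDA required = {needs_cuda(fmt, op)}")
```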
License
MIT
Download files
Source Distribution
castkit-0.1.1.tar.gz (220.2 kB)
Built Distribution
castkit-0.1.1-py3-none-any.whl (45.4 kB)
File details
Details for the file castkit-0.1.1.tar.gz.
File metadata
- Download URL: castkit-0.1.1.tar.gz
- Upload date:
- Size: 220.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00d8b40389cecda8b77199a59a1cf4bb9e19ca5fbbfc7188d1b0714887b368c7 |
| MD5 | 0b40b6e38572771fd2735662914938bf |
| BLAKE2b-256 | aba684721a79d12ad8d1b62d8d27c412a9b9faa4c54d26549e18602b9a113a88 |
File details
Details for the file castkit-0.1.1-py3-none-any.whl.
File metadata
- Download URL: castkit-0.1.1-py3-none-any.whl
- Upload date:
- Size: 45.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77f52625d4c6f9806cbd9b4ef2bfe8b181a58228a99faba074d4e903c2f64090 |
| MD5 | f72dcb58797ad41eb5fc509cbd06938d |
| BLAKE2b-256 | be86b9dd9e2982de3e2d0f2496bf6596730c67c3a844ec15ccc6be99fefec436 |