Universal model quantization and format conversion CLI

castkit

castkit is a CLI tool for model quantization and format conversion across GGUF, MLX, GPTQ, AWQ, ONNX, EXL2, and EXL3. It also converts between quantized formats: the source is first dequantized ("decast") to FP16 automatically, then quantized to the target format.

Requirements

  • Python 3.12+
  • Apple Silicon Mac for MLX backend
  • NVIDIA GPU + CUDA for GPTQ/AWQ convert
  • NVIDIA GPU + CUDA for EXL2/EXL3 convert and measure
  • ONNX Runtime for ONNX backend
  • llama.cpp (llama-quantize, convert_hf_to_gguf.py) for GGUF convert
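A quick way to see which of these requirements a machine meets is a small preflight script. The following is a minimal sketch, not part of castkit: the helper names `have_tool` and `report` are hypothetical, and the presence of `nvidia-smi` is used only as a rough proxy for a working NVIDIA GPU + CUDA setup. How castkit itself detects its backends is not described here.

```shell
#!/bin/sh
# Preflight sketch: report which castkit requirements appear to be met.
# have_tool and report are illustrative helpers, not castkit commands.

# have_tool NAME: succeed if NAME resolves to a command on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# report LABEL TOOL: print one status line for an external dependency.
report() {
  if have_tool "$2"; then
    printf '%-22s ok (%s found)\n' "$1" "$2"
  else
    printf '%-22s missing (%s not on PATH)\n' "$1" "$2"
  fi
}

# Python 3.12+ (sys.exit(False) exits 0, sys.exit(True) exits 1).
if have_tool python3 && python3 -c 'import sys; sys.exit(sys.version_info < (3, 12))'; then
  printf '%-22s ok\n' "Python 3.12+"
else
  printf '%-22s missing or too old\n' "Python 3.12+"
fi

# llama.cpp binary needed for GGUF convert.
report "GGUF convert" llama-quantize

# Assumption: nvidia-smi on PATH suggests an NVIDIA driver is installed.
report "GPTQ/AWQ/EXL2/EXL3" nvidia-smi

# MLX needs Apple Silicon: Darwin kernel on an arm64 machine.
if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
  printf '%-22s ok (Apple Silicon)\n' "MLX backend"
else
  printf '%-22s unavailable (not Apple Silicon)\n' "MLX backend"
fi
```

The script only inspects PATH and the platform, so a line reading "ok" means the tool was found, not that the backend has been verified to work.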

Installation

Homebrew (macOS)

brew install schroneko/tap/castkit

pip / uv

# Quote the extras: unquoted square brackets are glob patterns in zsh (the macOS default shell).
uv tool install castkit            # core only
uv tool install "castkit[mlx]"     # MLX backend (Apple Silicon)
uv tool install "castkit[gguf]"    # GGUF backend (requires torch)
uv tool install "castkit[onnx]"    # ONNX backend
uv tool install "castkit[exl2]"    # EXL2 backend (CUDA required)
uv tool install "castkit[exl3]"    # EXL3 backend (CUDA required)
uv tool install "castkit[all]"     # all backends

Quick Start

# version
castkit --version

# convert to GGUF
castkit convert Qwen/Qwen3-0.6B -f gguf -q q4_k_m -o ./output/Qwen3-0.6B.gguf

# convert to MLX 4-bit
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 -o ./output/Qwen3-0.6B-mlx-4bit

# convert to ONNX
castkit convert Qwen/Qwen3-0.6B -f onnx -o ./output/Qwen3-0.6B-onnx

# convert to EXL2 5bpw
castkit convert Qwen/Qwen3-0.6B -f exl2 -q exl2-5.0 -o ./output/Qwen3-0.6B-exl2

# convert to EXL3 4bpw
castkit convert Qwen/Qwen3-0.6B -f exl3 -q exl3-4.0 -o ./output/Qwen3-0.6B-exl3

# decast (dequantize back to FP16 SafeTensors)
castkit decast ./output/Qwen3-0.6B.gguf -o ./output/Qwen3-0.6B-fp16

# cross-format conversion (GGUF -> GPTQ via automatic FP16 decast)
castkit convert ./output/Qwen3-0.6B.gguf -f gptq -b 4 -o ./output/Qwen3-0.6B-gptq

# model info
castkit info ./output/Qwen3-0.6B.gguf

# perplexity measurement
castkit measure ./output/Qwen3-0.6B.gguf --dataset wikitext-2 --max-samples 128

# importance matrix generation (GGUF)
castkit imatrix ./model -d calibration.txt -o ./output/model.imatrix
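
The cross-format command above is described as going through an automatic FP16 decast. The sketch below spells out what an equivalent explicit two-step flow might look like. It assumes that `castkit convert` accepts the FP16 directory written by `castkit decast` as input, which this README does not confirm. The `run` and `cross_convert` helpers and the `-fp16-tmp` directory suffix are hypothetical. `DRY_RUN` defaults to 1, so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: explicit decast-then-convert, mirroring the one-step cross-format call.
# Set DRY_RUN=0 to execute the commands instead of printing them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# cross_convert SRC FORMAT BITS OUT: dequantize SRC to FP16, then quantize
# the FP16 copy to FORMAT. The intermediate directory name is an assumption.
cross_convert() {
  fp16_dir="$4-fp16-tmp"
  run castkit decast "$1" -o "$fp16_dir" &&
  run castkit convert "$fp16_dir" -f "$2" -b "$3" -o "$4"
}

cross_convert ./output/Qwen3-0.6B.gguf gptq 4 ./output/Qwen3-0.6B-gptq
```

Keeping the FP16 copy around can be useful when the same source is converted to several target formats, since the decast step then runs only once.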

Recipes

Define reusable conversion presets in castkit.toml:

[recipes.gguf-standard]
format = "gguf"
quant = "q4_k_m"
imatrix = true
imatrix_data = "calibration.txt"

[recipes.mlx-4bit]
format = "mlx"
bits = 4
group_size = 64

castkit convert Qwen/Qwen3-0.6B --recipe gguf-standard

# batch: convert one model to multiple quants
for q in q4_k_m q5_k_m q6_k q8_0; do
  castkit convert Qwen/Qwen3-0.6B -f gguf -q "$q" -o "./output/Qwen3-0.6B-$q.gguf"
done
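
The batch loop can be extended to measure each output right after it is written, which makes it easier to compare quant levels. This is a minimal sketch using only flags that appear in this README. The `run` and `gguf_path` helpers are hypothetical, and `DRY_RUN` defaults to 1 so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: quantize one model at several GGUF levels, then measure each result.
# Set DRY_RUN=0 to execute the commands instead of printing them.
DRY_RUN="${DRY_RUN:-1}"
MODEL="Qwen/Qwen3-0.6B"
OUT_DIR="./output"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# gguf_path QUANT: output file path for a given quant level,
# e.g. ./output/Qwen3-0.6B-q4_k_m.gguf
gguf_path() {
  echo "$OUT_DIR/$(basename "$MODEL")-$1.gguf"
}

for q in q4_k_m q5_k_m q6_k q8_0; do
  out="$(gguf_path "$q")"
  # Stop at the first failure so later steps never measure a missing file.
  run castkit convert "$MODEL" -f gguf -q "$q" -o "$out" || exit 1
  run castkit measure "$out" --dataset wikitext-2 --max-samples 128 || exit 1
done
```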

Upload to Hugging Face

castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 --upload auto
castkit convert Qwen/Qwen3-0.6B -f mlx -b 4 --upload user/repo-name --public

Supported Formats

Format  Convert     Decast  Info  Measure
GGUF    Yes         Yes     Yes   Yes
MLX     Yes         Yes     Yes   Yes
GPTQ    Yes (CUDA)  Yes     Yes   Yes
AWQ     Yes (CUDA)  Yes     Yes   Yes
ONNX    Yes         Yes     Yes   Yes
EXL2    Yes (CUDA)  Yes     Yes   Yes (CUDA)
EXL3    Yes (CUDA)  Yes     Yes   Yes (CUDA)
FP16    Yes         Yes     Yes   Yes
BF16    Yes         Yes     Yes   Yes
FP32    Yes         Yes     Yes   Yes
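
In a wrapper script, the CUDA column of this table can be encoded as a small lookup so that a job fails early on a machine without an NVIDIA GPU. The following is a sketch only: `needs_cuda` is a hypothetical helper that restates the table, not a castkit command, and it has to be updated by hand if the table changes.

```shell
#!/bin/sh
# Sketch: encode the "(CUDA)" cells of the Supported Formats table.

# needs_cuda FORMAT OPERATION: succeed if the table marks that cell "(CUDA)".
# FORMAT and OPERATION are lowercase, e.g. "exl2" and "measure".
needs_cuda() {
  case "$1:$2" in
    gptq:convert|awq:convert) return 0 ;;
    exl2:convert|exl3:convert) return 0 ;;
    exl2:measure|exl3:measure) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: list which convert targets require CUDA.
for fmt in gguf mlx gptq awq onnx exl2 exl3; do
  if needs_cuda "$fmt" convert; then
    echo "$fmt convert: CUDA required"
  else
    echo "$fmt convert: no CUDA requirement listed"
  fi
done
```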

License

MIT
