Generate a llama-quantize command to copy the quantization parameters of any GGUF

quant_clone

This is a small script that generates a llama-quantize (from llama.cpp) command, which lets you quantize your own GGUF the same way a target GGUF was quantized.

Installation

pip install quant_clone

If the published gguf package doesn't support your model yet, install the current version from the llama.cpp repository with:

pip install --force-reinstall --upgrade "git+https://github.com/ggml-org/llama.cpp.git#egg=gguf&subdirectory=gguf-py"

Usage

quant_clone input.gguf output.txt

input.gguf is the GGUF file whose quantization parameters you would like to copy.

output.txt is optional; if it is omitted, the output is saved to cmd.txt.

Example

If I take one of Unsloth's Dynamic 2.0 quants and run:

quant_clone gemma-3-1b-it-UD-IQ1_S.gguf

I get this output:

llama-quantize --imatrix <imatrix_unsloth.dat> --tensor-type token_embd.weight=Q5_1 --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_k.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_output.weight=IQ2_XXS" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_q.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_v.weight=Q5_0" --tensor-type "blk\.(0|2|3|4|25)\.ffn_down.weight=IQ3_S" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.ffn_gate.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.ffn_up.weight=IQ4_NL" --tensor-type "blk\.(1)\.ffn_down.weight=Q2_K" --tensor-type "blk\.(5|6|7|8|9|10|16|17|18|19|20|21|22|23|24)\.ffn_down.weight=IQ1_S" --tensor-type "blk\.(11|12|13|14|15)\.ffn_down.weight=IQ2_S" <input.gguf> <output.gguf> Q8_0

That's the command to run to replicate the quantization. Make sure to replace the placeholders in angle brackets: the imatrix path, the input GGUF path, and the output GGUF path.
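To see how per-block patterns such as "blk\.(0|2|3|4|25)\.ffn_down.weight=IQ3_S" can arise, here is a minimal sketch in Python. It is not the script's actual implementation: the function name build_tensor_type_args and the input format (a plain dict mapping tensor names to quantization type names) are assumptions made for illustration. It groups block tensors that share the same suffix and quantization type, then joins their block indices into one alternation.

```python
import re
from collections import defaultdict

# Matches per-block tensor names such as "blk.12.ffn_down.weight".
BLOCK_RE = re.compile(r"^blk\.(\d+)\.(.+)$")


def build_tensor_type_args(tensor_types):
    """Hypothetical helper: turn {tensor name: quant type} into --tensor-type args."""
    grouped = defaultdict(list)  # (suffix, quant type) -> list of block indices
    args = []
    for name, qtype in tensor_types.items():
        match = BLOCK_RE.match(name)
        if match:
            grouped[(match.group(2), qtype)].append(int(match.group(1)))
        else:
            # Tensors outside the blocks (e.g. token_embd.weight) get their own argument.
            args.append(f"--tensor-type {name}={qtype}")
    for (suffix, qtype), blocks in grouped.items():
        alternation = "|".join(str(b) for b in sorted(blocks))
        args.append(f'--tensor-type "blk\\.({alternation})\\.{suffix}={qtype}"')
    return args


if __name__ == "__main__":
    example = {
        "token_embd.weight": "Q5_1",
        "blk.0.ffn_down.weight": "IQ3_S",
        "blk.1.ffn_down.weight": "Q2_K",
        "blk.2.ffn_down.weight": "IQ3_S",
    }
    print(" ".join(build_tensor_type_args(example)))
```

In this sketch, blocks 0 and 2 share the same type for ffn_down.weight and collapse into a single pattern, while block 1 gets its own, mirroring the structure of the generated command above.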
