Generate a llama-quantize command to copy the quantization parameters of any GGUF
quant_clone
A small script that generates a llama-quantize (from llama.cpp) command, so you can quantize your own GGUF the same way a target GGUF was quantized.
Installation
pip install quant_clone
If the published gguf package doesn't support your model yet, install the current version from the llama.cpp repository:
pip install --force-reinstall --upgrade "git+https://github.com/ggml-org/llama.cpp.git#egg=gguf&subdirectory=gguf-py"
Usage
quant_clone input.gguf output.txt
input.gguf is the GGUF file whose quantization parameters you want to copy.
output.txt is optional; if it is omitted, the output is saved to cmd.txt.
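To give a rough idea of how per-tensor types can be turned into `--tensor-type` overrides, here is a minimal sketch. It is not the package's actual implementation: the function name `build_tensor_type_args` and the input dictionary are hypothetical, and the sketch assumes the tensor-name-to-type mapping has already been read from the GGUF file (for example with the gguf package).

```python
import re
from collections import defaultdict


def build_tensor_type_args(tensor_types):
    """Group per-block tensors by (suffix, type) and emit --tensor-type flags.

    tensor_types: dict mapping tensor name -> quantization type name,
    e.g. {"blk.0.attn_k.weight": "IQ4_NL"}. (Hypothetical input format.)
    """
    block_pattern = re.compile(r"^blk\.(\d+)\.(.+)$")
    grouped = defaultdict(list)  # (suffix, qtype) -> list of block indices
    args = []

    for name, qtype in tensor_types.items():
        match = block_pattern.match(name)
        if match:
            grouped[(match.group(2), qtype)].append(int(match.group(1)))
        else:
            # Tensors outside the blk.N.* namespace get their own flag.
            args.append(f"--tensor-type {name}={qtype}")

    for (suffix, qtype), blocks in grouped.items():
        indices = "|".join(str(b) for b in sorted(blocks))
        args.append(f'--tensor-type "blk\\.({indices})\\.{suffix}={qtype}"')
    return args


example = {
    "token_embd.weight": "Q5_1",
    "blk.0.ffn_down.weight": "IQ3_S",
    "blk.1.ffn_down.weight": "Q2_K",
    "blk.2.ffn_down.weight": "IQ3_S",
}
for arg in build_tensor_type_args(example):
    print(arg)
```

Grouping block indices into one alternation per (tensor suffix, type) pair keeps the number of flags small even for models with many layers.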
Example
If I take one of Unsloth's Dynamic 2.0 quants and run:
quant_clone gemma-3-1b-it-UD-IQ1_S.gguf
I get this output:
llama-quantize --imatrix <imatrix_unsloth.dat> --tensor-type token_embd.weight=Q5_1 --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_k.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_output.weight=IQ2_XXS" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_q.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_v.weight=Q5_0" --tensor-type "blk\.(0|2|3|4|25)\.ffn_down.weight=IQ3_S" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.ffn_gate.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.ffn_up.weight=IQ4_NL" --tensor-type "blk\.(1)\.ffn_down.weight=Q2_K" --tensor-type "blk\.(5|6|7|8|9|10|16|17|18|19|20|21|22|23|24)\.ffn_down.weight=IQ1_S" --tensor-type "blk\.(11|12|13|14|15)\.ffn_down.weight=IQ2_S" <input.gguf> <output.gguf> Q8_0
That's the command to run to replicate the quantization. Before running it, replace the placeholders with your own imatrix path, input GGUF path, and output GGUF path.
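If you would rather fill in the placeholders programmatically than edit the saved command by hand, something along these lines may work. This is a sketch under assumptions: the helper `fill_placeholders` is hypothetical, the placeholder names are taken from the example output above (other source files may produce a different imatrix placeholder), and the file names are made up for illustration.

```python
import shlex


def fill_placeholders(command, imatrix, input_gguf, output_gguf):
    """Replace the angle-bracket placeholders with shell-quoted paths."""
    replacements = {
        "<imatrix_unsloth.dat>": imatrix,  # name as seen in the example output
        "<input.gguf>": input_gguf,
        "<output.gguf>": output_gguf,
    }
    for placeholder, path in replacements.items():
        command = command.replace(placeholder, shlex.quote(path))
    return command


# Shortened template in the same shape as the generated command.
template = (
    r'llama-quantize --imatrix <imatrix_unsloth.dat> '
    r'--tensor-type "blk\.(1)\.ffn_down.weight=Q2_K" '
    r'<input.gguf> <output.gguf> Q8_0'
)

# Hypothetical file names.
command = fill_placeholders(template, "imatrix.dat", "model-f16.gguf", "model-clone.gguf")

# Split into an argument list, e.g. to pass to subprocess.run(argv).
argv = shlex.split(command)
print(argv)
```

Quoting each path with `shlex.quote` keeps paths containing spaces intact, and `shlex.split` leaves the backslashes in the quoted regex patterns untouched.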
Download files
File details
Details for the file quant_clone-0.1.3.tar.gz.
File metadata
- Download URL: quant_clone-0.1.3.tar.gz
- Upload date:
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8c618e865267e0378429a497d7f6dad193cdfb0118fb190e04c7a864a5ab7eb6 |
| MD5 | 4efe1259dbd075a27101186ca3351f6b |
| BLAKE2b-256 | 9de8983e9b98fbd5ee195dd7a8a1b719b76bfc218e2f3b5256178d0f852fde6f |
File details
Details for the file quant_clone-0.1.3-py3-none-any.whl.
File metadata
- Download URL: quant_clone-0.1.3-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ab8273e0e699a6e498caf696b21d998e63d8ecd079f716826fb275c7ff8d01e |
| MD5 | 1268b7386d168d2af871ffaf2d3f1e3f |
| BLAKE2b-256 | 4f346e576f19da52007c8e19f5a8e64cf3b0a9296a5f9c242c8bdc99e140c39e |