Generate a llama-quantize command to copy the quantization parameters of any GGUF

quant_clone

This is a small script that generates a llama-quantize (from llama.cpp) command for quantizing your own GGUF the same way a target GGUF was quantized.
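To give a feel for what such a command contains, here is a rough sketch of collapsing per-tensor quantization types into --tensor-type arguments. It is not quant_clone's actual code: tensor_type_args is a hypothetical helper, and the input dict stands in for whatever tensor names and types are read from the target GGUF.

```python
import re
from collections import defaultdict

# Hypothetical helper for illustration; not taken from quant_clone itself.
BLOCK_RE = re.compile(r"^blk\.(\d+)\.(.+)$")


def tensor_type_args(tensor_types):
    """Turn {tensor name: quant type} into --tensor-type arguments."""
    grouped = defaultdict(list)  # (tensor suffix, quant type) -> block indices
    args = []
    for name, qtype in tensor_types.items():
        match = BLOCK_RE.match(name)
        if match:
            grouped[(match.group(2), qtype)].append(int(match.group(1)))
        else:
            # Tensors outside the repeated blocks get a literal override.
            args.append(f"--tensor-type {name}={qtype}")
    for (suffix, qtype), blocks in grouped.items():
        # Blocks sharing a suffix and type are merged into one alternation.
        alternatives = "|".join(str(b) for b in sorted(blocks))
        args.append(f'--tensor-type "blk\\.({alternatives})\\.{suffix}={qtype}"')
    return args


# Made-up input, assumed to come from reading the target GGUF.
example = {
    "token_embd.weight": "Q5_1",
    "blk.0.ffn_down.weight": "IQ3_S",
    "blk.1.ffn_down.weight": "Q2_K",
    "blk.2.ffn_down.weight": "IQ3_S",
}
for arg in tensor_type_args(example):
    print(arg)
```

Grouping by tensor suffix and type keeps the command much shorter than one override per tensor would be.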

Installation

pip install quant_clone

If the published gguf package doesn't support your model yet, install the current version from the llama.cpp repository:

pip install --force-reinstall --upgrade "git+https://github.com/ggml-org/llama.cpp.git#egg=gguf&subdirectory=gguf-py"

Usage

quant_clone input.gguf output.txt

input.gguf is the GGUF file whose quantization parameters you want to copy.

output.txt is optional; if it is omitted, the output is saved to cmd.txt.

Example

If I take one of Unsloth's Dynamic 2.0 quants and run:

quant_clone gemma-3-1b-it-UD-IQ1_S.gguf

I get this output:

llama-quantize --imatrix <imatrix_unsloth.dat> --tensor-type token_embd.weight=Q5_1 --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_k.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_output.weight=IQ2_XXS" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_q.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.attn_v.weight=Q5_0" --tensor-type "blk\.(0|2|3|4|25)\.ffn_down.weight=IQ3_S" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.ffn_gate.weight=IQ4_NL" --tensor-type "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25)\.ffn_up.weight=IQ4_NL" --tensor-type "blk\.(1)\.ffn_down.weight=Q2_K" --tensor-type "blk\.(5|6|7|8|9|10|16|17|18|19|20|21|22|23|24)\.ffn_down.weight=IQ1_S" --tensor-type "blk\.(11|12|13|14|15)\.ffn_down.weight=IQ2_S" <input.gguf> <output.gguf> Q8_0

That's the command to run to replicate the quantization. Before running it, replace the three placeholders (imatrix path, input GGUF path, and output GGUF path) with your own paths.
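On a command this long, the three paths are easy to get wrong when edited by hand. Below is a minimal sketch of filling them in programmatically. fill_placeholders is a hypothetical helper (not part of quant_clone), and it assumes the generated command contains exactly three <...> placeholders in the order imatrix, input, output, as in the example above.

```python
import re
import shlex

# Matches placeholders such as <input.gguf>. Assumes the command contains
# no other angle brackets.
PLACEHOLDER_RE = re.compile(r"<[^<>\s]+>")


def fill_placeholders(command, imatrix, input_gguf, output_gguf):
    """Replace the three <...> placeholders, in order, with real paths."""
    found = PLACEHOLDER_RE.findall(command)
    if len(found) != 3:
        raise ValueError(f"expected 3 placeholders, found {len(found)}")
    paths = iter([imatrix, input_gguf, output_gguf])
    # shlex.quote keeps paths with spaces intact in a POSIX shell.
    return PLACEHOLDER_RE.sub(lambda _: shlex.quote(next(paths)), command)


# Shortened version of the generated command, for illustration only.
template = (
    "llama-quantize --imatrix <imatrix_unsloth.dat> "
    "--tensor-type token_embd.weight=Q5_1 <input.gguf> <output.gguf> Q8_0"
)
print(fill_placeholders(template, "imatrix.dat", "model-f16.gguf", "model-clone.gguf"))
```

In practice the template would be read from cmd.txt (or whichever output file you passed to quant_clone) rather than written inline.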
