SD.Next Quantization Engine

SDNQ: SD.Next Quantization Engine

For more information, see the SD.Next SDNQ wiki page: https://github.com/vladmandic/sdnext/wiki/SDNQ-Quantization

Install command:

pip install git+https://github.com/Disty0/sdnq

Pre-quantized models can be found here: https://huggingface.co/collections/Disty0/sdnq
Example code to load pre-quantized models:

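from sdnq import SDNQConfig # import sdnq to register it into diffusers and transformers
from diffusers import AutoModel # or: from transformers import AutoModel
model = AutoModel.from_pretrained(model_path) # model_path: path or repo id of a pre-quantized model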

Example quantization config code for Diffusers and Transformers libraries:

from diffusers import AutoModel # or: from transformers import AutoModel
from sdnq import SDNQConfig
from sdnq.common import use_torch_compile as triton_is_available

sdnq_config = SDNQConfig(
    weights_dtype="int8",
    group_size=0,
    svd_rank=32,
    svd_steps=8,
    use_svd=False,
    quant_conv=False,
    use_quantized_matmul=triton_is_available,
    use_quantized_matmul_conv=False,
    dequantize_fp32=False,
    non_blocking=False,
    add_skip_keys=True,
    quantization_device="cuda",
    return_device="cuda",
    modules_to_not_convert=["correction_coefs", "prediction_coefs", "lm_head", "embedding_projection"],
    modules_dtype_dict={"int8": ["lm_head"]},
)

model = AutoModel.from_pretrained(model_path, quantization_config=sdnq_config)
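
To give a rough idea of what weights_dtype="int8" implies, the sketch below shows generic symmetric int8 quantization in plain Python. It is not SDNQ's implementation; the function names quantize_int8_symmetric and dequantize are hypothetical and exist only for this illustration.

```python
# Illustrative only: generic symmetric int8 quantization, not SDNQ's actual code.

def quantize_int8_symmetric(row):
    """Quantize one row of floats to int8 values with a single per-row scale."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8_symmetric(weights)
restored = dequantize(q, scale)

# The round-trip error of each value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one byte plus a shared scale, which is where the memory saving over 16-bit or 32-bit floats comes from.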

Example code for saving a quantized model:

from sdnq.loader import save_sdnq_model
# set is_pipeline to True if you want to save the entire diffusers pipeline instead of a single model.
save_sdnq_model(pipe, "path_to_save_the_quantized_model", is_pipeline=False)

Example code for enabling or disabling quantized matmul with a pre-quantized model:

from sdnq.loader import apply_sdnq_options_to_model
quantized_model = apply_sdnq_options_to_model(quantized_model, use_quantized_matmul=True)
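
As background for the use_quantized_matmul option, here is a minimal sketch of the general idea behind a quantized matmul: both operands are quantized to int8, the dot products are accumulated as integers, and the result is rescaled to float. This is an assumption-laden illustration in plain Python, not the kernel SDNQ uses; quantize_rows, quantized_matmul and float_matmul are hypothetical helper names.

```python
# Illustrative only: int8 matmul with integer accumulation, not SDNQ's kernel.

def quantize_rows(matrix):
    """Symmetric int8 quantization with one scale per row."""
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def quantized_matmul(x, w):
    """Approximate x @ w.T using integer dot products, rescaled afterwards."""
    qx, sx = quantize_rows(x)  # one scale per input row
    qw, sw = quantize_rows(w)  # one scale per output feature (row of w)
    out = []
    for i, x_row in enumerate(qx):
        out_row = []
        for j, w_row in enumerate(qw):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # integer accumulation
            out_row.append(acc * sx[i] * sw[j])             # rescale to float
        out.append(out_row)
    return out

def float_matmul(x, w):
    """Reference x @ w.T in floating point."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]

x = [[0.2, -0.7, 1.0], [0.05, 0.4, -0.3]]
w = [[1.5, -0.25, 0.75], [-0.6, 0.9, 0.1]]
approx = quantized_matmul(x, w)
exact = float_matmul(x, w)
max_abs_error = max(
    abs(a - e) for ar, er in zip(approx, exact) for a, e in zip(ar, er)
)
print(max_abs_error)
```

The result is close to the float matmul but not identical, which is one reason to make the option switchable on a pre-quantized model.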

Example code for quantized training:

Note: Safetensors serialization is not supported with SDNQ training. Either avoid Safetensors serialization or convert the quantized model to a standard SDNQ model before saving.

from sdnq.training import sdnq_post_load_quant
from sdnq.common import use_torch_compile as triton_is_available

model = sdnq_post_load_quant(
    model,
    weights_dtype="uint8",
    quantized_matmul_dtype="int8",
    group_size=32, # 0 means auto, -1 means disabled
    svd_rank=32,
    svd_steps=2,
    use_svd=False,
    use_grad_ckpt=True, # disable this if you are not using gradient checkpointing
    use_quantized_matmul=triton_is_available,
    use_static_quantization=True, # quantize the model weights
    use_stochastic_rounding=True,
    dequantize_fp32=True,
    non_blocking=False,
    add_skip_keys=True,
    quantization_device="cuda",
    return_device="cuda",
    modules_to_not_convert=["correction_coefs", "prediction_coefs", "lm_head", "embedding_projection"],
    modules_dtype_dict={"int8": ["lm_head"]},
)
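
The use_stochastic_rounding option refers to a general technique that can be sketched as follows. The example is a plain Python illustration under the assumption that updates are measured in quantization steps; it is not SDNQ's code, and stochastic_round is a hypothetical helper.

```python
# Illustrative only: why stochastic rounding helps when training quantized weights.
import math
import random

def stochastic_round(value, rng):
    """Round down or up at random, with P(round up) equal to the fractional part."""
    floor = math.floor(value)
    frac = value - floor
    return floor + (1 if rng.random() < frac else 0)

rng = random.Random(0)
value = 0.3  # e.g. a small update expressed in quantization steps
n = 10_000

# Round-to-nearest always discards an update smaller than half a step.
nearest = [round(value) for _ in range(n)]
# Stochastic rounding keeps the update on average.
stochastic = [stochastic_round(value, rng) for _ in range(n)]

mean_nearest = sum(nearest) / n
mean_stochastic = sum(stochastic) / n
print(mean_nearest, mean_stochastic)
```

With round-to-nearest the small update is lost every time, while the stochastic version averages out to roughly the true value over many steps.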

Example code for converting a standard SDNQ model to a training SDNQ model:

from sdnq.training import convert_sdnq_model_to_training
from sdnq.common import use_torch_compile as triton_is_available
quantized_model = convert_sdnq_model_to_training(
    quantized_model,
    quantized_matmul_dtype="int8",
    use_grad_ckpt=True,
    use_quantized_matmul=triton_is_available,
    use_stochastic_rounding=True,
    dequantize_fp32=True,
)

Example code for converting a training SDNQ model to a standard SDNQ model:

from sdnq.training import convert_training_model_to_sdnq
quantized_model = convert_training_model_to_sdnq(quantized_model)

Example code for quantized optimizer states:

from sdnq.optim import Adafactor, AdamW, CAME, Lion, Muon
optimizer = AdamW(
    parameters,
    use_stochastic_rounding=True,
    use_stochastic_buffers=True,
    use_quantized_buffers=True,
    use_svd_quantization=False,
    quantized_buffers_dtype="uint8",
    quantized_buffers_group_size=32,
    quantized_buffers_svd_rank=32,
)
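
To illustrate what a group size such as quantized_buffers_group_size=32 means for uint8 buffers, the sketch below quantizes a list of values in groups, each with its own scale and minimum. It is a generic plain Python example, not SDNQ's implementation; quantize_uint8_groups and dequantize_groups are hypothetical names.

```python
# Illustrative only: group-wise asymmetric uint8 quantization, not SDNQ's code.

def quantize_uint8_groups(values, group_size):
    """Quantize to uint8 with one (scale, minimum) pair per group of values."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / 255 if hi > lo else 1.0
        q = [max(0, min(255, round((v - lo) / scale))) for v in chunk]
        groups.append((q, scale, lo))
    return groups

def dequantize_groups(groups):
    """Rebuild approximate floats from the grouped representation."""
    return [lo + q_v * scale for q, scale, lo in groups for q_v in q]

# Two runs of 32 values with very different ranges.
values = [i * 1e-4 for i in range(32)] + [i * 1.0 for i in range(32)]

grouped = dequantize_groups(quantize_uint8_groups(values, group_size=32))
single = dequantize_groups(quantize_uint8_groups(values, group_size=64))

# Compare the error on the small-valued half only.
err_small_grouped = max(abs(a - b) for a, b in zip(values[:32], grouped[:32]))
err_small_single = max(abs(a - b) for a, b in zip(values[:32], single[:32]))
print(err_small_grouped, err_small_single)
```

Smaller groups track local value ranges more closely, at the cost of storing more scales; with a single group the small values here all collapse to the same quantized level.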

Example code for quantized optimizer states for custom optimizers:

import torch
from sdnq.training import SDNQTensor

# inside the optimizer step: p is a parameter and state is its optimizer state dict
state["exp_avg"] = SDNQTensor.from_float(torch.zeros_like(p), weights_dtype="uint8", group_size=32, use_stochastic_rounding=True)
