
SD.Next Quantization Engine


SDNQ: SD.Next Quantization Engine

SD.Next Quantization (SDNQ) provides full cross-platform quantization that reduces memory usage and increases performance on any device.
For more information, see the SDNQ Quantization page of the SD.Next wiki: https://github.com/vladmandic/sdnext/wiki/SDNQ-Quantization
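As a rough illustration of where the memory savings come from, the sketch below estimates weight storage for a given parameter count at different bit widths. It assumes one 16-bit scale per quantization group and ignores activations, zero points, and any SVD factors; these are simplifying assumptions, not SDNQ's exact storage layout, and `estimate_weight_bytes` is a hypothetical helper.

```python
from typing import Optional


def estimate_weight_bytes(num_params: int, bits: int, group_size: Optional[int] = None, scale_bytes: int = 2) -> int:
    """Approximate storage for quantized weights (hypothetical helper).

    Assumption: one scale of `scale_bytes` bytes per group of `group_size`
    weights; group_size=None means no per-group scales are counted.
    """
    weight_bytes = -(-num_params * bits // 8)  # ceiling division to whole bytes
    scale_overhead = 0
    if group_size is not None and group_size > 0:
        num_groups = -(-num_params // group_size)  # ceiling division
        scale_overhead = num_groups * scale_bytes
    return weight_bytes + scale_overhead


if __name__ == "__main__":
    n = 12_000_000_000  # hypothetical 12B-parameter model
    layouts = [("16-bit (e.g. bf16)", 16, None), ("8-bit", 8, None), ("8-bit, group 32", 8, 32), ("4-bit, group 32", 4, 32)]
    for label, bits, group in layouts:
        gib = estimate_weight_bytes(n, bits, group) / 2**30
        print(f"{label:>20}: {gib:.1f} GiB")
```

Per-group scales add a small overhead on top of the raw bit width, which is why smaller groups trade a little memory for accuracy.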

Install command:

pip install sdnq

Example code to load pre-quantized models:

Pre-quantized models can be found here: https://huggingface.co/collections/Disty0/sdnq

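import torch
from diffusers import AutoModel  # or transformers.AutoModel for Transformers models
from sdnq import SDNQConfig  # importing sdnq registers it with Diffusers and Transformers

model_path = "path/or/repo-id/of/a/pre-quantized/model"  # placeholder
pipe_or_quantized_model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16)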

Example code for enabling or disabling quantized matmul with a pre-quantized model:

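from sdnq.loader import apply_sdnq_options_to_model

# quantized_model: a pre-quantized model loaded as in the example above
# pass use_quantized_matmul=False to disable quantized matmul instead
quantized_model = apply_sdnq_options_to_model(quantized_model, use_quantized_matmul=True)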
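Quantized matmul lets the matrix multiplication run on low-precision integers instead of dequantizing the weights first. The pure-Python sketch below shows the general idea for a single dot product with symmetric per-tensor int8 scales: integer products are accumulated exactly and the result is rescaled once at the end. It illustrates the arithmetic only; it is not SDNQ's kernel, and both helper names are hypothetical.

```python
def quantize_symmetric_int8(values):
    """Map floats to int8 codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a, b):
    """Dot product computed on int8 codes with an integer accumulator."""
    qa, sa = quantize_symmetric_int8(a)
    qb, sb = quantize_symmetric_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # exact integer accumulation
    return acc * sa * sb  # rescale once to get back to floating point


if __name__ == "__main__":
    x = [0.5, -1.25, 2.0, 0.75]
    w = [1.0, 0.5, -0.25, 2.0]
    exact = sum(p * q for p, q in zip(x, w))
    print(f"exact={exact:.4f}  int8={int8_dot(x, w):.4f}")
```

Because the expensive inner loop only touches integers, hardware with fast integer units can do most of the work, at the cost of a small rounding error.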

Example quantization config code for Diffusers and Transformers libraries:

from sdnq import SDNQConfig
from sdnq.common import use_torch_compile as triton_is_available

sdnq_config = SDNQConfig(
    weights_dtype="int8", # Check out `sdnq.common.accepted_weight_dtypes` for all the supported dtypes.
    quantized_matmul_dtype=None, # set to override the quantized matmul dtype; None uses weights_dtype
    group_size=0, # 0 means auto, -1 means disabled
    svd_rank=32,
    svd_steps=8,
    dynamic_loss_threshold=1e-2,
    use_svd=False,
    quant_conv=False,
    use_quantized_matmul=triton_is_available,
    use_quantized_matmul_conv=False,
    use_dynamic_quantization=False,
    dequantize_fp32=True,
    non_blocking=False,
    add_skip_keys=True,
    quantization_device="cuda",
    return_device="cuda",
    modules_to_not_convert=["correction_coefs", "prediction_coefs", "lm_head", "embedding_projection"],
    modules_dtype_dict={"int8": ["lm_head"]},
)

# AutoModel and model_path as in the loading example above
quantized_model = AutoModel.from_pretrained(model_path, quantization_config=sdnq_config)
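To make `weights_dtype` and `group_size` more concrete, the sketch below quantizes a flat list of weights to int8 with one scale per group and then dequantizes it. Smaller groups track local magnitudes better at the cost of storing more scales. This is a generic symmetric group-wise scheme written for illustration, assuming a fixed positive group size; it is not SDNQ's implementation and does not model the auto (`0`) or disabled (`-1`) settings. The helper names are hypothetical.

```python
def quantize_groups(weights, group_size):
    """Symmetric int8 quantization with one float scale per group of weights."""
    if group_size <= 0:
        raise ValueError("this sketch expects a positive group_size")
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-127, min(127, round(w / scale))) for w in group)
    return codes, scales


def dequantize_groups(codes, scales, group_size):
    """Invert quantize_groups: multiply each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    # small weights followed by large ones: a single group wastes precision on the small ones
    w = [0.01, -0.02, 0.015, 0.005, 3.0, -2.5, 1.0, 0.5]
    for g in (8, 4):
        codes, scales = quantize_groups(w, g)
        restored = dequantize_groups(codes, scales, g)
        err = max(abs(a - b) for a, b in zip(w[:4], restored[:4]))
        print(f"group_size={g}: {len(scales)} scale(s), max error on small weights={err:.6f}")
```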

Example code for saving a quantized Diffusers or Transformers model:

pipe_or_quantized_model.save_pretrained("path_to_save_the_quantized_model")

Example code for post-load quantization of any model:

from sdnq import sdnq_post_load_quant

# accepts the same keyword arguments as SDNQConfig
model = sdnq_post_load_quant(
    model,
    weights_dtype="int8",
    use_quantized_matmul=False,
)

Example code for quantized training:

Note:

  • Safetensors serialization is not supported for SDNQ training models.
    Either disable Safetensors serialization or convert the quantized model to a standard SDNQ model before saving.
    You can also use scripts/dequantize_sdnq_training.py to dequantize an SDNQ training model that has already been saved to disk.

from sdnq.training import sdnq_training_post_load_quant
from sdnq.common import use_torch_compile as triton_is_available

quantized_model = sdnq_training_post_load_quant(
    model,
    weights_dtype="uint8",
    quantized_matmul_dtype="int8",
    group_size=32, # 0 means auto, -1 means disabled
    svd_rank=32,
    svd_steps=8,
    use_svd=False,
    use_grad_ckpt=True, # disable this if you are not using gradient checkpointing
    use_quantized_matmul=triton_is_available,
    use_static_quantization=True, # quantize the model weights
    use_stochastic_rounding=True,
    dequantize_fp32=True,
    non_blocking=False,
    add_skip_keys=True,
    quantization_device="cuda",
    return_device="cuda",
    modules_to_not_convert=["correction_coefs", "prediction_coefs", "lm_head", "embedding_projection"],
    modules_dtype_dict={"int8": ["lm_head"]},
)
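`use_stochastic_rounding=True` matters in low-precision training because deterministic round-to-nearest silently drops updates smaller than half a quantization step. The sketch below shows plain stochastic rounding on a uniform grid: a value is rounded up with probability equal to its fractional distance from the lower grid point, so the rounding is unbiased in expectation. It is a generic illustration under that assumption, not SDNQ's code, and `stochastic_round` is a hypothetical helper.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of `step`, picking the upper neighbour with
    probability equal to the fractional part (unbiased on average)."""
    lower = math.floor(x / step)
    frac = x / step - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


if __name__ == "__main__":
    rng = random.Random(0)
    step = 1.0 / 127  # e.g. one int8 step for a tensor whose max |value| is 1.0
    weight = 64 * step  # a value that sits exactly on the grid
    update = 0.001  # smaller than half a step (about 0.0039)
    nearest = round((weight + update) / step) * step
    samples = [stochastic_round(weight + update, step, rng) for _ in range(20_000)]
    print(f"round-to-nearest change: {nearest - weight:+.5f}")
    print(f"stochastic mean change:  {sum(samples) / len(samples) - weight:+.5f}")
```

Round-to-nearest leaves the weight unchanged, while the stochastic version applies the small update on average over many steps.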

Example code for converting a standard SDNQ model to a training SDNQ model:

from sdnq.training import convert_sdnq_model_to_training
from sdnq.common import use_torch_compile as triton_is_available
quantized_model = convert_sdnq_model_to_training(
    quantized_model,
    quantized_matmul_dtype="int8",
    use_grad_ckpt=True,
    use_quantized_matmul=triton_is_available,
    use_stochastic_rounding=True,
    dequantize_fp32=True,
)

Example code for converting a training SDNQ model back to a standard SDNQ model:

from sdnq.training import convert_training_model_to_sdnq
quantized_model = convert_training_model_to_sdnq(quantized_model)

Example code for quantized optimizer states:

from sdnq.optim import Adafactor, AdamW, CAME, Lion, Muon
optimizer = AdamW(
    quantized_model.parameters(),
    use_quantized_buffers=True,
    quantized_buffers_dtype="uint8",
    quantized_buffers_group_size=32,
    quantized_buffers_svd_rank=32,
    final_norm_mode="clip", # can be one of ["none", "clip", "rms", "rms_clip", "relative", "muon"]
    use_kahan=False,
    use_cautious=False,
    use_stochastic_rounding=True,
    use_stochastic_buffers=True,
    use_svd_quantization=False,
    use_torch_compile=False,
    offload_buffers=False,
    offload_non_blocking=True,
)

Example code for quantized optimizer states for custom optimizers or Tensors:

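import torch
from sdnq.training import SDNQTensor

# inside a custom optimizer's step(): p is a parameter tensor and state is its per-parameter state dict
state["exp_avg"] = SDNQTensor.from_float(torch.zeros_like(p), weights_dtype="uint8", group_size=32, use_stochastic_rounding=True)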
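The example above stores an optimizer state as `uint8` with a group size of 32. One common way to quantize to an unsigned dtype is asymmetric min/max quantization, sketched below with a per-group minimum and scale and codes in 0..255. This layout is an assumption made for illustration and does not describe how `SDNQTensor` actually stores its data; the helper names are hypothetical.

```python
def quantize_uint8_groups(values, group_size):
    """Asymmetric uint8 quantization: each group stores (minimum, scale) and
    codes in 0..255 so that value ~= minimum + code * scale."""
    codes, params = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 255 if hi > lo else 1.0
        params.append((lo, scale))
        codes.extend(min(255, max(0, round((v - lo) / scale))) for v in group)
    return codes, params


def dequantize_uint8_groups(codes, params, group_size):
    """Invert quantize_uint8_groups using each code's group parameters."""
    return [params[i // group_size][0] + c * params[i // group_size][1] for i, c in enumerate(codes)]


if __name__ == "__main__":
    exp_avg = [((i % 7) - 3) * 1e-3 for i in range(64)]  # toy optimizer state
    codes, params = quantize_uint8_groups(exp_avg, 32)
    restored = dequantize_uint8_groups(codes, params, 32)
    err = max(abs(a - b) for a, b in zip(exp_avg, restored))
    print(f"{len(params)} groups, max round-trip error={err:.2e}")
```

An all-zero state, as created by `torch.zeros_like(p)`, round-trips exactly in this scheme because every group collapses to a single value.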


Download files


Source Distribution

sdnq-0.1.6.tar.gz (68.5 kB)

Uploaded Source

Built Distribution


sdnq-0.1.6-py3-none-any.whl (101.1 kB)

Uploaded Python 3

File details

Details for the file sdnq-0.1.6.tar.gz.

File metadata

  • Download URL: sdnq-0.1.6.tar.gz
  • Upload date:
  • Size: 68.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sdnq-0.1.6.tar.gz
Algorithm Hash digest
SHA256 1f10c85c34ca02624a374b8b8bbff091df50fd6e0fb5c7b92b12fa5bdd249c9c
MD5 16477bca06f29636749c2769c7857c4f
BLAKE2b-256 7efbaea404eb7488869aaeb80865463681591a9e40bebe3aaf6faab18c842353
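The published digests can be used to check a downloaded file. A minimal sketch with Python's standard `hashlib`, assuming the source archive has been saved to the current directory under its original name:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SHA256 = "1f10c85c34ca02624a374b8b8bbff091df50fd6e0fb5c7b92b12fa5bdd249c9c"

if __name__ == "__main__":
    archive = Path("sdnq-0.1.6.tar.gz")
    if archive.exists():
        ok = sha256_of_file(archive) == EXPECTED_SHA256
        print("hash matches" if ok else "HASH MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check automatically through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).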


Provenance

The following attestation bundles were made for sdnq-0.1.6.tar.gz:

Publisher: python-publish.yml on Disty0/sdnq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sdnq-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: sdnq-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 101.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sdnq-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 3b33a9c84a08e05342e789a9382ac7d2327d13976cd4415d08261fe9d86f1c24
MD5 898572eb21e17c4424ff0ea65dce4e7d
BLAKE2b-256 aa1360becd4e6d51a64a81a700a261be937962d75601589c170099be77d6b144


Provenance

The following attestation bundles were made for sdnq-0.1.6-py3-none-any.whl:

Publisher: python-publish.yml on Disty0/sdnq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
