convert_to_quant

Convert safetensors weights to quantized formats (FP8, INT8, NVFP4, MXFP8) with learned rounding optimization for ComfyUI inference.


Installation

pip install convert-to-quant

Or install from source:

git clone https://github.com/silveroxides/convert_to_quant.git
cd convert_to_quant
pip install -e .

Requirements Summary

| Feature | Requirement |
| --- | --- |
| Minimum (FP8/INT8) | Python 3.10+, PyTorch 2.8+, CUDA 12.8+ |
| Full (NVFP4/MXFP8) | Python 3.12+, PyTorch 2.10+, CUDA 13.0+, comfy-kitchen |
| INT8 Kernels | Triton (Linux native, Windows via triton-windows) |

[!IMPORTANT] PyTorch must be installed manually with the correct CUDA version for your GPU. This package does not install PyTorch automatically to prevent environment conflicts.


Detailed Installation (GPU-Specific)

1. Install PyTorch

Visit pytorch.org to get the correct install command.

Examples:

# CUDA 13.0 (Required for Blackwell NVFP4/MXFP8)
pip install torch --index-url https://download.pytorch.org/whl/cu130

# CUDA 12.8 (Stable)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# CPU only
pip install torch --index-url https://download.pytorch.org/whl/cpu

2. Optional: Triton (needed for INT8)

# Linux
pip install -U triton

# Windows for torch 2.10 and 2.11
pip install -U "triton-windows<3.7"
# Windows for torch 2.12
pip install -U "triton-windows<3.8"

Quick Start

Run 'ctq -hf' to list the layer-exclusion preset arguments available for each supported model.

# All examples write metadata and comfy_quant layers so the output is ComfyUI-compatible.
# They also pass --low-memory to reduce peak RAM/VRAM usage.

# Basic FP8 Tensorcore quantization without learned rounding
ctq -i model.safetensors -o model-fp8mixed.safetensors --comfy_quant --save-quant-metadata --simple --low-memory

# INT8 Row-Wise quantization without learned rounding
ctq -i model.safetensors -o model-int8mixedrow.safetensors --int8 --scaling_mode row --comfy_quant --save-quant-metadata --simple --low-memory

# Blackwell MXFP8 quantization without learned rounding
ctq -i model.safetensors -o model-mxfp8mixed.safetensors --mxfp8 --comfy_quant --save-quant-metadata --simple --low-memory
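
When converting several models from a script, the same three commands can be assembled programmatically. The sketch below only uses flags that appear in the examples above; `FORMAT_FLAGS` and `build_ctq_command` are hypothetical names chosen for illustration.

```python
import shlex

# Format name -> extra flags, copied from the three examples above.
FORMAT_FLAGS = {
    "fp8": [],
    "int8-row": ["--int8", "--scaling_mode", "row"],
    "mxfp8": ["--mxfp8"],
}


def build_ctq_command(src: str, dst: str, fmt: str) -> list[str]:
    """Build the argument list for one of the Quick Start examples."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMAT_FLAGS)}")
    return [
        "ctq", "-i", src, "-o", dst,
        *FORMAT_FLAGS[fmt],
        "--comfy_quant", "--save-quant-metadata", "--simple", "--low-memory",
    ]


cmd = build_ctq_command("model.safetensors", "model-int8mixedrow.safetensors", "int8-row")
print(shlex.join(cmd))  # shell-ready string for logging
```

The list can be passed to `subprocess.run(cmd, check=True)` once `ctq` is on the PATH.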

Use In Code As Module

# Example modular usage of INT8 Row-Wise quantization of Flux2 Klein 9B
from convert_to_quant import quantize

quantize(
    input="./flux-2-klein-9b.safetensors",
    output="./flux-2-klein-9b-int8mixedrow.safetensors",
    comfy_quant=True,
    save_quant_metadata=True,
    verbose="VERBOSE",
    low_memory=True,
    int8=True,
    scaling_mode="row",
    flux2=True,
    simple=True,
    calib_samples=8192
)

Load the output .safetensors file in ComfyUI like any other model.
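
Before loading the result, it can be useful to confirm what was written. A .safetensors file starts with an 8-byte little-endian length followed by a JSON header, so the tensor dtypes and any stored metadata can be read with the standard library alone. This is a minimal sketch; the helper names are hypothetical, and the exact metadata keys this package writes are not shown here.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian u64
        return json.loads(fh.read(header_len))


def summarize_dtypes(header: dict) -> dict[str, int]:
    """Count tensors per dtype string, skipping the optional metadata entry."""
    counts: dict[str, int] = {}
    for name, entry in header.items():
        if name == "__metadata__":
            continue
        counts[entry["dtype"]] = counts.get(entry["dtype"], 0) + 1
    return counts
```

For example, `summarize_dtypes(read_safetensors_header("./flux-2-klein-9b-int8mixedrow.safetensors"))` would show how many tensors were stored in each dtype, and `header.get("__metadata__")` would show the free-form metadata block if one exists.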


Supported Quantization Formats

| Format | CLI Flag | Hardware | Optimization |
| --- | --- | --- | --- |
| FP8 (E4M3) | (default) | Ada/Hopper+ | Learned Rounding (SVD) |
| INT8 Block-wise | --int8 | Any GPU | Learned Rounding (SVD) |
| INT8 Tensor-wise | --int8 --scaling_mode tensor | Any GPU | High-perf _scaled_mm |
| NVFP4 (4-bit) | --nvfp4 | Blackwell | Dual-scale optimization |
| MXFP8 | --mxfp8 | Blackwell | Microscaling (E8M0) |

For a deep dive into how these formats work, see FORMATS.md.
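
To give a feel for what a scaled integer format stores, here is a textbook symmetric absmax quantizer applied row by row. It is a generic illustration in plain Python, not the package's kernels or its learned-rounding optimizer; the function names are hypothetical.

```python
def quantize_row_int8(row: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization of one row to the range [-127, 127]."""
    amax = max(abs(x) for x in row)
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q: list[int], scale: float) -> list[float]:
    """Recover approximate float values from integers and the row scale."""
    return [v * scale for v in q]


weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
for row in weights:
    q, scale = quantize_row_int8(row)
    restored = dequantize_row(q, scale)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(q, f"scale={scale:.6f}", f"max_error={worst:.6f}")
```

Each row keeps its own scale, so a row of small weights is not forced onto the coarse grid that a row of large weights needs. Plain rounding leaves an error of at most half a scale step per weight.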


Model-Specific Presets

| Model | Flag | Notes |
| --- | --- | --- |
| Flux.2 | --flux2 | Keep modulation/guidance/time/final high-precision |
| T5-XXL | --t5xxl | Decoder removed |
| Hunyuan Video | --hunyuan | Attention norms excluded |
| WAN Video | --wan | Time embeddings excluded |

(See --help-filters for a full list of presets)


Documentation

  • 📖 MANUAL.md - Complete usage guide with examples and troubleshooting
  • 📚 FORMATS.md - Technical reference for quantization formats
  • 🧪 DEVELOPMENT.md - Changelog and research notes
  • 📋 AGENTS.md - Developer guide & registry architecture

Key Features

  • Learned Rounding: SVD-based optimization minimizes quantization error.
  • Bias Correction: Automatic bias adjustment using synthetic calibration data.
  • Model-Specific Support: Exclusion lists for sensitive layers (norms, embeddings).
  • Three-Tier Quantization: Mix different formats per layer using --custom-layers.
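
The bias-correction idea can be illustrated with the standard formulation: for a linear layer y = Wx + b with quantized weights W_q, the bias is shifted by (W - W_q) times the mean calibration input, which cancels the average output error. The sketch below is an assumption-laden illustration in plain Python (uniform fake quantization, Gaussian synthetic inputs, hypothetical function names), not the package's implementation.

```python
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def fake_quantize(w, step):
    """Round every weight to the nearest multiple of `step` (stand-in quantizer)."""
    return [[round(v / step) * step for v in row] for row in w]


def corrected_bias(w, w_q, bias, calib):
    """b' = b + (W - W_q) @ mean(x), with the mean taken over calibration inputs."""
    n = len(calib)
    mean_x = [sum(x[j] for x in calib) / n for j in range(len(calib[0]))]
    delta = [[a - b for a, b in zip(r, rq)] for r, rq in zip(w, w_q)]
    return [b + s for b, s in zip(bias, matvec(delta, mean_x))]


def mean_output_error(w, w_q, bias, bias_q, calib):
    """Average of (full-precision output - quantized output) over the inputs."""
    errs = [0.0] * len(bias)
    for x in calib:
        full = [y + b for y, b in zip(matvec(w, x), bias)]
        quant = [y + b for y, b in zip(matvec(w_q, x), bias_q)]
        errs = [e + (f - q) for e, f, q in zip(errs, full, quant)]
    return [e / len(calib) for e in errs]


rng = random.Random(0)
w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
bias = [0.1, -0.2, 0.3]
calib = [[rng.gauss(0.5, 1.0) for _ in range(4)] for _ in range(256)]  # synthetic
w_q = fake_quantize(w, step=0.25)
new_bias = corrected_bias(w, w_q, bias, calib)

print("before:", mean_output_error(w, w_q, bias, bias, calib))
print("after: ", mean_output_error(w, w_q, bias, new_bias, calib))
```

The correction only removes the mean error over the calibration inputs. Per-sample error remains, which is the part that rounding optimization targets.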

Advanced Usage

Layer Config JSON

Define per-layer settings with regex patterns:

convert_to_quant -i model.safetensors --layer-config layers.json --comfy_quant
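
The schema of layers.json is not shown in this section, so the following is only a sketch of how regex-keyed per-layer settings can be resolved in general. The config keys (`format`, `block_size`, `skip`), the layer names, and the first-match-wins rule are all hypothetical and should be checked against the project's own documentation.

```python
import re

# HYPOTHETICAL config: pattern -> settings. Not the package's real schema.
EXAMPLE_CONFIG = {
    r"blocks\.\d+\.attn\..*\.weight": {"format": "fp8"},
    r"blocks\.\d+\.mlp\..*\.weight": {"format": "int8", "block_size": 64},
    r"final_layer\..*": {"skip": True},
}


def resolve_layer_settings(name: str, config: dict, default=None):
    """Return the settings of the first pattern that matches the whole name."""
    for pattern, settings in config.items():  # dicts keep insertion order
        if re.fullmatch(pattern, name):
            return settings
    return default


for layer in ("blocks.0.attn.qkv.weight", "blocks.3.mlp.fc1.weight",
              "final_layer.linear.weight", "embed.weight"):
    print(layer, "->", resolve_layer_settings(layer, EXAMPLE_CONFIG))
```

Using `re.fullmatch` rather than `re.search` keeps a short pattern from accidentally matching in the middle of a longer layer name.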

Scaling Modes

# Block-wise scaling for better accuracy
convert_to_quant -i model.safetensors --scaling_mode block --block_size 64 --comfy_quant
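
Why smaller blocks can help accuracy is easiest to see with an outlier. The sketch below compares one scale for a whole row against one scale per 64 values, using generic absmax int8 rounding. It assumes contiguous 1-D blocks for simplicity; how the package lays out blocks within a weight matrix is not described here, and the function names are hypothetical.

```python
def absmax_scales(values, block_size):
    """One symmetric int8 scale per contiguous block of `block_size` values."""
    scales = []
    for start in range(0, len(values), block_size):
        amax = max(abs(v) for v in values[start:start + block_size])
        scales.append(amax / 127.0 if amax > 0 else 1.0)
    return scales


def roundtrip(values, block_size):
    """Quantize to int8 with per-block scales, then dequantize."""
    scales = absmax_scales(values, block_size)
    out = []
    for i, v in enumerate(values):
        s = scales[i // block_size]
        q = max(-127, min(127, round(v / s)))
        out.append(q * s)
    return out


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# 128 small weights, with one large outlier placed in the second block.
row = [0.01 * ((i % 7) - 3) for i in range(128)]
row[100] = 8.0

per_tensor = roundtrip(row, block_size=len(row))
per_block = roundtrip(row, block_size=64)

# Error on the first 64 values, which do not contain the outlier.
print("single scale:", max_abs_error(row[:64], per_tensor[:64]))
print("block of 64: ", max_abs_error(row[:64], per_block[:64]))
```

With a single scale the outlier stretches the grid so far that the small weights in the first block all round to zero. With per-block scales only the block containing the outlier pays that cost.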

Acknowledgements

Special thanks to:


License

MIT License

