
convert_to_quant

Convert safetensors weights to quantized formats (FP8, INT8) with learned rounding optimization for ComfyUI inference.

Python 3.9+ License: MIT


Installation

[!IMPORTANT] PyTorch must be installed first with the correct CUDA version for your GPU. This package does not install PyTorch automatically to avoid conflicts with your existing setup.

Step 1: Install PyTorch (GPU-specific)

Visit pytorch.org to get the correct install command for your system.

Examples:

# CUDA 13.0 (newest)
pip install torch --index-url https://download.pytorch.org/whl/cu130

# CUDA 12.8 (stable)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# CUDA 12.6
pip install torch --index-url https://download.pytorch.org/whl/cu126

# CPU only (no GPU acceleration)
pip install torch --index-url https://download.pytorch.org/whl/cpu
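Before moving on, it can be worth confirming that PyTorch is importable and sees your GPU. This small check is not part of the package, just a convenience sketch:

```python
# Sanity check: report the installed torch version and whether CUDA is usable.
def check_torch() -> str:
    try:
        import torch
        return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"
    except ImportError:
        return "PyTorch is not installed - see Step 1 above."

print(check_torch())
```

If CUDA shows as unavailable on a GPU machine, the installed wheel likely does not match your CUDA version.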

Step 2: Install convert_to_quant

# Install from PyPI (when available)
pip install convert_to_quant

# Or install from source
git clone https://github.com/silveroxides/convert_to_quant.git
cd convert_to_quant
pip install -e .

Optional: Triton (needed for INT8)

# On Linux
pip install -U triton

# On Windows, pick the build matching your torch version:
# torch>=2.9
pip install -U "triton-windows<3.6"
# torch>=2.8
pip install -U "triton-windows<3.5"
# torch>=2.7
pip install -U "triton-windows<3.4"
# torch>=2.6
pip install -U "triton-windows<3.3"

Quick Start

# Basic FP8 quantization
convert_to_quant -i model.safetensors

# FP8 with ComfyUI metadata (recommended)
convert_to_quant -i model.safetensors --comfy_quant

# With custom learning rate (adaptive schedule by default)
convert_to_quant -i model.safetensors --comfy_quant --lr 0.01

# With plateau LR schedule for better convergence
convert_to_quant -i model.safetensors --comfy_quant --lr_schedule plateau --lr_patience 9 --lr_factor 0.92

Load the output .safetensors file in ComfyUI like any other model.


Supported Quantization Formats

| Format | CLI Flag | Hardware | Optimization |
|---|---|---|---|
| FP8 (E4M3) | (default) | Ada/Hopper+ | Learned Rounding (SVD) |
| INT8 Block-wise | `--int8` | Any GPU | Learned Rounding (SVD) |
| INT8 Tensor-wise | `--int8 --scaling_mode tensor` | Any GPU | High-perf `_scaled_mm` |
| NVFP4 (4-bit) | `--nvfp4` | Blackwell | Dual-scale optimization |
| MXFP8 | `--mxfp8` | Blackwell | Microscaling (E8M0) |

For a deep dive into how these formats work and their technical implementation, see FORMATS.md.
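As a rough illustration of the tensor-wise scaling these formats build on, here is a pure-Python sketch. It is a stand-in for the package's PyTorch kernels, and it omits the final rounding onto the discrete E4M3 grid; only the scale arithmetic is shown:

```python
# Tensor-wise FP8 (E4M3) scaling sketch. E4M3 has a maximum representable
# magnitude of 448, so the scale maps the tensor's absolute maximum onto
# that range; dequantization divides by the same scale.
E4M3_MAX = 448.0

def quantize_tensorwise(weights):
    amax = max(abs(w) for w in weights)
    scale = E4M3_MAX / amax
    # Clamp to the representable range (rounding to the E4M3 grid omitted).
    q = [max(-E4M3_MAX, min(E4M3_MAX, w * scale)) for w in weights]
    return q, scale

def dequantize_tensorwise(q, scale):
    return [v / scale for v in q]
```

With a single scale per tensor, one large outlier weight shrinks the effective precision of every other value, which is why block-wise modes exist.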


Model-Specific Presets

| Model | Flag | Notes |
|---|---|---|
| Flux.2 | `--flux2` | Keep modulation/guidance/time/final high-precision |
| Chroma / Radiance | `--distillation_large` / `--nerf_large` | Distillation layers excluded |
| T5-XXL Text Encoder | `--t5xxl` | Decoder removed |
| Mistral Text Encoder | `--mistral` | Norms/biases excluded |
| Visual Encoder | `--visual` | MLP layers excluded |
| Hunyuan Video | `--hunyuan` | Attention norms excluded |
| WAN Video | `--wan` | Time embeddings excluded |
| Qwen Image | `--qwen` | Image layers excluded |
| Z-Image | `--zimage` / `--zimage_refiner` | Refiner excludes context/noise refiner |

Documentation

  • 📖 MANUAL.md - Complete usage guide with examples and troubleshooting
  • 📚 FORMATS.md - Technical reference for quantization formats and SVD optimization
  • 📋 AGENTS.md - Developer guide & registry architecture
  • ACTIVE.md - Current status and active implementations
  • 🧪 DEVELOPMENT.md - Changelog and research notes
  • 🔗 quantization.examples.md - ComfyUI integration patterns

Project Structure

convert_to_quant/
├── convert_to_quant/            # Main package
│   ├── cli/                     # CLI entry point & argument parsing
│   ├── converters/              # Core quantization logic (FP8, INT8, NVFP4)
│   ├── formats/                 # Format-specific conversion flows
│   ├── comfy/                   # ComfyUI integration components
│   ├── config/                  # Layer configuration & templates
│   ├── utils/                   # Shared utilities (tensor, memory)
│   ├── constants.py             # Model Filter Registry & constants
│   └── convert_to_quant.py      # Backward-compatibility wrapper
├── pyproject.toml               # Package configuration
├── MANUAL.md                    # User documentation
└── ...

Key Features

  • Learned Rounding: SVD-based optimization minimizes quantization error in the weight matrix's principal directions
  • Multiple Optimizers: Original (adaptive LR), AdamW, RAdam
  • Bias Correction: Automatic bias adjustment using synthetic calibration data
  • Model-Specific Support: Exclusion lists for sensitive layers (norms, embeddings, distillation)
  • Triton Kernels: GPU-accelerated quantization/dequantization with fallback to PyTorch
  • Three-Tier Quantization: Mix different formats per layer using --custom-layers and --fallback
  • Layer Config JSON: Fine-grained per-layer control with regex pattern matching
  • LR Schedules: Adaptive, exponential, and plateau learning rate scheduling
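To give a flavour of what "learned rounding" means: instead of rounding each scaled weight to the nearest grid point, the rounding direction (floor vs. ceil) is chosen to minimize the layer's output error on calibration data. The toy sketch below brute-forces that choice; it illustrates the idea only, and is not the package's SVD-based optimizer (exhaustive search is feasible only for a handful of weights):

```python
import itertools
import math

def output_error(x_rows, w_ref, w_q):
    # Sum of squared differences between the layer outputs x . w_ref
    # and x . w_q over all calibration rows.
    return sum(
        (sum(x * w for x, w in zip(row, w_ref))
         - sum(x * q for x, q in zip(row, w_q))) ** 2
        for row in x_rows
    )

def nearest_round(w, scale):
    # Baseline: round each weight to the nearest grid point.
    return [round(wi * scale) / scale for wi in w]

def learned_round(w, x_rows, scale):
    # Each weight may land on floor or ceil of its scaled value;
    # exhaustively pick the combination with the lowest output error.
    cands = [(math.floor(wi * scale) / scale, math.ceil(wi * scale) / scale)
             for wi in w]
    return list(min(itertools.product(*cands),
                    key=lambda wq: output_error(x_rows, w, list(wq))))
```

Because nearest rounding is always among the candidates, the learned choice can never produce a larger output error than naive rounding on the calibration set.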

Advanced Usage

Layer Config JSON

Define per-layer quantization settings with regex patterns:

# Generate a template from your model
convert_to_quant -i model.safetensors --dry-run --layer-config-template layers.json

# Apply custom layer config
convert_to_quant -i model.safetensors --layer-config layers.json --comfy_quant
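For a sense of what a per-layer config could contain, here is a purely illustrative snippet. The field names below are invented for this example and are not the package's actual schema, so always start from a generated template rather than this sketch:

```json
{
  "_note": "Hypothetical schema for illustration only - generate the real template with --layer-config-template",
  "layers": [
    { "pattern": ".*attn.*", "format": "int8" },
    { "pattern": ".*norm.*", "format": "skip" },
    { "pattern": ".*",       "format": "fp8"  }
  ]
}
```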

Scaling Modes

# Tensor-wise scaling (default)
convert_to_quant -i model.safetensors --scaling-mode tensor --comfy_quant

# Block-wise scaling for better accuracy
convert_to_quant -i model.safetensors --scaling-mode block --block_size 64 --comfy_quant
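Block-wise scaling computes one scale per block of values (the `--block_size`) instead of one per tensor, so an outlier only degrades precision within its own block. A pure-Python sketch of the idea for INT8, not the package's Triton implementation:

```python
def quantize_int8_blockwise(weights, block_size=64):
    # One scale per block: an outlier only hurts its own block's precision.
    qs, scales = [], []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block) or 1.0  # guard all-zero blocks
        scale = 127.0 / amax
        scales.append(scale)
        qs.extend(round(w * scale) for w in block)
    return qs, scales

def dequantize_int8_blockwise(qs, scales, block_size=64):
    return [q / scales[i // block_size] for i, q in enumerate(qs)]
```

Smaller blocks track local magnitudes more tightly (better accuracy) at the cost of storing more scales.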

Additional Help

# View experimental features
convert_to_quant --help-experimental

# View model-specific filter presets
convert_to_quant --help-filters

Usage Examples

INT8 with performance heuristics

convert_to_quant -i model.safetensors --int8 --block_size 128 --comfy_quant --heur

Blackwell NVFP4 (4-bit)

convert_to_quant -i model.safetensors --nvfp4 --comfy_quant

Requirements

  • Python 3.9+
  • PyTorch 2.1+ (with CUDA for GPU acceleration)
  • safetensors >= 0.4.2
  • tqdm
  • (Optional) triton >= 2.1.0 for INT8 kernels

License

MIT License
