# convert_to_quant

Convert safetensors weights to quantized formats (FP8, INT8) with learned rounding optimization for ComfyUI inference.
## Installation

> [!IMPORTANT]
> PyTorch must be installed first, with the correct CUDA version for your GPU. This package does not install PyTorch automatically, to avoid conflicts with your existing setup.

### Step 1: Install PyTorch (GPU-specific)

Visit [pytorch.org](https://pytorch.org) to get the correct install command for your system.

Examples:
```bash
# CUDA 13.0 (newest)
pip install torch --index-url https://download.pytorch.org/whl/cu130

# CUDA 12.8 (stable)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# CUDA 12.6
pip install torch --index-url https://download.pytorch.org/whl/cu126

# CPU only (no GPU acceleration)
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
### Step 2: Install convert_to_quant

```bash
# Install from PyPI (when available)
pip install convert_to_quant

# Or install from source
git clone https://github.com/silveroxides/convert_to_quant.git
cd convert_to_quant
pip install -e .
```
### Optional: Triton (needed for INT8)

On Linux:

```bash
pip install -U triton
```

On Windows, match the `triton-windows` version to your PyTorch version:

```bash
# torch >= 2.9
pip install -U "triton-windows<3.6"

# torch >= 2.8
pip install -U "triton-windows<3.5"

# torch >= 2.7
pip install -U "triton-windows<3.4"

# torch >= 2.6
pip install -U "triton-windows<3.3"
```
## Quick Start

```bash
# Basic FP8 quantization
convert_to_quant -i model.safetensors

# FP8 with ComfyUI metadata (recommended)
convert_to_quant -i model.safetensors --comfy_quant

# With a custom learning rate (adaptive schedule by default)
convert_to_quant -i model.safetensors --comfy_quant --lr 0.01

# With a plateau LR schedule for better convergence
convert_to_quant -i model.safetensors --comfy_quant --lr_schedule plateau --lr_patience 9 --lr_factor 0.92
```
Load the output .safetensors file in ComfyUI like any other model.
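To build intuition for what the learned-rounding options above are tuning, here is a toy pure-Python sketch: instead of always rounding each scaled weight to the nearest integer, individual weights are flipped between floor and ceil whenever that reduces the *layer output* error on a calibration input. The greedy search and the 1-D "layer" below are illustrative only, not the tool's actual SVD-based optimizer.

```python
import math

def learned_rounding(weights, scale, calib):
    """Toy learned rounding: start from round-to-nearest, then flip each
    quantized weight between floor and ceil if that reduces the error of the
    layer's output (not the weights themselves) on a calibration input."""
    q = [round(w * scale) for w in weights]  # round-to-nearest baseline

    def output_error(qs):
        # 1-D "layer": error between full-precision and dequantized outputs
        return abs(sum((w - qi / scale) * x
                       for w, qi, x in zip(weights, qs, calib)))

    for i, w in enumerate(weights):
        for cand in (math.floor(w * scale), math.ceil(w * scale)):
            trial = q[:i] + [cand] + q[i + 1:]
            if output_error(trial) < output_error(q):
                q[i] = cand
    return q
```

Note how two weights of 0.26 at scale 10 come out as `[2, 3]` rather than the nearest-rounding `[3, 3]`: the flips cancel each other's error in the output, which is the core idea behind optimizing rounding decisions jointly.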
## Supported Quantization Formats

| Format | CLI Flag | Hardware | Optimization |
|---|---|---|---|
| FP8 (E4M3) | (default) | Ada/Hopper+ | Learned Rounding (SVD) |
| INT8 Block-wise | `--int8` | Any GPU | Learned Rounding (SVD) |
| INT8 Tensor-wise | `--int8 --scaling_mode tensor` | Any GPU | High-perf `_scaled_mm` |
| NVFP4 (4-bit) | `--nvfp4` | Blackwell | Dual-scale optimization |
| MXFP8 | `--mxfp8` | Blackwell | Microscaling (E8M0) |
For a deep dive into how these formats work and their technical implementation, see FORMATS.md.
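As a rough illustration of the tensor-wise FP8 path, a common convention (assumed here; see FORMATS.md for the scheme this tool actually uses) maps the tensor's absolute maximum onto E4M3's largest finite value, 448, and rounds scaled values to the 3-mantissa-bit grid:

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_scale(weights):
    """Scale that maps the tensor's absolute maximum onto E4M3_MAX."""
    amax = max(abs(w) for w in weights)
    return E4M3_MAX / amax if amax > 0 else 1.0

def round_e4m3(x):
    """Round a scaled value to the nearest E4M3 grid point (3 mantissa bits);
    subnormal and saturation handling omitted for brevity."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))
    step = 2.0 ** (e - 3)  # spacing of representable values in this binade
    return round(x / step) * step
```

For example, `fp8_scale([0.0, 2.0, -4.0])` is `112.0` (448 / 4), and values within a binade collapse onto 8 mantissa steps, which is exactly the resolution loss that learned rounding tries to spend wisely.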
## Model-Specific Presets

| Model | Flag | Notes |
|---|---|---|
| Flux.2 | `--flux2` | Keeps modulation/guidance/time/final layers high-precision |
| Chroma / Radiance | `--distillation_large` / `--nerf_large` | Distillation layers excluded |
| T5-XXL Text Encoder | `--t5xxl` | Decoder removed |
| Mistral Text Encoder | `--mistral` | Norms/biases excluded |
| Visual Encoder | `--visual` | MLP layers excluded |
| Hunyuan Video | `--hunyuan` | Attention norms excluded |
| WAN Video | `--wan` | Time embeddings excluded |
| Qwen Image | `--qwen` | Image layers excluded |
| Z-Image | `--zimage` / `--zimage_refiner` | Refiner variant excludes context/noise refiner |
## Documentation
- 📖 MANUAL.md - Complete usage guide with examples and troubleshooting
- 📚 FORMATS.md - Technical reference for quantization formats and SVD optimization
- 📋 AGENTS.md - Developer guide & registry architecture
- ✨ ACTIVE.md - Current status and active implementations
- 🧪 DEVELOPMENT.md - Changelog and research notes
- 🔗 quantization.examples.md - ComfyUI integration patterns
## Project Structure

```
convert_to_quant/
├── convert_to_quant/            # Main package
│   ├── cli/                     # CLI entry point & argument parsing
│   ├── converters/              # Core quantization logic (FP8, INT8, NVFP4)
│   ├── formats/                 # Format-specific conversion flows
│   ├── comfy/                   # ComfyUI integration components
│   ├── config/                  # Layer configuration & templates
│   ├── utils/                   # Shared utilities (tensor, memory)
│   ├── constants.py             # Model Filter Registry & constants
│   └── convert_to_quant.py      # Backward-compatibility wrapper
├── pyproject.toml               # Package configuration
├── MANUAL.md                    # User documentation
└── ...
```
## Key Features

- Learned Rounding: SVD-based optimization minimizes quantization error along the weight matrix's principal directions
- Multiple Optimizers: Original (adaptive LR), AdamW, RAdam
- Bias Correction: Automatic bias adjustment using synthetic calibration data
- Model-Specific Support: Exclusion lists for sensitive layers (norms, embeddings, distillation)
- Triton Kernels: GPU-accelerated quantization/dequantization with fallback to PyTorch
- Three-Tier Quantization: Mix different formats per layer using `--custom-layers` and `--fallback`
- Layer Config JSON: Fine-grained per-layer control with regex pattern matching
- LR Schedules: Adaptive, exponential, and plateau learning rate scheduling
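The plateau schedule (`--lr_schedule plateau` with `--lr_patience` and `--lr_factor` in the Quick Start) follows the usual reduce-on-plateau pattern. A minimal sketch, assuming the standard semantics (the converter's internal implementation may differ):

```python
class PlateauLR:
    """Reduce-on-plateau sketch: if the loss fails to improve for `patience`
    consecutive steps, multiply the learning rate by `factor`."""

    def __init__(self, lr, patience=9, factor=0.92):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss          # new best: reset the patience counter
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                self.lr *= self.factor  # plateau detected: decay the LR
                self.bad_steps = 0
        return self.lr
```

With the defaults shown in the Quick Start (patience 9, factor 0.92), the LR decays gently only when the rounding loss actually stalls, rather than on a fixed timetable.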
## Advanced Usage

### Layer Config JSON

Define per-layer quantization settings with regex patterns:

```bash
# Generate a template from your model
convert_to_quant -i model.safetensors --dry-run --layer-config-template layers.json

# Apply a custom layer config
convert_to_quant -i model.safetensors --layer-config layers.json --comfy_quant
```
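The authoritative file schema is whatever `--layer-config-template` emits for your model; the sketch below only illustrates how regex-based per-layer resolution typically works (the rule list and function here are invented for illustration, not the tool's API):

```python
import re

def resolve_layer_format(layer_name, rules, default="fp8"):
    """First-match-wins resolution: each rule pairs a regex pattern with a
    per-layer format; unmatched layers fall back to the default."""
    for pattern, fmt in rules:
        if re.search(pattern, layer_name):
            return fmt
    return default

# Hypothetical rules standing in for entries of layers.json
rules = [
    (r"\bnorm\b", "skip"),   # keep norm layers high-precision
    (r"attn", "int8"),       # send attention blocks to INT8
]
```

So `blocks.0.attn.qkv.weight` would resolve to `int8`, `blocks.0.norm.weight` to `skip`, and anything else to the `fp8` default.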
### Scaling Modes

```bash
# Tensor-wise scaling (default)
convert_to_quant -i model.safetensors --scaling-mode tensor --comfy_quant

# Block-wise scaling for better accuracy
convert_to_quant -i model.safetensors --scaling-mode block --block_size 64 --comfy_quant
```
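Block-wise scaling trades one scale per tensor for one scale per block, so each block's scale tracks its local dynamic range. A pure-Python sketch of the idea, assuming a symmetric INT8 scheme (block amax mapped onto 127); the converter's real kernels operate on torch/Triton tensors:

```python
def int8_block_scales(weights, block_size=64):
    """Quantize a flat weight list block by block: each block of
    `block_size` values gets its own scale (dequant: q * scale)."""
    quantized, scales = [], []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block) or 1.0
        scale = amax / 127.0
        scales.append(scale)
        quantized.append([round(w / scale) for w in block])
    return quantized, scales
```

A single outlier now only widens the scale of its own block instead of the whole tensor, which is why block-wise mode tends to be more accurate at the cost of storing one scale per block.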
### Additional Help

```bash
# View experimental features
convert_to_quant --help-experimental

# View model-specific filter presets
convert_to_quant --help-filters
```
## Usage Examples

INT8 with performance heuristics:

```bash
convert_to_quant -i model.safetensors --int8 --block_size 128 --comfy_quant --heur
```

Blackwell NVFP4 (4-bit):

```bash
convert_to_quant -i model.safetensors --nvfp4 --comfy_quant
```
## Requirements
- Python 3.9+
- PyTorch 2.1+ (with CUDA for GPU acceleration)
- safetensors >= 0.4.2
- tqdm
- (Optional) triton >= 2.1.0 for INT8 kernels
## Acknowledgements

Special thanks to:

- Clybius – for inspiring me to take on quantization, and for his Learned-Rounding repository.
- lyogavin – for ComfyUI PR #10864, adding `int8_blockwise` format support and int8 kernels.
## References

- DeepSeek scaled FP8 matmul: https://github.com/deepseek-ai/DeepSeek-V3
- JetFire paper: https://arxiv.org/abs/2403.12422

## License

MIT License