Automatic monkey-patches for quantized model fine-tuning — fixes QLoRA issues with custom architectures
qpatch
Automatic fixes for quantized model fine-tuning.
qpatch monkey-patches known incompatibilities when training 4-bit/8-bit quantized models with LoRA adapters — especially on models with custom CUDA kernels (Mamba, MoE, RWKV, xLSTM).
The Problem
QLoRA training fails on many non-standard architectures with cryptic errors:
```text
AttributeError: 'NoneType' object has no attribute 'get'        # safetensors metadata
RuntimeError: expected mat1 and mat2 to have the same dtype     # uint8 vs float16
RuntimeError: mat1 and mat2 shapes cannot be multiplied         # fused kernels + 4-bit
RuntimeError: index_add_(): self (BFloat16) and source (Float)  # MoE dtype mismatch
RuntimeError: "fused_dropout" not implemented for 'Byte'        # dropout on uint8
```
The Fix
```python
import qpatch
qpatch.patch_all()
# That's it. Now train normally.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-3-Nano-30B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = get_peft_model(model, LoraConfig(r=32, target_modules=["up_proj", "down_proj"]))

# Build your Trainer (or TRL SFTTrainer) as usual, then:
trainer.train()  # Just works
```
What It Fixes
| Issue | Error | Affected models |
|---|---|---|
| Safetensors metadata | `'NoneType' has no attribute 'get'` | Any model with metadata-less safetensors |
| LoRA dtype cast | `unsigned char != c10::Half` | Any 4-bit model with LoRA |
| MoE dtype mismatch | `index_add_(): BFloat16 and Float` | Nemotron, Mixtral, any MoE |
| Fused kernel bypass | `shapes cannot be multiplied` | Mamba, Nemotron-H, hybrid architectures |
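The first row is easiest to understand at the file-format level. A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header, and the `__metadata__` entry in that header is optional. When it is missing, the `safetensors` library's `safe_open(...).metadata()` returns `None`, and any caller that then does `metadata.get(...)` raises the `AttributeError` listed above. The standard-library sketch below reads the header directly and performs a None-safe lookup. `read_safetensors_metadata` and `checkpoint_format` are hypothetical helpers for illustration, not qpatch's or transformers' code.

```python
import json
import struct
from pathlib import Path
from typing import Optional


def read_safetensors_metadata(path: Path) -> Optional[dict]:
    """Return the optional "__metadata__" map of a .safetensors file, or None."""
    with open(path, "rb") as f:
        # An unsigned 64-bit little-endian integer gives the JSON header's size...
        (header_len,) = struct.unpack("<Q", f.read(8))
        # ...and the UTF-8 JSON header follows, describing every tensor.
        header = json.loads(f.read(header_len))
    # "__metadata__" is optional, so files saved without it yield None here.
    return header.get("__metadata__")


def checkpoint_format(path: Path) -> str:
    """None-safe read of the "format" entry; the "pt" default is an assumption."""
    metadata = read_safetensors_metadata(path) or {}
    return metadata.get("format", "pt")
```

Treating a missing map as empty (`or {}`) is what turns the crash into a default value.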
Install
```bash
pip install qpatch
```
Individual Patches
Apply only what you need:
```python
import qpatch

qpatch.patch_safetensors_metadata()  # Fix None metadata in safetensors
qpatch.patch_lora_dtype_cast()       # Cast uint8 inputs to float16
qpatch.patch_moe_dtype_mismatch()    # Auto-cast MoE index_add_ dtypes
qpatch.patch_fused_kernel_bypass()   # Skip fused kernels for quantized models
```
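When applying patches individually, it can help to map the error you actually hit to the matching function. The lookup below is a hypothetical helper (`suggest_patches` and its substring table are not part of qpatch). Its substrings are copied from the error messages listed in this README, so treat it as a rough sketch rather than a complete classifier; errors it does not recognize, such as the `fused_dropout` one, return an empty list.

```python
from typing import List, Tuple

# Substrings from the README's error list, paired with the qpatch function the
# README associates with each issue. This table is an illustrative assumption.
PATCHES_BY_ERROR: List[Tuple[Tuple[str, ...], str]] = [
    (("has no attribute 'get'",), "patch_safetensors_metadata"),
    (("to have the same dtype", "unsigned char != c10::Half"), "patch_lora_dtype_cast"),
    (("index_add_()",), "patch_moe_dtype_mismatch"),
    (("shapes cannot be multiplied",), "patch_fused_kernel_bypass"),
]


def suggest_patches(error_text: str) -> List[str]:
    """Return the names of patches whose known error substrings appear in error_text."""
    return [
        name
        for needles, name in PATCHES_BY_ERROR
        if any(needle in error_text for needle in needles)
    ]


# Hypothetical usage, assuming qpatch is installed and `err` is the caught exception:
#   for name in suggest_patches(str(err)):
#       getattr(qpatch, name)()
```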
GPU Architecture Notes
- Volta (V100, GV100): use `qpatch.patch_all(compute_dtype=torch.float16)`, since Volta has no native bf16 support.
- Ampere and newer (A100, H100): use `qpatch.patch_all(compute_dtype=torch.bfloat16)`.
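Rather than hard-coding the dtype per machine, one option is to choose it from the GPU's CUDA compute capability: native bf16 arrives with Ampere (compute capability 8.0), while Volta is 7.0 and Pascal is 6.x. The sketch below assumes that threshold is the only criterion. `pick_compute_dtype_name` is a hypothetical helper, and PyTorch's `torch.cuda.is_bf16_supported()` is an alternative check.

```python
from typing import Tuple


def pick_compute_dtype_name(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to a torch dtype name.

    Assumption: native bf16 is available from compute capability 8.0 (Ampere)
    upward; older GPUs such as Volta (7.0) or Pascal (6.x) fall back to fp16.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


# Usage with PyTorch on a CUDA machine (not executed here):
#   import torch
#   import qpatch
#   name = pick_compute_dtype_name(torch.cuda.get_device_capability())
#   qpatch.patch_all(compute_dtype=getattr(torch, name))
```

Returning a name instead of a `torch.dtype` keeps the decision logic free of a PyTorch import; `getattr(torch, name)` converts it at the call site.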
How It Works
qpatch applies the following targeted monkey-patches when you call `patch_all()` or the corresponding individual function:
- `patch_safetensors_metadata`: wraps `transformers.modeling_utils.load_state_dict` to detect `None` metadata and fall back to `safetensors.torch.load_file`.
- `patch_lora_dtype_cast`: wraps `peft.tuners.lora.bnb.Linear4bit.forward` to cast uint8 inputs to the compute dtype.
- `patch_moe_dtype_mismatch`: wraps `torch.Tensor.index_add_` to auto-cast source tensors when dtypes mismatch.
- `patch_fused_kernel_bypass`: scans HuggingFace's cached model code and replaces fused-path conditions with `if False:` to force the quantization-safe slow path.
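To make the `patch_fused_kernel_bypass` description concrete, here is a rough, standard-library-only sketch of the source-rewriting idea; it is not qpatch's implementation. It assumes the fused path is guarded by a single-line `if` that mentions a flag such as `use_fast_path` (a hypothetical name: real model code uses its own identifiers and may split conditions across lines), and that remote-code modules live under the Hugging Face modules cache, typically `~/.cache/huggingface/modules/transformers_modules`.

```python
import re
from pathlib import Path
from typing import Iterable, List


def disable_fast_path(source: str, flag_names: Iterable[str]) -> str:
    """Rewrite each single-line `if ...:` that mentions one of flag_names to
    `if False:` (same indentation), so the slow fallback branch always runs."""
    flags = [re.escape(name) for name in flag_names]
    if not flags:
        return source
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)if\b[^\n]*\b(?:" + "|".join(flags) + r")\b[^\n]*:[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group("indent") + "if False:", source)


def patch_cached_modules(root: Path, flag_names: Iterable[str]) -> List[Path]:
    """Apply disable_fast_path to every .py file under root; return changed files."""
    names = list(flag_names)
    changed = []
    for path in sorted(root.rglob("*.py")):
        original = path.read_text(encoding="utf-8")
        patched = disable_fast_path(original, names)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed


# Hypothetical usage:
#   cache = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules"
#   patch_cached_modules(cache, ["use_fast_path"])
```

Because the rewrite edits files on disk, it only affects modules imported after it runs; a module that is already imported keeps its old code until it is reloaded. Running it twice changes nothing, since `if False:` no longer mentions the flag.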
All patches are idempotent — calling patch_all() multiple times is safe.
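One common way to make a monkey-patch idempotent is to tag the wrapper with a marker attribute and skip patching when the attribute being replaced already carries it. The plain-Python sketch below shows that pattern; it is an assumption about how such a guarantee can be provided, not qpatch's actual implementation, and `patch_once` and the marker name are hypothetical.

```python
import functools
from typing import Any, Callable

# Hypothetical marker attribute set on every wrapper this helper installs.
_PATCHED_FLAG = "__patch_once_wrapped__"


def patch_once(owner: Any, attr: str, make_wrapper: Callable[[Callable], Callable]) -> bool:
    """Replace owner.attr with make_wrapper(original) unless it is already patched.

    Returns True if a patch was applied, False if the wrapper was already in place.
    """
    original = getattr(owner, attr)
    if getattr(original, _PATCHED_FLAG, False):
        return False  # Second and later calls are no-ops.
    wrapper = make_wrapper(original)
    # Copy __name__, __doc__, etc. and keep a __wrapped__ link to the original.
    functools.update_wrapper(wrapper, original)
    setattr(wrapper, _PATCHED_FLAG, True)
    setattr(owner, attr, wrapper)
    return True
```

Checking the marker on the current attribute, rather than keeping a module-level "already patched" boolean, also stays correct if some other code restores the original function in between calls.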
Tested With
- Nemotron-3-Nano-30B (Mamba hybrid)
- Transformers 4.46–5.3
- PEFT 0.12–0.18
- bitsandbytes 0.42–0.49
- PyTorch 2.5–2.10
- CUDA 12.2–12.8
- Volta (GV100), Pascal (P100), Ampere (A100)
License
MIT
Citation
```bibtex
@software{bond2026qpatch,
  author = {Bond, Andrew H.},
  title  = {qpatch: Automatic fixes for quantized model fine-tuning},
  year   = {2026},
  url    = {https://github.com/ahb-sjsu/qpatch},
}
```
Download files
- Source distribution: qpatch-0.2.0.tar.gz
- Built distribution: qpatch-0.2.0-py3-none-any.whl
File details
Details for the file qpatch-0.2.0.tar.gz.
File metadata
- Download URL: qpatch-0.2.0.tar.gz
- Upload date:
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d25d7ff1ce6df31967d7398038156dfae8d33126b0e2a518891ad3b0144d56c5` |
| MD5 | `28f70edd34020b62755faa289824d2f0` |
| BLAKE2b-256 | `b8208cd3ac2b83fc45fcb040c131d6ac6a9899df4aac185fa15e4af459769def` |
File details
Details for the file qpatch-0.2.0-py3-none-any.whl.
File metadata
- Download URL: qpatch-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c3ae6e9eda4701444c814f19e24580d6a5a3802274b7b073898305d58072ee31` |
| MD5 | `c49b897010d05413652184677e3c0922` |
| BLAKE2b-256 | `9f78dedf6710167ff5816f791488aa0d7ee90179a0e50a2ede0c3a7647141ed1` |