Automatic monkey-patches for quantized model fine-tuning — fixes QLoRA issues with custom architectures

qpatch

Automatic fixes for quantized model fine-tuning.

qpatch monkey-patches known incompatibilities when training 4-bit/8-bit quantized models with LoRA adapters — especially on models with custom CUDA kernels (Mamba, MoE, RWKV, xLSTM).

The Problem

QLoRA training fails on many non-standard architectures with cryptic errors:

AttributeError: 'NoneType' object has no attribute 'get'          # safetensors metadata
RuntimeError: expected mat1 and mat2 to have the same dtype        # uint8 vs float16
RuntimeError: mat1 and mat2 shapes cannot be multiplied            # fused kernels + 4-bit
RuntimeError: index_add_(): self (BFloat16) and source (Float)     # MoE dtype mismatch
RuntimeError: "fused_dropout" not implemented for 'Byte'           # dropout on uint8

The Fix

import qpatch
qpatch.patch_all()

# That's it. Now train normally.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-3-Nano-30B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = get_peft_model(model, LoraConfig(r=32, target_modules=["up_proj", "down_proj"]))
trainer.train()  # `trainer` is your usual transformers.Trainer wrapping `model` (setup omitted)

What It Fixes

Issue	Error	Affected Models
Safetensors metadata	`'NoneType' has no attribute 'get'`	Any model with metadata-less safetensors
LoRA dtype cast	`unsigned char != c10::Half`	Any 4-bit model with LoRA
MoE dtype mismatch	`index_add_(): BFloat16 and Float`	Nemotron, Mixtral, any MoE
Fused kernel bypass	`shapes cannot be multiplied`	Mamba, Nemotron-H, hybrid architectures

Install

pip install qpatch

Individual Patches

Apply only what you need:

import qpatch

qpatch.patch_safetensors_metadata()   # Fix None metadata in safetensors
qpatch.patch_lora_dtype_cast()        # Cast uint8 inputs to the compute dtype
qpatch.patch_moe_dtype_mismatch()     # Auto-cast MoE index_add_ dtypes
qpatch.patch_fused_kernel_bypass()    # Skip fused kernels for quantized models

GPU Architecture Notes

Volta (V100, GV100): Use qpatch.patch_all(compute_dtype=torch.float16) — Volta does not support bf16
Ampere+ (A100, H100): Use qpatch.patch_all(compute_dtype=torch.bfloat16)
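
The dtype choice above can be derived from the GPU's CUDA compute capability instead of being hard-coded. The following is a minimal sketch, not part of qpatch: the helper names `pick_compute_dtype_name` and `detect_compute_dtype` are hypothetical, and the rule assumes bf16 is only used on compute capability 8.0 or newer (Ampere and later).

```python
def pick_compute_dtype_name(capability):
    """Return "bfloat16" for compute capability 8.0+ (Ampere or newer), else "float16".

    `capability` is a (major, minor) tuple, e.g. (7, 0) for a V100.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


def detect_compute_dtype():
    """Resolve the torch dtype for the current GPU (needs torch and a CUDA device)."""
    import torch  # imported lazily so the helper above works without torch

    capability = torch.cuda.get_device_capability()
    return getattr(torch, pick_compute_dtype_name(capability))
```

With these helpers, the call becomes `qpatch.patch_all(compute_dtype=detect_compute_dtype())`, which resolves to float16 on Pascal and Volta cards and to bfloat16 on A100 or H100.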

How It Works

qpatch applies targeted monkey-patches when you call patch_all() or one of the individual patch functions:

patch_safetensors_metadata — Wraps transformers.modeling_utils.load_state_dict to detect None metadata and fall back to safetensors.torch.load_file
patch_lora_dtype_cast — Wraps peft.tuners.lora.bnb.Linear4bit.forward to cast uint8 inputs to the compute dtype
patch_moe_dtype_mismatch — Wraps torch.Tensor.index_add_ to auto-cast source tensors when dtypes mismatch
patch_fused_kernel_bypass — Scans HuggingFace's cached model code and replaces fused-path conditions with if False: to force the quantization-safe slow path
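
To make the fused-kernel bypass concrete, here is a rough sketch of the source-rewriting idea only. It is not qpatch's actual implementation: the function `bypass_fused_paths`, the flag name `is_fast_path_available`, and the sample source are illustrative assumptions, and the sketch only handles single-line `if` statements.

```python
import re


def bypass_fused_paths(source, fused_flags):
    """Rewrite single-line `if` statements that mention a fused-path flag to `if False:`."""
    patterns = [re.compile(rf"\b{re.escape(flag)}\b") for flag in fused_flags]
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("if ") and any(p.search(stripped) for p in patterns):
            indent = line[: len(line) - len(stripped)]
            newline = "\n" if line.endswith("\n") else ""
            # Keep the indentation so the guarded block stays syntactically valid.
            out.append(f"{indent}if False:  # fused path disabled{newline}")
        else:
            out.append(line)
    return "".join(out)


# Hypothetical model code with a fused fast path guarded by a flag.
SAMPLE = (
    "def forward(self, x):\n"
    "    if is_fast_path_available and x.is_cuda:\n"
    "        return fused(x)\n"
    "    return slow(x)\n"
)

PATCHED = bypass_fused_paths(SAMPLE, ["is_fast_path_available"])
print(PATCHED)
```

Because the rewritten line no longer mentions the flag, running the function again on its own output changes nothing.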

All patches are idempotent — calling patch_all() multiple times is safe.
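
One common way to get this kind of idempotence is to tag the wrapper with a sentinel attribute and skip re-wrapping when the tag is present. The sketch below illustrates that pattern on a toy class; `patch_once`, the `_qpatch_sketch_patched` marker, and `ToyLayer` are hypothetical names, and this is not necessarily how qpatch does it internally.

```python
import functools

_MARK = "_qpatch_sketch_patched"  # hypothetical sentinel attribute


def patch_once(owner, name, make_wrapper):
    """Replace owner.<name> with make_wrapper(original) unless it is already patched.

    Returns True if the patch was applied, False if it was already in place.
    """
    original = getattr(owner, name)
    if getattr(original, _MARK, False):
        return False  # already wrapped; leave it alone
    wrapper = functools.wraps(original)(make_wrapper(original))
    setattr(wrapper, _MARK, True)
    setattr(owner, name, wrapper)
    return True


class ToyLayer:
    """Stand-in for a layer whose forward() needs its input cast first."""

    def forward(self, x):
        return x * 2


def cast_inputs(original):
    def forward(self, x):
        return original(self, float(x))  # stand-in for a dtype cast
    return forward


first = patch_once(ToyLayer, "forward", cast_inputs)
second = patch_once(ToyLayer, "forward", cast_inputs)  # no double wrapping
```

The second call returns without touching the method, so repeated patching never stacks wrappers on top of each other.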

Tested With

Nemotron-3-Nano-30B (Mamba hybrid)
Transformers 4.46–5.3
PEFT 0.12–0.18
bitsandbytes 0.42–0.49
PyTorch 2.5–2.10
CUDA 12.2–12.8
Volta (GV100), Pascal (P100), Ampere (A100)

License

MIT

Citation

@software{bond2026qpatch,
  author = {Bond, Andrew H.},
  title = {qpatch: Automatic fixes for quantized model fine-tuning},
  year = {2026},
  url = {https://github.com/ahb-sjsu/qpatch},
}

Release history

0.2.0 (this version): Mar 23, 2026
0.1.0: Mar 23, 2026

Download files

Source Distribution

qpatch-0.2.0.tar.gz (15.0 kB)

Uploaded Mar 23, 2026 Source

Built Distribution

qpatch-0.2.0-py3-none-any.whl (10.6 kB)

Uploaded Mar 23, 2026 Python 3

File details

Details for the file qpatch-0.2.0.tar.gz.

File metadata

Download URL: qpatch-0.2.0.tar.gz
Upload date: Mar 23, 2026
Size: 15.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for qpatch-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d25d7ff1ce6df31967d7398038156dfae8d33126b0e2a518891ad3b0144d56c5`
MD5	`28f70edd34020b62755faa289824d2f0`
BLAKE2b-256	`b8208cd3ac2b83fc45fcb040c131d6ac6a9899df4aac185fa15e4af459769def`

File details

Details for the file qpatch-0.2.0-py3-none-any.whl.

File metadata

Download URL: qpatch-0.2.0-py3-none-any.whl
Upload date: Mar 23, 2026
Size: 10.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for qpatch-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c3ae6e9eda4701444c814f19e24580d6a5a3802274b7b073898305d58072ee31`
MD5	`c49b897010d05413652184677e3c0922`
BLAKE2b-256	`9f78dedf6710167ff5816f791488aa0d7ee90179a0e50a2ede0c3a7647141ed1`
