ParoQuant
Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
State-of-the-art INT4 quantization for LLMs. ParoQuant uses learned pairwise rotations to suppress weight outliers, closing the accuracy gap with FP16 while running at near-AWQ speed. Supports NVIDIA GPUs (vLLM, Transformers) and Apple Silicon (MLX).
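The core idea can be illustrated with a toy example: a pairwise (Givens) rotation mixes an outlier channel with a quiet one, shrinking the dynamic range a symmetric INT4 quantizer must cover, so every other weight gets a finer quantization step. This is only an illustrative NumPy sketch of the principle; the weights, the fixed 45° angle, and the function names are made up and have nothing to do with ParoQuant's actual learned rotations or kernels.

```python
import numpy as np

def givens(theta):
    """2x2 pairwise (Givens) rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Two weight channels; channel 0 carries a large outlier.
W = np.array([[8.00, 0.05, -0.10, 0.20],
              [0.10, -0.20, 0.15, 0.05]])

# Rotating the channel pair spreads the outlier across both channels.
R = givens(np.pi / 4)
W_rot = R @ W

# Symmetric INT4 maps the max-abs value to level 7, so a smaller
# max-abs means a finer quantization step for every other weight.
step_plain = np.abs(W).max() / 7
step_rot = np.abs(W_rot).max() / 7

print(np.abs(W).max(), np.abs(W_rot).max())  # dynamic range shrinks
print(step_rot < step_plain)                 # True: finer step after rotation

# The rotation is orthogonal, so it is exactly invertible at inference.
assert np.allclose(R.T @ W_rot, W)
```

Because the rotation is orthogonal, it can be folded into the weights offline and undone exactly at inference time, which is why the only accuracy cost left is the (now smaller) quantization error.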
Quick Start
NVIDIA GPU:

```bash
pip install "paroquant[vllm]"
python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO

# or with Docker
docker run --pull=always --rm -it --gpus all --ipc=host \
    ghcr.io/z-lab/paroquant:chat --model z-lab/Qwen3-8B-PARO
```
Apple Silicon:

```bash
pip install "paroquant[mlx]"
python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO
```
Models
All models are available on Hugging Face. Swap the model name in the commands above to try any of them.
Qwen3
| Model | Checkpoint |
|---|---|
| Qwen3-0.6B | z-lab/Qwen3-0.6B-PARO |
| Qwen3-1.7B | z-lab/Qwen3-1.7B-PARO |
| Qwen3-4B | z-lab/Qwen3-4B-PARO |
| Qwen3-8B | z-lab/Qwen3-8B-PARO |
| Qwen3-14B | z-lab/Qwen3-14B-PARO |
| Qwen3-4B-Thinking-2507 | z-lab/Qwen3-4B-Thinking-2507-PARO |
Llama
| Model | Checkpoint |
|---|---|
| Llama-2-7B | z-lab/Llama-2-7b-hf-PARO |
| Llama-3-8B | z-lab/Meta-Llama-3-8B-PARO |
| Llama-3-70B | z-lab/Meta-Llama-3-70B-PARO |
| Llama-3.1-8B-Instruct | z-lab/Llama-3.1-8B-Instruct-PARO |
Want a model that's not listed? Open an issue and let us know.
Installation
```bash
git clone https://github.com/z-lab/paroquant && cd paroquant
pip install -e ".[vllm]"          # vLLM backend (GPU, recommended)
pip install -e ".[transformers]"  # Transformers backend (GPU)
pip install -e ".[mlx]"           # MLX backend (Apple Silicon)
pip install -e ".[optim,eval]"    # Optimization & evaluation
```

Or use Docker:

```bash
docker run -it --gpus all --ipc=host ghcr.io/z-lab/paroquant:latest
```
Quantize Your Own Model
```bash
# 1. Optimize rotation parameters
experiments/optimize/4bit.sh Qwen/Qwen3-8B

# 2. Export to a Hugging Face checkpoint (--mode real for INT4, --mode pseudo for FP16)
python -m paroquant.cli.convert \
    --model Qwen/Qwen3-8B \
    --result-dir output/Qwen3-8B \
    --output-path models/Qwen3-8B-PARO
```
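The difference between the two export modes: real mode stores packed INT4 weights, while pseudo mode quantizes and immediately dequantizes, leaving an FP16 checkpoint that reproduces INT4 numerics. Below is a generic sketch of what symmetric per-group pseudo-quantization does; the group size, rounding scheme, and function name are illustrative assumptions, not ParoQuant's actual exporter.

```python
import numpy as np

def pseudo_quantize(w, group_size=4, bits=4):
    """Symmetric per-group quantize-then-dequantize ("pseudo" quantization).

    The result is still floating point, but every value is snapped to one
    of the 2**bits levels of its group, simulating INT4 accuracy in FP16.
    """
    qmax = 2 ** (bits - 1) - 1                        # 7 for INT4
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)

w = np.array([0.90, -0.32, 0.05, 0.61], dtype=np.float32)
w_pseudo = pseudo_quantize(w, group_size=4)
print(w_pseudo)  # every entry is a multiple of the group's scale
```

A pseudo checkpoint is convenient for accuracy ablations in ordinary FP16 pipelines, while the real INT4 checkpoint is what delivers the memory and speed savings.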
Reproduction
See experiments/README.md for scripts to reproduce all results in the paper.
Docker Images
| Image | Purpose |
|---|---|
| ghcr.io/z-lab/paroquant:latest | Optimization & evaluation |
| ghcr.io/z-lab/paroquant:chat | Interactive chat |
| ghcr.io/z-lab/paroquant:chat-cu130 | Interactive chat (CUDA 13.0 / ARM64) |
| ghcr.io/z-lab/paroquant:eval-reasoning | Reasoning task evaluation |
Citation
```bibtex
@inproceedings{liang2026paroquant,
  title     = {{ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference}},
  author    = {Liang, Yesheng and Chen, Haisheng and Zhang, Zihan and Han, Song and Liu, Zhijian},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}
```
Download files
File details
Details for the file paroquant-0.1.3.tar.gz.
File metadata
- Download URL: paroquant-0.1.3.tar.gz
- Size: 40.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53dcf0c65c7d7ced1072a2e684d744dca4211c5cc0b0ef0b992423d39cd6c89e |
| MD5 | 54b81b796ce0c91f6605d31f40a63a52 |
| BLAKE2b-256 | ef211ca45675ce83c356853a6f95154a72874b3f67889a9c3a6d4d2bf26bf1e3 |
File details
Details for the file paroquant-0.1.3-py3-none-any.whl.
File metadata
- Download URL: paroquant-0.1.3-py3-none-any.whl
- Size: 49.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22cd0b479b74383bd95d1903fb60b83cbec05608d6470289e0aa3ad1c65c446b |
| MD5 | 06afafe0fd7cc2f49719a9ecd45f231b |
| BLAKE2b-256 | e6b1068421752c96a30eb48472afd6a9d32529a025efd57d52e712438445c865 |