ParoQuant — Pairwise Rotation Quantization for LLMs

ParoQuant

Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

State-of-the-art INT4 quantization for LLMs. ParoQuant uses learned pairwise rotations to suppress weight outliers, closing the accuracy gap with FP16 while running at near-AWQ speed. Supports NVIDIA GPUs (vLLM, Transformers) and Apple Silicon (MLX).
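To make the idea of a pairwise rotation concrete, here is a minimal NumPy sketch. It is a toy illustration and not the ParoQuant implementation: the angle is fixed rather than learned, the helper names `rotate_pair` and `fake_quant_int4` are hypothetical, and the quantizer is a plain symmetric round-to-nearest scheme with a single scale for the whole vector. It rotates one outlier channel against a near-zero channel, quantizes in the rotated basis, and rotates back to compare the error with direct quantization.

```python
import numpy as np

def rotate_pair(w, i, j, theta):
    """Apply a 2-D (Givens) rotation to entries i and j of a 1-D vector."""
    out = w.astype(np.float64).copy()
    c, s = np.cos(theta), np.sin(theta)
    out[i] = c * w[i] - s * w[j]
    out[j] = s * w[i] + c * w[j]
    return out

def fake_quant_int4(w):
    """Symmetric round-to-nearest INT4 quantize/dequantize, one scale per vector."""
    scale = np.max(np.abs(w)) / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

# Toy weight group: one outlier (8.0) next to a zero channel, plus small weights.
w = np.array([8.0, 0.0, 0.5, -0.5, 0.5, -0.5])
theta = np.pi / 4  # fixed for illustration; a learned method would optimize it

w_rot = rotate_pair(w, 0, 1, theta)

# Quantize in the rotated basis, then undo the rotation (rotate by -theta).
w_hat_rot = rotate_pair(fake_quant_int4(w_rot), 0, 1, -theta)
w_hat_plain = fake_quant_int4(w)

err_plain = np.sum((w - w_hat_plain) ** 2)
err_rot = np.sum((w - w_hat_rot) ** 2)

print(f"max |w| before / after rotation: {np.abs(w).max():.3f} / {np.abs(w_rot).max():.3f}")
print(f"squared error plain / rotated:   {err_plain:.3f} / {err_rot:.3f}")
```

On this toy vector the rotation spreads the outlier across two channels, so the largest magnitude drops from 8.0 to about 5.66 and the quantization step for the remaining weights shrinks with it. Because the rotation is orthogonal, it can be undone exactly after quantization.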

Quick Start

NVIDIA GPU:

pip install "paroquant[vllm]"
python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO

# or with Docker
docker run --pull=always --rm -it --gpus all --ipc=host \
  ghcr.io/z-lab/paroquant:chat --model z-lab/Qwen3-8B-PARO

Apple Silicon:

pip install "paroquant[mlx]"
python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO

Models

All models are available on Hugging Face. Swap the model name in the commands above to try any of them.

Qwen3

Model                   Checkpoint
Qwen3-0.6B              z-lab/Qwen3-0.6B-PARO
Qwen3-1.7B              z-lab/Qwen3-1.7B-PARO
Qwen3-4B                z-lab/Qwen3-4B-PARO
Qwen3-8B                z-lab/Qwen3-8B-PARO
Qwen3-14B               z-lab/Qwen3-14B-PARO
Qwen3-4B-Thinking-2507  z-lab/Qwen3-4B-Thinking-2507-PARO

Llama

Model                  Checkpoint
Llama-2-7B             z-lab/Llama-2-7b-hf-PARO
Llama-3-8B             z-lab/Meta-Llama-3-8B-PARO
Llama-3-70B            z-lab/Meta-Llama-3-70B-PARO
Llama-3.1-8B-Instruct  z-lab/Llama-3.1-8B-Instruct-PARO

Want a model that's not listed? Open an issue and let us know.
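When picking a checkpoint from the tables above, a rough weight-memory estimate can help decide what fits on a given device. The sketch below is back-of-the-envelope arithmetic only. It assumes nominal parameter counts read off the model names, and a hypothetical group size of 128 with 32 bits of per-group metadata (for example an FP16 scale and an FP16 zero point); neither value comes from the ParoQuant documentation. It also ignores layers left unquantized, rotation parameters, activations, and the KV cache.

```python
def weight_gib(n_params, bits_per_weight, group_size=None, group_overhead_bits=0):
    """Approximate weight storage in GiB for a given bit width."""
    bits = n_params * bits_per_weight
    if group_size:
        # Per-group metadata such as scales and zero points (assumed layout).
        bits += (n_params / group_size) * group_overhead_bits
    return bits / 8 / 2**30

# Nominal parameter counts taken from the model names (assumption).
for name, n_params in [("Qwen3-8B", 8e9), ("Llama-3-70B", 70e9)]:
    fp16 = weight_gib(n_params, 16)
    int4 = weight_gib(n_params, 4, group_size=128, group_overhead_bits=32)
    print(f"{name}: FP16 ~{fp16:.1f} GiB, INT4 ~{int4:.1f} GiB")
```

Under these assumptions the INT4 weights cost 4.25 bits per parameter, a little over a quarter of the FP16 footprint.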

Installation

git clone https://github.com/z-lab/paroquant && cd paroquant

pip install -e ".[vllm]"            # vLLM backend (GPU, recommended)
pip install -e ".[transformers]"    # Transformers backend (GPU)
pip install -e ".[mlx]"             # MLX backend (Apple Silicon)
pip install -e ".[optim,eval]"      # Optimization & evaluation

Or use Docker:

docker run -it --gpus all --ipc=host ghcr.io/z-lab/paroquant:latest

Quantize Your Own Model

# 1. Optimize rotation parameters
experiments/optimize/4bit.sh Qwen/Qwen3-8B

# 2. Export to HF checkpoint (--mode real for INT4, --mode pseudo for FP16)
python -m paroquant.cli.convert \
  --model Qwen/Qwen3-8B \
  --result-dir output/Qwen3-8B \
  --output-path models/Qwen3-8B-PARO
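The export step names two modes without describing their storage. A common reading, which is an assumption here rather than something the source states, is that real mode stores packed 4-bit integer codes while pseudo mode stores quantize-then-dequantize values in FP16. The NumPy sketch below illustrates that difference on a tiny vector. The nibble layout, the `scale` value, and the helper names `pack_int4` and `unpack_int4` are hypothetical and do not describe ParoQuant's checkpoint format.

```python
import numpy as np

def pack_int4(q):
    """Pack signed 4-bit integers in [-8, 7] two per byte, low nibble first."""
    q = np.asarray(q, dtype=np.int8)
    assert q.ndim == 1 and q.size % 2 == 0
    assert q.min() >= -8 and q.max() <= 7
    u = (q & 0x0F).astype(np.uint8)  # two's-complement nibble of each value
    return u[0::2] | (u[1::2] << 4)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit integers."""
    packed = np.asarray(packed, dtype=np.uint8)
    nibbles = np.empty(packed.size * 2, dtype=np.int8)
    nibbles[0::2] = (packed & 0x0F).astype(np.int8)
    nibbles[1::2] = (packed >> 4).astype(np.int8)
    # Nibbles 8..15 encode the negative values -8..-1.
    return np.where(nibbles > 7, nibbles - 16, nibbles).astype(np.int8)

q = np.array([-8, 7, 0, -1, 3, -4, 5, 2], dtype=np.int8)  # toy INT4 codes
scale = 0.05                                               # hypothetical scale

packed = pack_int4(q)                                       # "real"-style storage
pseudo = (q.astype(np.float32) * scale).astype(np.float16)  # "pseudo"-style storage

print("packed bytes:", packed.nbytes, "| FP16 bytes:", pseudo.nbytes)
```

In this sketch both forms represent the same quantized values; the packed form takes a quarter of the space but needs a kernel that understands the layout, while the FP16 form can be used anywhere ordinary FP16 weights can.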

Reproduction

See experiments/README.md for scripts to reproduce all results in the paper.

Docker Images

Image                                   Purpose
ghcr.io/z-lab/paroquant:latest          Optimization & evaluation
ghcr.io/z-lab/paroquant:chat            Interactive chat
ghcr.io/z-lab/paroquant:chat-cu130      Interactive chat (CUDA 13.0 / ARM64)
ghcr.io/z-lab/paroquant:eval-reasoning  Reasoning task evaluation

Citation

@inproceedings{liang2026paroquant,
  title     = {{ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference}},
  author    = {Liang, Yesheng and Chen, Haisheng and Zhang, Zihan and Han, Song and Liu, Zhijian},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

