ParoQuant — Pairwise Rotation Quantization for LLMs

ParoQuant

Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

ParoQuant is a state-of-the-art INT4 quantization method for LLMs. It uses learned pairwise rotations to suppress weight outliers, closing the accuracy gap with FP16 while running at near-AWQ speed. It supports NVIDIA GPUs (vLLM, Transformers) and Apple Silicon (MLX).
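To give an intuition for why rotating a pair of weights can help before INT4 rounding, here is a minimal toy sketch. It is not ParoQuant's actual procedure: the helper names (`rotate_pair`, `quantize_int4`, `error_norm`), the hand-picked 45-degree angle, the single per-group scale, and the example weights are all assumptions made for illustration, whereas ParoQuant learns its rotations.

```python
import math

def rotate_pair(values, i, j, theta):
    """Rotate elements i and j by angle theta (a 2-D rotation); return a new list."""
    out = list(values)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * values[i] - s * values[j]
    out[j] = s * values[i] + c * values[j]
    return out

def quantize_int4(values):
    """Symmetric round-to-nearest INT4 with one scale for the whole group.

    Returns the dequantized values. Assumes the group is not all zeros.
    """
    scale = max(abs(v) for v in values) / 7.0
    return [max(-8, min(7, round(v / scale))) * scale for v in values]

def error_norm(a, b):
    """Euclidean distance between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical weight group: one outlier (8.0) next to small values.
weights = [8.0, 0.0, 0.5, -0.5]
theta = math.pi / 4  # hand-picked angle; ParoQuant learns its rotations instead

plain = quantize_int4(weights)

# Rotate the outlier pair, quantize, then undo the rotation.
rotated = rotate_pair(weights, 0, 1, theta)
restored = rotate_pair(quantize_int4(rotated), 0, 1, -theta)

print("peak before/after rotation:", max(map(abs, weights)), max(map(abs, rotated)))
print("error without rotation:", error_norm(weights, plain))
print("error with rotation:   ", error_norm(weights, restored))
```

Because a rotation preserves vector length and can be undone exactly, spreading the outlier across two coordinates shrinks the quantization scale without losing information, so the small weights in the group are rounded less coarsely.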

Quick Start

NVIDIA GPU:

pip install "paroquant[vllm]"
python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO

# or with Docker
docker run --pull=always --rm -it --gpus all --ipc=host \
  ghcr.io/z-lab/paroquant:chat --model z-lab/Qwen3-8B-PARO

Apple Silicon:

pip install "paroquant[mlx]"
python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO

Models

All models are available on Hugging Face. Swap the model name in the commands above to try any of them.

Qwen3

Model                    Checkpoint
Qwen3-0.6B               z-lab/Qwen3-0.6B-PARO
Qwen3-1.7B               z-lab/Qwen3-1.7B-PARO
Qwen3-4B                 z-lab/Qwen3-4B-PARO
Qwen3-8B                 z-lab/Qwen3-8B-PARO
Qwen3-14B                z-lab/Qwen3-14B-PARO
Qwen3-4B-Thinking-2507   z-lab/Qwen3-4B-Thinking-2507-PARO

Llama

Model                    Checkpoint
Llama-2-7B               z-lab/Llama-2-7b-hf-PARO
Llama-3-8B               z-lab/Meta-Llama-3-8B-PARO
Llama-3-70B              z-lab/Meta-Llama-3-70B-PARO
Llama-3.1-8B-Instruct    z-lab/Llama-3.1-8B-Instruct-PARO

Want a model that's not listed? Open an issue and let us know.
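Swapping the model name can also be scripted. The sketch below only assembles the chat command from Quick Start for a given checkpoint; `chat_command` is a hypothetical helper written for this example, not part of the paroquant package, and nothing is executed.

```python
import shlex

def chat_command(checkpoint):
    """Build the argument list for the chat CLI shown in Quick Start."""
    return ["python", "-m", "paroquant.cli.chat", "--model", checkpoint]

# Checkpoint names taken from the tables above.
for checkpoint in ("z-lab/Qwen3-4B-PARO", "z-lab/Llama-3.1-8B-Instruct-PARO"):
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(chat_command(checkpoint)))
```

Passing the returned list to `subprocess.run` would launch the chat session, provided the matching backend extra is installed.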

Installation

git clone https://github.com/z-lab/paroquant && cd paroquant

pip install -e ".[vllm]"            # vLLM backend (GPU, recommended)
pip install -e ".[transformers]"    # Transformers backend (GPU)
pip install -e ".[mlx]"             # MLX backend (Apple Silicon)
pip install -e ".[optim,eval]"      # Optimization & evaluation

Or use Docker:

docker run -it --gpus all --ipc=host ghcr.io/z-lab/paroquant:latest
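After installing, it can be handy to check which backend packages are importable in the current environment. This is a small sketch that assumes the extras pull in packages importable as `vllm`, `transformers`, and `mlx`; `available_backends` is a hypothetical helper, not a paroquant function.

```python
import importlib.util

# Assumed mapping from extra name to the top-level module it installs.
BACKENDS = {"vllm": "vllm", "transformers": "transformers", "mlx": "mlx"}

def available_backends():
    """Return {extra: True/False} depending on whether the module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in BACKENDS.items()
    }

print(available_backends())
```

The check only looks the modules up and does not import them, so it stays fast even when a heavy backend is installed.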

Quantize Your Own Model

# 1. Optimize rotation parameters
experiments/optimize/4bit.sh Qwen/Qwen3-8B

# 2. Export to HF checkpoint (--mode real for INT4, --mode pseudo for FP16)
python -m paroquant.cli.convert \
  --model Qwen/Qwen3-8B \
  --result-dir output/Qwen3-8B \
  --output-path models/Qwen3-8B-PARO
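The two steps above can be sketched as a small command builder for other models. The `output/<name>` and `models/<name>-PARO` paths are an assumption extrapolated from the Qwen3-8B example, and `quantize_commands` is a hypothetical helper; the `--mode` values `real` and `pseudo` are the ones named in the comment above. The sketch only builds the commands and does not run them.

```python
import shlex
from pathlib import PurePosixPath

def quantize_commands(model, mode=None):
    """Return the optimize and convert commands for a Hugging Face model ID."""
    name = PurePosixPath(model).name  # "Qwen/Qwen3-8B" -> "Qwen3-8B"
    optimize = ["experiments/optimize/4bit.sh", model]
    convert = [
        "python", "-m", "paroquant.cli.convert",
        "--model", model,
        "--result-dir", f"output/{name}",       # assumed layout, mirrors the example
        "--output-path", f"models/{name}-PARO",  # assumed naming, mirrors the example
    ]
    if mode is not None:
        convert += ["--mode", mode]  # "real" (INT4) or "pseudo" (FP16)
    return [optimize, convert]

for command in quantize_commands("Qwen/Qwen3-8B", mode="real"):
    print(shlex.join(command))
```

Running each list in order with `subprocess.run(command, check=True)` from the repository root would stop the pipeline at the first failing step.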

Reproduction

See experiments/README.md for scripts to reproduce all results in the paper.

Docker Images

Image                                    Purpose
ghcr.io/z-lab/paroquant:latest           Optimization & evaluation
ghcr.io/z-lab/paroquant:chat             Interactive chat
ghcr.io/z-lab/paroquant:chat-cu130       Interactive chat (CUDA 13.0 / ARM64)
ghcr.io/z-lab/paroquant:eval-reasoning   Reasoning task evaluation

Citation

@inproceedings{liang2026paroquant,
  title     = {{ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference}},
  author    = {Liang, Yesheng and Chen, Haisheng and Zhang, Zihan and Han, Song and Liu, Zhijian},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}
