ParoQuant — Pairwise Rotation Quantization for LLMs

These details have not been verified by PyPI

Project links

Project description

ParoQuant

Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

State-of-the-art INT4 quantization for LLMs. ParoQuant uses learned pairwise rotations to suppress weight outliers, closing the accuracy gap with FP16 while running at near-AWQ speed. Supports NVIDIA GPUs (vLLM, Transformers) and Apple Silicon (MLX).

Quick Start

Installation

# NVIDIA GPU (CUDA 12.9)
pip install "paroquant[vllm]"

# NVIDIA GPU (CUDA 13.0)
pip install "paroquant[vllm]" "vllm==0.19.1" \
  --extra-index-url https://wheels.vllm.ai/0.19.1/cu130 \
  --extra-index-url https://download.pytorch.org/whl/cu130

# Apple Silicon
pip install "paroquant[mlx]"

Pick a model from our Hugging Face collection:

export MODEL=z-lab/Qwen3.5-4B-PARO

Interactive Chat

python -m paroquant.cli.chat --model $MODEL

OpenAI-Compatible API Server

For vLLM, you can directly use vllm serve to serve ParoQuant models:

vllm serve $MODEL --port 8000

For other frameworks:

python -m paroquant.cli.serve --model $MODEL --port 8000

For MLX, add --vlm if you wish to load the VLM components and use the model's multimodal features. For vLLM, VLM components are loaded by default and can be skipped with the server argument --language-model-only.

Docker (NVIDIA GPU)

[!NOTE] The following commands map the local cache directory to the container in order to persist kernel cache across runs. Remove -v ... to disable this behaviour.

# Interactive chat
docker run --pull=always --rm -it --gpus all --ipc=host \
  -v $HOME/.cache/paroquant:/root/.cache/paroquant \
  ghcr.io/z-lab/paroquant:chat --model $MODEL

# API server (port 8000)
docker run --pull=always --rm -it --gpus all --ipc=host -p 8000:8000 \
  -v $HOME/.cache/paroquant:/root/.cache/paroquant \
  ghcr.io/z-lab/paroquant:serve --model $MODEL

Models

All models are available on Hugging Face. Swap the model name in the commands above to try any of them.

Gemma 4

Model	Checkpoint
gemma-4-31B-it	`z-lab/gemma-4-31B-it-PARO`
gemma-4-E2B-it	`z-lab/gemma-4-E2B-it-PARO`

Qwen3.6

Model	Checkpoint
Qwen3.6-27B	`z-lab/Qwen3.6-27B-PARO`

Qwen3.5

Model	Checkpoint
Qwen3.5-0.8B	`z-lab/Qwen3.5-0.8B-PARO`
Qwen3.5-2B	`z-lab/Qwen3.5-2B-PARO`
Qwen3.5-4B	`z-lab/Qwen3.5-4B-PARO`
Qwen3.5-9B	`z-lab/Qwen3.5-9B-PARO`
Qwen3.5-27B	`z-lab/Qwen3.5-27B-PARO`
Qwen3.5-35B-A3B	`z-lab/Qwen3.5-35B-A3B-PARO`

Qwen3

Model	Checkpoint
Qwen3-0.6B	`z-lab/Qwen3-0.6B-PARO`
Qwen3-1.7B	`z-lab/Qwen3-1.7B-PARO`
Qwen3-4B	`z-lab/Qwen3-4B-PARO`
Qwen3-8B	`z-lab/Qwen3-8B-PARO`
Qwen3-14B	`z-lab/Qwen3-14B-PARO`

Llama

Model	Checkpoint
Llama-2-7B	`z-lab/Llama-2-7b-hf-PARO`
Llama-3-8B	`z-lab/Meta-Llama-3-8B-PARO`
Llama-3.1-8B-Instruct	`z-lab/Llama-3.1-8B-Instruct-PARO`

Want a model that's not listed? Open an issue and let us know.

Reproduction

[!NOTE] The main branch of this repository is under active development, and reproducibility is not guaranteed. Please use the legacy branch to reproduce results from the paper.

Quantize Your Own Model

git clone https://github.com/z-lab/paroquant && cd paroquant
pip install -e ".[optim,eval]"

# 1. Optimize rotation parameters
experiments/optimize/4bit.sh Qwen/Qwen3-8B

# 2. Export to HF checkpoint (--mode real for INT4, --mode pseudo for FP16)
python -m paroquant.cli.convert \
  --model Qwen/Qwen3-8B \
  --result-dir output/Qwen3-8B \
  --output-path models/Qwen3-8B-PARO

Docker Images

Image	Purpose
`ghcr.io/z-lab/paroquant:chat`	Interactive chat
`ghcr.io/z-lab/paroquant:chat-cu129`	Interactive chat (CUDA 12.9)
`ghcr.io/z-lab/paroquant:serve`	OpenAI-compatible API server
`ghcr.io/z-lab/paroquant:latest`	Optimization & evaluation
`ghcr.io/z-lab/paroquant:eval`	Reasoning task evaluation

Citation

@inproceedings{liang2026paroquant,
  title     = {{ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference}},
  author    = {Liang, Yesheng and Chen, Haisheng and Zhang, Zihan and Han, Song and Liu, Zhijian},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.15

May 15, 2026

This version

0.1.14

May 7, 2026

0.1.13

Apr 23, 2026

0.1.12

Apr 6, 2026

0.1.11

Apr 4, 2026

0.1.10

Mar 17, 2026

0.1.9

Mar 17, 2026

0.1.8

Mar 13, 2026

0.1.7

Mar 12, 2026

0.1.6

Mar 10, 2026

0.1.5

Mar 9, 2026

0.1.4

Mar 8, 2026

0.1.3

Mar 6, 2026

0.1.2

Mar 6, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paroquant-0.1.14.tar.gz (52.4 kB view details)

Uploaded May 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

paroquant-0.1.14-py3-none-any.whl (62.9 kB view details)

Uploaded May 7, 2026 Python 3

File details

Details for the file paroquant-0.1.14.tar.gz.

File metadata

Download URL: paroquant-0.1.14.tar.gz
Upload date: May 7, 2026
Size: 52.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for paroquant-0.1.14.tar.gz
Algorithm	Hash digest
SHA256	`63d89c32396c2fd6fa2badcba847c15c0ec271e4e820241f8d4f8574edaf03e7`
MD5	`2f92107dbfdf5b98d01a79b78974afad`
BLAKE2b-256	`a584ffd00ca8cf9c1b823ee9d1fe565aa75dcdcc1391e0c1bdb73188bbca4cfa`

See more details on using hashes here.

Provenance

The following attestation bundles were made for paroquant-0.1.14.tar.gz:

Publisher: publish-to-pypi.yml on z-lab/paroquant

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paroquant-0.1.14.tar.gz
- Subject digest: 63d89c32396c2fd6fa2badcba847c15c0ec271e4e820241f8d4f8574edaf03e7
- Sigstore transparency entry: 1463449224
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: z-lab/paroquant@9cf7c8d5499d54a14c3f1c2d007ae7c4263641d0
- Branch / Tag: refs/tags/v0.1.14
- Owner: https://github.com/z-lab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@9cf7c8d5499d54a14c3f1c2d007ae7c4263641d0
- Trigger Event: push

File details

Details for the file paroquant-0.1.14-py3-none-any.whl.

File metadata

Download URL: paroquant-0.1.14-py3-none-any.whl
Upload date: May 7, 2026
Size: 62.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for paroquant-0.1.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`96ba6ec97a3eb00a2f37aee90b3a912b1febf38ac3777a40fa69e59eb874c99a`
MD5	`0e08ea65605613e1bf8b47af0bf8a76d`
BLAKE2b-256	`b8c7ac69eedbb31c278e6c152c1f83719c54d23c0d234244d64d18ae504c78d8`

See more details on using hashes here.

Provenance

The following attestation bundles were made for paroquant-0.1.14-py3-none-any.whl:

Publisher: publish-to-pypi.yml on z-lab/paroquant

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paroquant-0.1.14-py3-none-any.whl
- Subject digest: 96ba6ec97a3eb00a2f37aee90b3a912b1febf38ac3777a40fa69e59eb874c99a
- Sigstore transparency entry: 1463449240
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: z-lab/paroquant@9cf7c8d5499d54a14c3f1c2d007ae7c4263641d0
- Branch / Tag: refs/tags/v0.1.14
- Owner: https://github.com/z-lab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@9cf7c8d5499d54a14c3f1c2d007ae7c4263641d0
- Trigger Event: push

paroquant 0.1.14

Navigation

Verified details

Owner

Unverified details

Project links

Meta

Classifiers

Project description

ParoQuant

Quick Start

Installation

Interactive Chat

OpenAI-Compatible API Server

Docker (NVIDIA GPU)

Models

Reproduction

Quantize Your Own Model

Docker Images

Citation

Project details

Verified details

Owner

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance