
Fujitsu One Compression

Fujitsu One Compression (OneComp) is a Python package for LLM compression.

📖 Documentation

Full documentation is available at https://FujitsuResearch.github.io/OneCompression/.

📦 Features

  • Quantization Error Propagation (QEP): A post-training quantization method that corrects quantization errors by propagating them to subsequent layers, improving the accuracy of quantized LLMs. See Arai & Ichikawa, NeurIPS 2025 for details. The original reference implementation is available at FujitsuResearch/qep.
  • vLLM Plugin Integration: Serve OneComp-quantized models with vLLM via built-in plugins for DBF and Mixed-GPTQ quantization methods.
  • AutoBit: Mixed-precision quantization with ILP-based bitwidth assignment. Automatically estimates the target bitwidth from available VRAM and assigns per-layer bitwidths to minimize quantization error under the memory budget.
  • JointQ: Joint quantization method that optimizes weight assignments and scale parameters simultaneously for improved quantization accuracy. Supports group-wise quantization (e.g., 4-bit, groupsize=128).
  • LoRA SFT Post-Process: Fine-tune quantized models with LoRA adapters for accuracy recovery or domain-specific knowledge injection. Supports SFT loss, teacher distillation, and intermediate block alignment.
  • Rotation Preprocessing: SpinQuant/OstQuant-based rotation preprocessing that reduces quantization error by learning optimal rotation matrices before quantization. Rotation/scaling matrices are absorbed into model weights, with online Hadamard hooks automatically registered at load time. Supports Llama and Qwen3 architectures.
  • (TBD)
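As a toy illustration of the error-propagation idea behind QEP, the sketch below quantizes two linear layers layer-wise, then re-fits the second layer against the *quantized* first-layer activations before quantizing it. This is a simplified numpy sketch, not the package's API: the round-to-nearest quantizer, layer sizes, and least-squares correction are illustrative assumptions, not the exact update from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_rtn(w, bits=4):
    # symmetric round-to-nearest quantization with a single per-tensor scale
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# two linear layers and a batch of calibration inputs
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(8, 8))
x = rng.normal(size=(32, 8))

# naive layer-wise PTQ: each layer is quantized in isolation
Q1, Q2 = quantize_rtn(W1), quantize_rtn(W2)
naive_out = (x @ Q1) @ Q2

# QEP-style: account for layer 1's quantization error when handling layer 2
h_fp = x @ W1   # full-precision activations
h_q = x @ Q1    # activations after quantizing layer 1
# re-fit W2 so that h_q @ W2_corr approximates the full-precision output
W2_corr, _, _, _ = np.linalg.lstsq(h_q, h_fp @ W2, rcond=None)
Q2_corr = quantize_rtn(W2_corr)
qep_out = h_q @ Q2_corr

ref = h_fp @ W2  # full-precision reference output
err_naive = np.linalg.norm(ref - naive_out)
err_qep = np.linalg.norm(ref - qep_out)
print(f"naive error: {err_naive:.3f}, QEP-style error: {err_qep:.3f}")
```

The key point is that layer 2 is compensated for the error already introduced by quantizing layer 1, rather than treating each layer's quantization as independent.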

🤖 Supported Models

OneComp has been verified with the following model architectures. Other Hugging Face-compatible models may work but are currently untested.

| # | Architecture | Verified Models | Status |
|---|--------------|-----------------|--------|
| 1 | Llama | TinyLlama, Llama-2, Llama-3 | ✅ Verified |
| 2 | Qwen3 | Qwen3-0.6B ~ 32B | ✅ Verified |

Note: Support for additional architectures is planned. Contributions and test reports are welcome.

🔧 Installation

For users (pip)

1. Install PyTorch

First install the PyTorch build that matches your environment (CPU-only or a specific CUDA version).

✅ CPU-only

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

✅ CUDA-enabled

Choose the appropriate CUDA version for your system:

| CUDA Version | Installation Command |
|--------------|----------------------|
| CUDA 11.8 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` |
| CUDA 12.1 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121` |
| CUDA 12.4 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124` |
| CUDA 12.6 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126` |
| CUDA 12.8 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128` |

Check your CUDA version:

nvcc --version

or

nvidia-smi

Verify PyTorch GPU support:

import torch
print(torch.cuda.is_available())  # should print True on a working CUDA setup

2. Install onecomp

Once PyTorch is installed, you can install onecomp:

pip install onecomp
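To confirm the installation succeeded without relying on the package exposing a version attribute, you can query installed-package metadata via the standard library (the distribution name `onecomp` is taken from the command above):

```python
import importlib.metadata

def installed_version(pkg):
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        return None

print(installed_version("onecomp"))  # e.g. "1.0.2", or None if not installed
```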

For developers (uv, recommended)

Install uv

uv is a fast Python package and project manager written in Rust. It offers a drop-in replacement for pip and pip-tools while also managing virtual environments and Python installations. With its Rust-based dependency resolver and the uv.lock lockfile, uv provides deterministic and reproducible environments across development machines and CI pipelines.

# install uv (for macOS or Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/FujitsuResearch/OneCompression.git
cd OneCompression
uv sync --extra cu128 --extra dev --extra visualize

The uv sync command creates a Python virtual environment and installs all dependencies.

The --extra cu128 option installs the CUDA-enabled version of PyTorch (along with torchvision from the same CUDA index). Replace cu128 with the appropriate variant for your environment: cpu, cu118, cu121, cu124, cu126, or cu128. PyTorch will be automatically downloaded by uv, so you do not need to install it beforehand.

Adding --extra dev installs development tools (black, pytest, pylint). Adding --extra visualize installs matplotlib for visualization features.

To use vLLM for serving quantized models, add --extra vllm:

uv sync --extra cu128 --extra dev --extra visualize --extra vllm

Note: --extra vllm may take a long time on the first run if a pre-built xformers wheel is not available for your Python/CUDA combination (e.g. Python 3.13). Using Python 3.12 typically avoids this.

Running commands (uv environment)

In the environment created by uv sync, you can run commands in two ways:

Option 1: Use uv run (no activation needed)
uv run pytest tests/ -v
uv run python example/example1.py
uv run black --check onecomp/
Option 2: Activate the virtual environment (traditional approach)
source .venv/bin/activate
pytest tests/ -v
python example/example1.py
black --check onecomp/

For developers (pip)

git clone https://github.com/FujitsuResearch/OneCompression.git
cd OneCompression

# First, install PyTorch with CUDA support for your environment
pip install torch --index-url https://download.pytorch.org/whl/cu128
# Then install onecomp with development dependencies
pip install -e ".[dev]"

Replace cu128 with the appropriate variant for your environment: cpu, cu118, cu121, cu124, cu126, or cu128.

Building Documentation Locally

uv sync --extra cu128 --extra dev --extra docs
uv run mkdocs serve

Then open http://127.0.0.1:8000 in your browser.

🚀 Examples

| Category | Script | Description |
|----------|--------|-------------|
| Quantization | example_gptq.py | GPTQ quantization |
| | example_qep_gptq.py | GPTQ + QEP (error propagation) |
| | example_jointq.py | JointQ quantization |
| | example_autobit.py | AutoBit mixed-precision quantization |
| | example_auto_run.py | AutoBit with automatic VRAM estimation |
| Save / Load | example_save_load.py | Save and load quantized models |
| Rotation Preprocessing | example_llama_preprocess_rtn.py | Rotation preprocessing + RTN (TinyLlama) |
| | example_preprocess_save_load.py | Save and load rotation-preprocessed quantized models |
| Post-Process | example_lora_sft.py | LoRA SFT post-quantization fine-tuning |
| | example_lora_sft_knowledge.py | LoRA SFT knowledge injection |
| vLLM | example_gptq_vllm_inference.py | GPTQ + QEP quantization and vLLM inference |
| | example_autobit_vllm_inference.py | AutoBit quantization and vLLM inference |

🔌 vLLM Inference

OneComp-quantized models can be served with vLLM via built-in plugins (DBF, Mixed-GPTQ).

# uv users
uv sync --extra cu128 --extra vllm

# pip users
pip install vllm

See the vLLM Inference guide for details.

📄 License

See LICENSE for details.

Citation

OneComp technical report (coming soon on ArXiv):

@misc{onecomp2026,
  title={TBD},
  author={TBD},
  year={2026},
  note={arXiv preprint coming soon}
}

QEP (Quantization Error Propagation):

@inproceedings{arai2025quantization,
  title={Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization},
  author={Yamato Arai and Yuma Ichikawa},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=a3l3K9khbL}
}
