# Fujitsu One Compression

Fujitsu One Compression (OneComp) is a Python package for LLM compression.
## 📖 Documentation

Full documentation is available at https://FujitsuResearch.github.io/OneCompression/.

## 📦 Features
- Quantization Error Propagation (QEP): A post-training quantization method that corrects quantization errors by propagating them to subsequent layers, improving the accuracy of quantized LLMs. See Arai & Ichikawa, NeurIPS 2025 for details. The original reference implementation is available at FujitsuResearch/qep.
- vLLM Plugin Integration: Serve OneComp-quantized models with vLLM via built-in plugins for DBF and Mixed-GPTQ quantization methods.
- AutoBit: Mixed-precision quantization with ILP-based bitwidth assignment. Automatically estimates the target bitwidth from available VRAM and assigns per-layer bitwidths to minimize quantization error under the memory budget.
- JointQ: Joint quantization method that optimizes weight assignments and scale parameters simultaneously for improved quantization accuracy. Supports group-wise quantization (e.g., 4-bit, groupsize=128).
- LoRA SFT Post-Process: Fine-tune quantized models with LoRA adapters for accuracy recovery or domain-specific knowledge injection. Supports SFT loss, teacher distillation, and intermediate block alignment.
- Rotation Preprocessing: SpinQuant/OstQuant-based rotation preprocessing that reduces quantization error by learning optimal rotation matrices before quantization. Rotation/scaling matrices are absorbed into model weights, with online Hadamard hooks automatically registered at load time. Supports Llama and Qwen3 architectures.
- (TBD)
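Several of the methods above operate on the same group-wise quantization layout: low-bit integer weights sharing one scale per group. As a toy sketch of that layout (not OneComp's actual implementation, and with a tiny group size for readability), round-to-nearest (RTN) group-wise quantization looks like this:

```python
def quantize_groupwise(weights, bits=4, groupsize=4):
    """Toy round-to-nearest (RTN) group-wise quantization.

    Each group of `groupsize` weights shares one scale derived from the
    group's maximum magnitude; returns the dequantized weights.
    """
    qmax = 2 ** (bits - 1) - 1  # symmetric signed range, e.g. ±7 for 4-bit
    out = []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        scale = max(abs(w) for w in group) / qmax or 1.0  # avoid zero scale
        for w in group:
            q = max(-qmax - 1, min(qmax, round(w / scale)))  # clamp to int range
            out.append(q * scale)  # dequantize
    return out

weights = [0.12, -0.53, 0.97, -0.08, 3.1, -2.4, 0.5, 1.7]
deq = quantize_groupwise(weights, bits=4, groupsize=4)
print(max(abs(a - b) for a, b in zip(weights, deq)))  # worst-case error, bounded by scale/2
```

Real group sizes are much larger (e.g. 128), and methods such as JointQ and GPTQ choose the integer assignments and scales far more carefully than plain RTN; only the storage layout is shared.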
## 🤖 Supported Models
OneComp has been verified with the following model architectures. Other Hugging Face-compatible models may work but are currently untested.
| # | Architecture | Verified Models | Status |
|---|---|---|---|
| 1 | Llama | TinyLlama, Llama-2, Llama-3 | ✅ Verified |
| 2 | Qwen3 | Qwen3-0.6B ~ 32B | ✅ Verified |
Note: Support for additional architectures is planned. Contributions and test reports are welcome.
## 🔧 Installation

### For users (pip)

#### 1. Install PyTorch

Install the PyTorch build that matches your hardware.
**✅ CPU-only**

```shell
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
**✅ CUDA-enabled**

Choose the appropriate CUDA version for your system:

| CUDA Version | Installation Command |
|---|---|
| CUDA 11.8 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` |
| CUDA 12.1 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121` |
| CUDA 12.4 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124` |
| CUDA 12.6 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126` |
| CUDA 12.8 | `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128` |
Check your CUDA version:

```shell
nvcc --version  # or: nvidia-smi
```

Verify PyTorch GPU support:

```python
import torch

print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```
#### 2. Install onecomp

Once PyTorch is installed, you can install onecomp:

```shell
pip install onecomp
```
### For developers (uv, recommended)

#### Install uv

uv is a fast Python package and project manager written in Rust. It is a drop-in replacement for pip and pip-tools that also manages virtual environments and Python installations. With its Rust-based dependency resolver and the `uv.lock` lockfile, uv provides deterministic, reproducible environments across development machines and CI pipelines.

```shell
# install uv (for macOS or Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
```
```shell
git clone <git repository URL>
cd OneCompression
uv sync --extra cu128 --extra dev
```

The `uv sync` command creates a Python virtual environment and installs all dependencies. The `--extra cu128` option installs the CUDA-enabled build of PyTorch (along with torchvision from the same CUDA index); replace `cu128` with the appropriate variant for your environment: `cpu`, `cu118`, `cu121`, `cu124`, `cu126`, or `cu128`. PyTorch is downloaded by uv automatically, so you do not need to install it beforehand. Adding `--extra dev` installs additional packages for development.

To use vLLM for serving quantized models, add `--extra vllm`:

```shell
uv sync --extra cu128 --extra dev --extra vllm
```
Note: `--extra vllm` may take a long time on the first run if a pre-built `xformers` wheel is not available for your Python/CUDA combination (e.g. Python 3.13). Using Python 3.12 typically avoids this.
#### Running commands (uv environment)

In the environment created by `uv sync`, you can run commands in two ways:

**Option 1: Use `uv run` (no activation needed)**

```shell
uv run pytest tests/ -v
uv run python example/example1.py
uv run black --check onecomp/
```

**Option 2: Activate the virtual environment (traditional approach)**

```shell
source .venv/bin/activate
pytest tests/ -v
python example/example1.py
black --check onecomp/
```
### For developers (pip)

```shell
git clone <git repository URL>
cd OneCompression

# First, install PyTorch with CUDA support for your environment
pip install torch --index-url https://download.pytorch.org/whl/cu128

# Then install onecomp with development dependencies
pip install -e ".[dev]"
```

Replace `cu128` with the appropriate variant for your environment: `cpu`, `cu118`, `cu121`, `cu124`, `cu126`, or `cu128`.
### Building Documentation Locally

```shell
uv sync --extra cu128 --extra dev --extra docs
uv run mkdocs serve
```

Then open http://127.0.0.1:8000 in your browser.
## 🚀 Examples

| Category | Script | Description |
|---|---|---|
| Quantization | `example_gptq.py` | GPTQ quantization |
| | `example_qep_gptq.py` | GPTQ + QEP (error propagation) |
| | `example_jointq.py` | JointQ quantization |
| | `example_autobit.py` | AutoBit mixed-precision quantization |
| | `example_auto_run.py` | AutoBit with automatic VRAM estimation |
| Save / Load | `example_save_load.py` | Save and load quantized models |
| Rotation Preprocessing | `example_llama_preprocess_rtn.py` | Rotation preprocessing + RTN (TinyLlama) |
| | `example_preprocess_save_load.py` | Save and load rotation-preprocessed quantized models |
| Post-Process | `example_lora_sft.py` | LoRA SFT post-quantization fine-tuning |
| | `example_lora_sft_knowledge.py` | LoRA SFT knowledge injection |
| vLLM | `example_gptq_vllm_inference.py` | GPTQ + QEP quantization and vLLM inference |
| | `example_autobit_vllm_inference.py` | AutoBit quantization and vLLM inference |
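As a toy illustration of the bitwidth-assignment idea behind `example_autobit.py`: AutoBit formulates the assignment as an ILP, while the hedged sketch below simply brute-forces a three-layer example, with made-up layer names, sizes, and error numbers, to show what "minimize quantization error under a memory budget" means:

```python
from itertools import product

# Hypothetical per-layer sizes and quantization errors per candidate bitwidth,
# as if measured on calibration data (all numbers are invented for illustration).
layers = ["q_proj", "k_proj", "v_proj"]
params = {"q_proj": 4096, "k_proj": 1024, "v_proj": 1024}
error = {  # error[layer][bits]: lower bits -> larger error
    "q_proj": {2: 0.90, 4: 0.20, 8: 0.01},
    "k_proj": {2: 0.40, 4: 0.10, 8: 0.01},
    "v_proj": {2: 0.50, 4: 0.15, 8: 0.01},
}
budget_bits = 28672  # total weight-memory budget, in bits

def assign_bitwidths():
    """Exhaustively pick per-layer bitwidths minimizing total error subject
    to the memory budget (a brute-force stand-in for AutoBit's ILP)."""
    best, best_err = None, float("inf")
    for combo in product([2, 4, 8], repeat=len(layers)):
        mem = sum(params[l] * b for l, b in zip(layers, combo))
        if mem > budget_bits:
            continue  # infeasible: over the memory budget
        err = sum(error[l][b] for l, b in zip(layers, combo))
        if err < best_err:
            best, best_err = dict(zip(layers, combo)), err
    return best, best_err

best, best_err = assign_bitwidths()
print(best, best_err)  # {'q_proj': 4, 'k_proj': 4, 'v_proj': 8} 0.31...
```

With real models the search space is far too large for brute force, which is why AutoBit solves this assignment as an integer linear program instead.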
## 🔌 vLLM Inference

OneComp-quantized models can be served with vLLM via built-in plugins (DBF, Mixed-GPTQ).

```shell
# uv users
uv sync --extra cu128 --extra vllm

# pip users
pip install vllm
```

See the vLLM Inference guide for details.
## 📄 License
See LICENSE for more details.
## Citation

OneComp technical report (coming soon on arXiv):

```bibtex
@misc{onecomp2026,
  title={TBD},
  author={TBD},
  year={2026},
  note={arXiv preprint coming soon}
}
```

QEP (Quantization Error Propagation):

```bibtex
@inproceedings{arai2025quantization,
  title={Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization},
  author={Yamato Arai and Yuma Ichikawa},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=a3l3K9khbL}
}
```