Python package for LLM compression

These details have not been verified by PyPI

Project description

Fujitsu One Compression

Fujitsu One Compression (OneComp) is a Python package for LLM compression.

OneComp

⚡ Just one line.

onecomp <generative AI>

That's all you need. OneComp detects your GPU VRAM, picks the best bit-width per layer, quantizes with error propagation, evaluates, and saves — fully automatic.

# Example
onecomp meta-llama/Llama-2-7b-hf

Or from Python:

from onecomp import Runner

Runner.auto_run(model_id="meta-llama/Llama-2-7b-hf")

📖 Documentation

Full documentation is available at https://FujitsuResearch.github.io/OneCompression/.

📦 Features

Quantization Error Propagation (QEP): A post-training quantization method that corrects quantization errors by propagating them to subsequent layers, improving the accuracy of quantized LLMs. See Arai & Ichikawa, NeurIPS 2025 for details. The original reference implementation is available at FujitsuResearch/qep.
Layer-Projected Coordinate Descent (LPCD): A unified Post Training Quantization (PTQ) framework that extends layer-wise quantization to arbitrary submodules by optimising relaxed objectives and projecting the solutions with layer-wise quantizers. See Ichikawa et al., 2025 for details.
vLLM Plugin Integration: Serve OneComp-quantized models with vLLM via built-in plugins for DBF and Mixed-GPTQ quantization methods. Pair with Open WebUI for a ChatGPT-like chat experience on your local machine.
AutoBit: Mixed-precision quantization with ILP-based bitwidth assignment. Automatically estimates the target bitwidth from available VRAM and assigns per-layer bitwidths to minimize quantization error under the memory budget.
JointQ: Joint quantization method that optimizes weight assignments and scale parameters simultaneously for improved quantization accuracy. Supports group-wise quantization (e.g., 4-bit, groupsize=128).
Block-wise PTQ: Post-quantization block-wise distillation that minimises intermediate-representation MSE against an FP16 teacher model at Transformer-block granularity. Includes Phase 1 (greedy per-block optimisation) and Phase 2 CBQ (cross-block sliding-window optimisation). Supports GPTQ, DBF, and OneBit quantizers.
LoRA SFT Post-Process: Fine-tune quantized models with LoRA adapters for accuracy recovery or domain-specific knowledge injection. Supports SFT loss, teacher distillation, and intermediate block alignment.
Rotation Preprocessing: SpinQuant/OstQuant-based rotation preprocessing that reduces quantization error by learning optimal rotation matrices before quantization. Rotation/scaling matrices are absorbed into model weights, with online Hadamard hooks automatically registered at load time. Supports Llama and Qwen3 architectures.
(TBD)

🤖 Supported Models

OneComp has been verified with the following model architectures. Other Hugging Face-compatible models may work but are currently untested.

#	Architecture	Verified Models	Status
1	Llama	TinyLlama, Llama-2, Llama-3	✅ Verified
2	Qwen3	Qwen3-0.6B ~ 32B	✅ Verified
3	Gemma	Gemma 2, Gemma 3, Gemma 4	✅ Verified

Note: Support for additional architectures is planned. Contributions and test reports are welcome.

🔧 Installation

for users (pip)

1. Install PyTorch

Please install the appropriate version of PyTorch.

✅ CPU-only

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

✅ CUDA-enabled

Choose the appropriate CUDA version for your system:

CUDA Version	Installation Command
CUDA 11.8	`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
CUDA 12.1	`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`
CUDA 12.4	`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124`
CUDA 12.6	`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126`
CUDA 12.8	`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`
CUDA 13.0	`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130`

Check your CUDA version:

nvcc --version

nvidia-smi

Verify PyTorch GPU support:

import torch
print(torch.cuda.is_available())

2. Install `onecomp`

Once PyTorch is installed, you can install onecomp:

pip install onecomp

for developers (uv : recommended)

Install `uv`

uv is a fast Python package and project manager written in Rust. It offers a drop-in replacement for pip and pip-tools while also managing virtual environments and Python installations. With its Rust-based dependency resolver and the uv.lock lockfile, uv provides deterministic and reproducible environments across development machines and CI pipelines.

# install uv (for macOS or Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/FujitsuResearch/OneCompression.git
cd OneCompression
uv sync --extra cu128 --extra dev --extra visualize

The uv sync command creates a Python virtual environment and installs all dependent libraries.

The --extra cu128 option installs the CUDA-enabled version of PyTorch (along with torchvision from the same CUDA index). Replace cu128 with the appropriate variant for your environment: cpu, cu118, cu121, cu124, cu126, cu128, or cu130. PyTorch will be automatically downloaded by uv, so you do not need to install it beforehand.

Adding --extra dev installs development tools (black, pytest, pylint). Adding --extra visualize installs matplotlib for visualization features. Adding --extra hydra installs hydra-core for the example scripts and model_validation/ runners that use Hydra-based configuration.

To use vLLM for serving quantized models, add --extra vllm together with --extra cu130:

uv sync --extra cu130 --extra dev --extra visualize --extra vllm

Note: --extra vllm is only compatible with --extra cu130. Recent vLLM releases require torch>=2.10, whose wheels are only published for the cu130 index. Combining --extra vllm with cpu / cu118 / cu121 / cu124 / cu126 / cu128 is rejected by uv at lock time.

Note: --extra vllm may take a long time on the first run if a pre-built xformers wheel is not available for your Python/CUDA combination (e.g. Python 3.13). Using Python 3.12 typically avoids this.

Running commands (uv environment)

In the environment created by uv sync, you can run commands in two ways:

Option 1: Use `uv run` (no activation needed)

uv run pytest tests/ -v
uv run python example/example_gptq.py
uv run black --check onecomp/

Option 2: Activate the virtual environment (traditional approach)

source .venv/bin/activate
pytest tests/ -v
python example/example_gptq.py
black --check onecomp/

for developers (pip)

git clone <git repository URL>
cd OneCompression

# First, install PyTorch with CUDA support for your environment
pip install torch --index-url https://download.pytorch.org/whl/cu128
# Then install onecomp with development dependencies
pip install -e ".[dev]"

Replace cu128 with the appropriate variant for your environment: cpu, cu118, cu121, cu124, cu126, cu128, or cu130.

Building Documentation Locally

uv sync --extra cu128 --extra dev --extra docs
uv run mkdocs serve

Then open http://127.0.0.1:8000 in your browser.

🚀 Examples

Category	Script	Description
Quantization	example_gptq.py	GPTQ quantization
	example_qep_gptq.py	GPTQ + QEP (error propagation)
	example_lpcd_gptq.py	GPTQ + QEP + LPCD quantization
	example_jointq.py	JointQ quantization
	example_autobit.py	AutoBit mixed-precision quantization
	example_auto_run.py	AutoBit with automatic VRAM estimation
Calibration	example_custom_calibration.py	Custom calibration dataset with CalibrationConfig
Save / Load	example_save_load.py	Save and load quantized models
Rotation Preprocessing	example_llama_preprocess_rtn.py	Rotation preprocessing + RTN (TinyLlama)
	example_preprocess_save_load.py	Save and load rotation-preprocessed quantized models
Post-Process	example_blockwise_ptq.py	Block-wise PTQ (GPTQ + Phase 1 & CBQ)
	example_lora_sft.py	LoRA SFT post-quantization fine-tuning
	example_lora_sft_knowledge.py	LoRA SFT knowledge injection
vLLM	example_gptq_vllm_inference.py	GPTQ + QEP quantization and vLLM inference
	example_autobit_vllm_inference.py	AutoBit quantization and vLLM inference

🔌 vLLM Inference

OneComp-quantized models can be served with vLLM via built-in plugins (DBF, Mixed-GPTQ). Combined with Open WebUI, you can chat with your quantized model through a ChatGPT-like browser interface — entirely on your local machine.

# uv users (vLLM requires cu130; see Installation for details)
uv sync --extra cu130 --extra vllm

# pip users
pip install vllm

See the vLLM Inference guide for details, including Open WebUI setup instructions.

📄 License

See LICENSE for more details.

Citation

OneComp technical report:

@misc{ichikawa2026onecomponelinerevolutiongenerative,
      title={OneComp: One-Line Revolution for Generative AI Model Compression}, 
      author={Yuma Ichikawa and Keiji Kimura and Akihiro Yoshida and Yudai Fujimoto and Hiroki Tokura and Yamato Arai and Yoshiyuki Ishii and Yusei Kawakami and Genki Shikada and Achille Jacquemond and Yoshihiko Fujisawa and Katsuki Fujisawa and Takumi Honda and Akira Sakai},
      year={2026},
      eprint={2603.28845},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.28845}, 
}

QEP (Quantization Error Propagation):

@inproceedings{
arai2025quantization,
title={Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization},
author={Yamato Arai and Yuma Ichikawa},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=a3l3K9khbL}
}

LPCD (Layer-Projected Coordinate Descent):

@article{ichikawa2025lpcd,
title={LPCD: Unified Framework from Layer-Wise to Submodule Quantization},
author={Yuma Ichikawa and Yudai Fujimoto and Akira Sakai},
journal={arXiv preprint arXiv:2512.01546},
year={2025},
url={https://arxiv.org/abs/2512.01546}
}

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.1.0

Apr 28, 2026

1.0.2

Mar 31, 2026

1.0.1

Mar 31, 2026

1.0.0

Mar 31, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

onecomp-1.1.0.tar.gz (350.7 kB view details)

Uploaded Apr 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

onecomp-1.1.0-py3-none-any.whl (453.6 kB view details)

Uploaded Apr 28, 2026 Python 3

File details

Details for the file onecomp-1.1.0.tar.gz.

File metadata

Download URL: onecomp-1.1.0.tar.gz
Upload date: Apr 28, 2026
Size: 350.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for onecomp-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c6c8d500b58c247b11624c6c1c937c75327c11708b896dbc08deda770a7ca1d2`
MD5	`f2699be573729642af16f9f3943deda9`
BLAKE2b-256	`e66071002e6191f36993cfe3410f7460fb22753e87379a3a92a275c3df30c2dd`

See more details on using hashes here.

Provenance

The following attestation bundles were made for onecomp-1.1.0.tar.gz:

Publisher: publish.yml on FujitsuResearch/OneCompression

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: onecomp-1.1.0.tar.gz
- Subject digest: c6c8d500b58c247b11624c6c1c937c75327c11708b896dbc08deda770a7ca1d2
- Sigstore transparency entry: 1397676391
- Sigstore integration time: Apr 28, 2026
Source repository:
- Permalink: FujitsuResearch/OneCompression@6526c0feca27d26be8d4815a36a05713d3fdc2bc
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/FujitsuResearch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6526c0feca27d26be8d4815a36a05713d3fdc2bc
- Trigger Event: release

File details

Details for the file onecomp-1.1.0-py3-none-any.whl.

File metadata

Download URL: onecomp-1.1.0-py3-none-any.whl
Upload date: Apr 28, 2026
Size: 453.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for onecomp-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`810b223abefed51d6658a9096ddb9fbf08cc010110efe04c2cd3584b09904274`
MD5	`88b9b1bf9ea5ca6ca6a2ec96716c4c74`
BLAKE2b-256	`6d2eb516e0d68375905964c4f1f7bc8a947f9bcac34f791fd362c26762e2bb25`

See more details on using hashes here.

Provenance

The following attestation bundles were made for onecomp-1.1.0-py3-none-any.whl:

Publisher: publish.yml on FujitsuResearch/OneCompression

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: onecomp-1.1.0-py3-none-any.whl
- Subject digest: 810b223abefed51d6658a9096ddb9fbf08cc010110efe04c2cd3584b09904274
- Sigstore transparency entry: 1397676393
- Sigstore integration time: Apr 28, 2026
Source repository:
- Permalink: FujitsuResearch/OneCompression@6526c0feca27d26be8d4815a36a05713d3fdc2bc
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/FujitsuResearch
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6526c0feca27d26be8d4815a36a05713d3fdc2bc
- Trigger Event: release

onecomp 1.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Fujitsu One Compression

⚡ Just one line.

📖 Documentation

📦 Features

🤖 Supported Models

🔧 Installation

for users (pip)

1. Install PyTorch

✅ CPU-only

✅ CUDA-enabled

2. Install onecomp

for developers (uv : recommended)

Install uv

Running commands (uv environment)

Option 1: Use uv run (no activation needed)

Option 2: Activate the virtual environment (traditional approach)

for developers (pip)

Building Documentation Locally

🚀 Examples

🔌 vLLM Inference

📄 License

Citation

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

2. Install `onecomp`

Install `uv`

Option 1: Use `uv run` (no activation needed)