Mono Quant
Ultra-lightweight, model-agnostic quantization for PyTorch
What is Mono Quant?
Mono Quant is a simple, reliable model quantization package for PyTorch with minimal dependencies: just torch and numpy, no bloat.
Key Features
- Model-Agnostic - Works with any PyTorch model: HuggingFace, local, or custom
- Multiple Modes - INT8, INT4, and FP16 quantization
- Flexible Calibration - Dynamic (no data) or static (with calibration data)
- Robust Validation - SQNR metrics, size comparison, and accuracy warnings
- Dual Interface - Python API for automation, CLI for CI/CD
- Build-Phase Only - Quantize during build, deploy lightweight models
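The SQNR metric used for validation can be sketched with the standard signal-to-quantization-noise formula in decibels; this is the generic definition, not necessarily the package's exact implementation:

```python
import numpy as np

def sqnr_db(original: np.ndarray, dequantized: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB: higher means less error."""
    signal_power = np.mean(original ** 2)
    noise_power = np.mean((original - dequantized) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)

# Example: INT8 round-trip of a small weight tensor
w = np.linspace(-1.0, 1.0, 1000).astype(np.float32)
scale = np.abs(w).max() / 127.0
w_hat = np.round(w / scale).clip(-127, 127) * scale
print(f"SQNR: {sqnr_db(w, w_hat):.1f} dB")
```

A well-behaved INT8 quantization of full-range weights typically lands in the 40-50 dB region, which is why a sharp drop in reported SQNR is a useful accuracy warning.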
Installation
pip install mono-quant
Requirements
- Python 3.11 or higher
- PyTorch 2.0 or higher
- NumPy 1.24 or higher
Quick Start
Python API
from mono_quant import quantize
# Dynamic INT8 quantization (no calibration data needed)
result = quantize(model, bits=8, dynamic=True)
# Save the quantized model
result.save("model_quantized.pt")
# Check metrics
print(f"Compression: {result.info.compression_ratio:.2f}x")
print(f"SQNR: {result.info.sqnr_db:.2f} dB")
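As a rough sanity check on `compression_ratio`: INT8 stores one byte per weight versus four for FP32, so the ceiling is 4x minus a small overhead for stored scales. The arithmetic below is illustrative only (the per-group overhead is an assumption, not the package's internal accounting):

```python
# Rough upper-bound estimate for INT8 compression of an FP32 model
n_params = 10_000_000             # e.g. a 10M-parameter model
fp32_bytes = n_params * 4         # 4 bytes per FP32 weight
int8_bytes = n_params * 1         # 1 byte per INT8 weight
overhead = 4 * (n_params // 128)  # assume one FP32 scale per group of 128
ratio = fp32_bytes / (int8_bytes + overhead)
print(f"~{ratio:.2f}x")           # just under the 4x ceiling
```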
CLI
# Dynamic quantization
monoquant quantize --model model.pt --bits 8 --dynamic
# With custom output path
monoquant quantize --model model.pt --bits 8 --output model_quantized.pt
Quantization Modes
Dynamic Quantization (Fastest, No Data)
result = quantize(model, bits=8, dynamic=True)
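Conceptually, dynamic quantization derives the scale from the weights themselves, which is why no calibration data is needed. A minimal per-tensor symmetric INT8 sketch (illustrative, not Mono Quant's actual kernel):

```python
import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    """Per-tensor symmetric INT8: scale comes from the max absolute weight."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_symmetric_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
print(f"max reconstruction error: {err:.4f}")  # at most half a quantization step
```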
Static Quantization (Best Accuracy, Requires Data)
import torch

calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(150)]
result = quantize(
    model,
    bits=8,
    dynamic=False,
    calibration_data=calibration_data
)
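Static mode differs in that it observes value ranges on the calibration batches up front and fixes the scale and zero-point before deployment. A sketch of that range-observation step, using a generic affine (asymmetric) scheme that is not necessarily what Mono Quant does internally:

```python
import numpy as np

def calibrate_affine_int8(batches):
    """Observe min/max over calibration batches; derive affine scale + zero-point."""
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))  # maps lo -> 0 and hi -> 255
    return scale, zero_point

def quantize_affine(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
batches = [rng.normal(size=(1, 16)).astype(np.float32) for _ in range(150)]
scale, zp = calibrate_affine_int8(batches)
q = quantize_affine(batches[0], scale, zp)
```

This is why static mode tends to be more accurate: the fixed range is fitted to data the model will actually see, rather than inferred per call.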
INT4 Quantization (Maximum Compression)
result = quantize(
    model,
    bits=4,
    dynamic=False,
    calibration_data=calibration_data,
    group_size=128  # default
)
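The group_size parameter controls how many consecutive weights share one scale: smaller groups track local ranges more tightly at the cost of more scale overhead. A group-wise symmetric INT4 sketch under that interpretation (an assumption about the scheme, not the package's exact layout):

```python
import numpy as np

def quantize_int4_grouped(w: np.ndarray, group_size: int = 128):
    """Symmetric INT4 (levels -7..7) with one FP scale per group of weights."""
    flat = w.reshape(-1, group_size)  # assumes the weight count divides evenly
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(flat / scales), -7, 7).astype(np.int8)
    return q, scales

w = np.random.default_rng(0).normal(size=(4, 256)).astype(np.float32)
q, scales = quantize_int4_grouped(w)
w_hat = (q * scales).reshape(w.shape)
print(f"groups: {scales.size}, max error: {np.abs(w - w_hat).max():.4f}")
```

With only 15 levels, INT4 error is far more sensitive to outliers than INT8, which is why per-group scales (rather than one per-tensor scale) are the norm at this bit width.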
Documentation
Full documentation available at https://thataverageguy.github.io/mono-quant
Why Mono Quant?
Most quantization tools are tied to specific frameworks (HuggingFace, TFLite) or require heavy dependencies. Mono Quant fills the niche of "just quantize the weights, nothing else."
Design Philosophy
| Aspect | Approach |
|---|---|
| Model Loading | You load the model, we quantize it |
| Dependencies | Only torch and numpy required |
| Use Case | Build-phase (CI/CD, local development) |
| Scope | Quantization only, no runtime or serving |
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.