
Mono Quant

Ultra-lightweight, model-agnostic quantization for PyTorch


What is Mono Quant?

Mono Quant is a simple, reliable model quantization package for PyTorch with minimal dependencies. Just torch and numpy, no bloat.

Key Features

  • Model-Agnostic - Works with any PyTorch model: HuggingFace, local, or custom
  • Multiple Modes - INT8, INT4, and FP16 quantization
  • Flexible Calibration - Dynamic (no data) or static (with calibration data)
  • Robust Validation - SQNR metrics, size comparison, and accuracy warnings
  • Dual Interface - Python API for automation, CLI for CI/CD
  • Build-Phase Only - Quantize during build, deploy lightweight models

Installation

pip install mono-quant

Requirements

  • Python 3.11 or higher
  • PyTorch 2.0 or higher
  • NumPy 1.24 or higher

Quick Start

Python API

from mono_quant import quantize

# Dynamic INT8 quantization (no calibration data needed)
result = quantize(model, bits=8, dynamic=True)

# Save the quantized model
result.save("model_quantized.pt")

# Check metrics
print(f"Compression: {result.info.compression_ratio:.2f}x")
print(f"SQNR: {result.info.sqnr_db:.2f} dB")
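The `sqnr_db` field reports the signal-to-quantization-noise ratio: the power of the original weights relative to the power of the error the quantization introduced, in decibels. Higher is better. A minimal sketch of the concept (illustrative only, not mono-quant's internals), simulating a symmetric INT8 round-trip on a weight tensor:

```python
import torch

def sqnr_db(original: torch.Tensor, quantized: torch.Tensor) -> float:
    """Signal-to-quantization-noise ratio in decibels."""
    noise = original - quantized
    signal_power = original.pow(2).mean()
    noise_power = noise.pow(2).mean()
    return float(10 * torch.log10(signal_power / noise_power))

# Simulate a symmetric INT8 quantize/dequantize round-trip
w = torch.randn(256, 256)
scale = w.abs().max() / 127          # map the largest magnitude to the INT8 limit
w_q = (w / scale).round().clamp(-128, 127) * scale
print(f"SQNR: {sqnr_db(w, w_q):.2f} dB")
```

For INT8 weight quantization, values in the tens of dB indicate the quantized weights track the originals closely; sharply lower values are what the package's accuracy warnings are there to flag.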

CLI

# Dynamic quantization
monoquant quantize --model model.pt --bits 8 --dynamic

# With custom output path
monoquant quantize --model model.pt --bits 8 --output model_quantized.pt

Quantization Modes

Dynamic Quantization (Fastest, No Data)

result = quantize(model, bits=8, dynamic=True)

Static Quantization (Best Accuracy, Requires Data)

calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(150)]

result = quantize(
    model,
    bits=8,
    dynamic=False,
    calibration_data=calibration_data
)
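Static quantization works by running the calibration batches through the model and recording the range of activations each layer produces, which then fixes the quantization parameters ahead of time. A hedged sketch of what a min/max observer does, in plain PyTorch (an illustration of the technique, not mono-quant's implementation):

```python
import torch
import torch.nn as nn

class MinMaxObserver:
    """Records the running min/max of a layer's outputs during calibration."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def __call__(self, module, inputs, output):
        self.min_val = min(self.min_val, float(output.min()))
        self.max_val = max(self.max_val, float(output.max()))

    def scale_zero_point(self, bits: int = 8):
        # Asymmetric quantization parameters derived from the observed range
        qmax = 2 ** bits - 1
        scale = (self.max_val - self.min_val) / qmax
        zero_point = round(-self.min_val / scale)
        return scale, zero_point

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
obs = MinMaxObserver()
handle = model[0].register_forward_hook(obs)
for _ in range(10):                      # calibration pass: just forward, no labels
    model(torch.randn(4, 16))
handle.remove()
scale, zp = obs.scale_zero_point()
```

This is why static mode needs representative inputs: the observed ranges, and therefore the scales, are only as good as the calibration data.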

INT4 Quantization (Maximum Compression)

result = quantize(
    model,
    bits=4,
    dynamic=False,
    calibration_data=calibration_data,
    group_size=128  # Default
)
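With only 16 representable levels, a single scale per tensor is usually too coarse for INT4, so weights are split into contiguous groups (here 128 values) that each get their own scale. A sketch of the group-wise idea in plain PyTorch (illustrative math under that assumption, not the package's implementation):

```python
import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128):
    """Symmetric INT4 quantization with one scale per group of weights."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # One scale per group: map the group's max magnitude to the INT4 limit (7)
    scales = groups.abs().amax(dim=-1, keepdim=True) / 7
    q = (groups / scales).round().clamp(-8, 7)
    return q.reshape(w.shape), scales.squeeze(-1)

w = torch.randn(64, 256)
q, scales = quantize_int4_groupwise(w, group_size=128)
# Dequantize to inspect the reconstruction error
w_hat = (q.reshape(64, 2, 128) * scales.unsqueeze(-1)).reshape(64, 256)
```

Smaller groups mean more scales (less compression) but a tighter fit per group; `group_size=128` is a common middle ground.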

Documentation

Full documentation is available at https://thataverageguy.github.io/mono-quant

Why Mono Quant?

Most quantization tools are tied to specific frameworks (HuggingFace, TFLite) or require heavy dependencies. Mono Quant fills the niche of "just quantize the weights, nothing else."

Design Philosophy

Aspect          Approach
--------------  -----------------------------------------
Model Loading   You load the model, we quantize it
Dependencies    Only torch and numpy required
Use Case        Build-phase (CI/CD, local development)
Scope           Quantization only, no runtime or serving
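In practice, "you load the model, we quantize it" means the package never touches checkpoint formats: any `nn.Module` you can produce is fair game. A hedged sketch of the caller's side using only PyTorch (the toy model and path are made up for illustration):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Loading is the caller's job: any nn.Module works, however you obtained it
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model, path)

# A full-module checkpoint needs weights_only=False on recent PyTorch
loaded = torch.load(path, weights_only=False)
loaded.eval()
# `loaded` is now ready to hand to quantize(loaded, bits=8, dynamic=True)
```

The same applies to HuggingFace or custom models: load them with their own tooling, then pass the resulting module in.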

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

