Simple, reliable model quantization with minimal dependencies

Mono Quant

Ultra-lightweight, model-agnostic quantization for PyTorch

What is Mono Quant?

Mono Quant is a simple, reliable model quantization package for PyTorch with minimal dependencies. Just torch and numpy, no bloat.

Key Features

  • Model-Agnostic - Works with any PyTorch model: HuggingFace, local, or custom
  • Multiple Modes - INT8, INT4, and FP16 quantization
  • Flexible Calibration - Dynamic (no data) or static (with calibration data)
  • Robust Validation - SQNR metrics, size comparison, and accuracy warnings
  • Dual Interface - Python API for automation, CLI for CI/CD
  • Build-Phase Only - Quantize during build, deploy lightweight models

Installation

pip install mono-quant

Requirements

  • Python 3.11 or higher
  • PyTorch 2.0 or higher
  • NumPy 1.24 or higher

Quick Start

Python API

from mono_quant import quantize

# `model` is any torch.nn.Module you have already loaded
# Dynamic INT8 quantization (no calibration data needed)
result = quantize(model, bits=8, dynamic=True)

# Save the quantized model
result.save("model_quantized.pt")

# Check metrics
print(f"Compression: {result.info.compression_ratio:.2f}x")
print(f"SQNR: {result.info.sqnr_db:.2f} dB")
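The two metrics printed above can be read as follows. This is a minimal sketch of the usual definitions, not Mono Quant's own code: SQNR as 10 * log10(signal power / quantization-noise power), and compression ratio as original bytes divided by quantized bytes. The helper names `sqnr_db` and `compression_ratio` are hypothetical, and an FP16 round trip stands in for the quantizer.

```python
import torch


def sqnr_db(original: torch.Tensor, dequantized: torch.Tensor) -> float:
    """Signal-to-quantization-noise ratio in decibels (higher is better)."""
    original = original.double()
    noise = original - dequantized.double()
    signal_power = original.pow(2).mean()
    noise_power = noise.pow(2).mean()
    if noise_power == 0:
        return float("inf")  # lossless round trip
    return float(10.0 * torch.log10(signal_power / noise_power))


def compression_ratio(original_bytes: int, quantized_bytes: int) -> float:
    """How many times smaller the quantized tensor is."""
    return original_bytes / quantized_bytes


# Hypothetical weights; an FP16 cast stands in for a real quantizer.
torch.manual_seed(0)
w = torch.randn(256, 256)          # FP32, 4 bytes per element
w_fp16 = w.half()                  # 2 bytes per element
w_hat = w_fp16.float()             # "dequantized" copy for comparison

original_bytes = w.numel() * w.element_size()
quantized_bytes = w_fp16.numel() * w_fp16.element_size()

print(f"Compression: {compression_ratio(original_bytes, quantized_bytes):.2f}x")
print(f"SQNR: {sqnr_db(w, w_hat):.2f} dB")
```

A low SQNR means the quantized weights have drifted far from the originals, which is the kind of condition an accuracy warning would be expected to flag.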

CLI

# Dynamic quantization
monoquant quantize --model model.pt --bits 8 --dynamic

# With custom output path
monoquant quantize --model model.pt --bits 8 --output model_quantized.pt

Quantization Modes

Dynamic Quantization (Fastest, No Data)

result = quantize(model, bits=8, dynamic=True)
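To illustrate why this mode needs no data, here is a minimal sketch of per-tensor symmetric INT8 weight quantization in plain PyTorch, where the scale is derived from the weights alone. It assumes a symmetric scheme with range [-127, 127]; the helpers `quantize_int8_symmetric` and `dequantize` are hypothetical and are not Mono Quant's internals.

```python
import torch


def quantize_int8_symmetric(weight: torch.Tensor):
    """Per-tensor symmetric INT8: the scale comes from the weights themselves."""
    max_abs = weight.abs().max().clamp(min=1e-8)  # guard against all-zero weights
    scale = max_abs / 127.0
    q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q, scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map INT8 codes back to approximate FP32 values."""
    return q.to(torch.float32) * scale


torch.manual_seed(0)
layer = torch.nn.Linear(64, 32)  # stand-in for one layer of a real model
weight = layer.weight.detach()

q, scale = quantize_int8_symmetric(weight)
max_error = (weight - dequantize(q, scale)).abs().max()

# The rounding error per weight is bounded by half a quantization step.
print(q.dtype, float(scale), float(max_error))
```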

Static Quantization (Best Accuracy, Requires Data)

import torch

# Placeholder inputs; use representative real samples in practice
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(150)]

result = quantize(
    model,
    bits=8,
    dynamic=False,
    calibration_data=calibration_data
)
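As a rough picture of what calibration data is for, the sketch below runs batches through a small model, records the observed activation range, and turns it into an affine scale and zero-point for unsigned 8-bit values. It assumes simple min/max calibration; the class `RangeObserver` is hypothetical, and Mono Quant may use a different calibration method.

```python
import torch


class RangeObserver:
    """Tracks the running min/max of tensors seen during calibration."""

    def __init__(self) -> None:
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, x: torch.Tensor) -> None:
        self.min_val = min(self.min_val, float(x.min()))
        self.max_val = max(self.max_val, float(x.max()))

    def qparams(self, qmin: int = 0, qmax: int = 255):
        # Widen the range to include zero so that 0.0 maps to an exact code.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = max((hi - lo) / (qmax - qmin), 1e-8)
        zero_point = int(round(qmin - lo / scale))
        zero_point = max(qmin, min(qmax, zero_point))
        return scale, zero_point


torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
calibration_data = [torch.randn(4, 16) for _ in range(20)]

observer = RangeObserver()
with torch.no_grad():
    for batch in calibration_data:
        observer.observe(model(batch))

scale, zero_point = observer.qparams()
print(scale, zero_point)
```

Because the range is fixed from the samples, inputs that resemble real traffic give tighter ranges than random tensors do.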

INT4 Quantization (Maximum Compression)

result = quantize(
    model,
    bits=4,
    dynamic=False,
    calibration_data=calibration_data,
    group_size=128  # Default
)
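The `group_size` parameter suggests group-wise quantization, where each block of consecutive weights gets its own scale. The sketch below shows that idea under stated assumptions: symmetric signed 4-bit codes in [-7, 7], groups taken along the input dimension, and codes left unpacked in `int8` for readability. The helper names are hypothetical and this is not Mono Quant's implementation.

```python
import torch


def quantize_int4_groupwise(weight: torch.Tensor, group_size: int = 128):
    """Symmetric 4-bit quantization with one scale per group of weights."""
    out_features, in_features = weight.shape
    if in_features % group_size != 0:
        raise ValueError("in_features must be divisible by group_size")
    groups = weight.reshape(out_features, in_features // group_size, group_size)
    max_abs = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    scale = max_abs / 7.0  # one scale per group
    # Codes fit in 4 bits; they are stored unpacked in int8 here for clarity.
    q = torch.round(groups / scale).clamp(-7, 7).to(torch.int8)
    return q, scale


def dequantize_groupwise(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild an approximate FP32 weight matrix from codes and group scales."""
    return (q.to(torch.float32) * scale).reshape(shape)


torch.manual_seed(0)
w = torch.randn(8, 256)  # hypothetical weight matrix: 2 groups of 128 per row
q, scale = quantize_int4_groupwise(w, group_size=128)
w_hat = dequantize_groupwise(q, scale, w.shape)

print(q.shape, scale.shape, float((w - w_hat).abs().max()))
```

Smaller groups limit the damage a single outlier weight can do, at the cost of storing more scales. If scales were stored as FP16, a group size of 128 would add 16 / 128 = 0.125 bits per weight on top of the 4-bit codes.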

Documentation

Full documentation is available at https://thataverageguy.github.io/mono-quant

Why Mono Quant?

Most quantization tools are tied to specific frameworks (HuggingFace, TFLite) or require heavy dependencies. Mono Quant fills the niche of "just quantize the weights, nothing else."

Design Philosophy

Aspect          Approach
Model Loading   You load the model, we quantize it
Dependencies    Only torch and numpy required
Use Case        Build-phase (CI/CD, local development)
Scope           Quantization only, no runtime or serving

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
