Production-grade neural-network quantization framework with NSGA + ONNX + hardware-aware search

NeuroQuant v2.0

Production-grade neural-network quantization framework with multi-objective NSGA search, ONNX deployment, and hardware-aware optimisation.

NeuroQuant takes a pre-trained PyTorch model and produces deployable INT8 / mixed-precision artefacts that have been measured (not estimated) on the same runtime that ships in production. Every public number is the result of running a real quantized graph through ONNX Runtime — no synthetic shortcuts.

What it does

   ┌────────────────────────────────────────────────────────────────────────┐
   │                                                                        │
   │  FP32 PyTorch model  ─────►  10-phase pipeline  ─────►  INT8 .onnx     │
   │                                                          + metrics     │
   │  ┌──────────────────────────────────────────────────────────────┐     │
   │  │  P0  Prepare model + dataset, FP32 baseline                  │     │
   │  │  P1a Hessian / Fisher per-layer sensitivity                  │     │
   │  │  P1b FITCompress warm-start seed                             │     │
   │  │  P1c NSGA multi-objective search (2- or 3-obj)               │     │
   │  │  P1d AdaRound canonical-order weight rounding                │     │
   │  │  P1e Real W+A QAT with FP32 teacher distillation             │     │
   │  │  P1f GPTQ + SmoothQuant + AWQ + SmoothQuant→GPTQ             │     │
   │  │  P2  Pareto analysis + plots                                 │     │
   │  │  P3  Grad-CAM + SHAP explainability                          │     │
   │  │  P4  MLflow finalisation + reproducibility manifest          │     │
   │  └──────────────────────────────────────────────────────────────┘     │
   │                                                                        │
   └────────────────────────────────────────────────────────────────────────┘

The pipeline runs to completion in ~60 seconds on CPU for a CIFAR-class model.
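
The search in P1c and the analysis in P2 both rest on Pareto dominance: a candidate survives only if no other candidate is at least as good on every objective and strictly better on at least one. The following is a minimal sketch of that filter, not NeuroQuant's own code. It assumes every objective is expressed so that lower is better (for example error rate, size in MB, latency in ms); the function names and sample numbers are illustrative.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere.

    Assumes every objective is minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> list[int]:
    """Return indices of the non-dominated points (simple O(n^2) scan)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates: (error rate, size in MB, latency in ms).
candidates = [
    (0.08, 3.5, 4.0),  # small and fast, less accurate
    (0.06, 9.0, 7.5),  # accurate, larger and slower
    (0.09, 9.5, 8.0),  # worse than the second candidate on every objective
    (0.07, 5.0, 5.0),  # a middle trade-off
]
front = pareto_front(candidates)
print(front)
```

A genetic search such as NSGA-II layers non-dominated sorting and diversity preservation on top of this relation; the quadratic scan here is only meant to make the dominance test concrete.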

Why it is production-grade

This framework was built deliberately to avoid the "research prototype" failure modes that disqualify most academic quantization tooling from real deployment:

Concern	What NeuroQuant does
Real INT inference	Wave 4 emits true static-INT8 ONNX graphs via `onnxruntime.quantization.quantize_static`, not FP32 simulation.
Real on-disk size	`model_size_mb` is the literal `.onnx` filesystem size, not `numel × bw / 8`. The synthetic estimate is kept as `theoretical_size_mb` for ablation.
Real latency	`latency_ms` is measured under ONNX Runtime on the same machine that will deploy the artefact.
Hardware-aware search	The NSGA third objective sums a per-layer ORT latency LUT (Wave 4 C2). Every gene's latency cost is a real timing.
No leakage between splits	The training data is split 80/10/10 into train / search / val, and the dataset's own test set is held out. NSGA fitness reads search, QAT early stopping reads val, and headline numbers read test.
Strict determinism	`set_seed(strict=True)` sets `CUBLAS_WORKSPACE_CONFIG` and enables `torch.use_deterministic_algorithms` and `cudnn.deterministic`.
Safe checkpoints	Every checkpoint is loaded with `torch.load(weights_only=True)`, so the arbitrary-pickle path is closed. Architectural wrappers persist as JSON manifests.
Real W+A QAT	INT8 activations always; weight parametrisation via `torch.nn.utils.parametrize` (autograd-aware STE).
Validated config	Pydantic v2 dataclasses with field validators — bad values fail at load, not deep in a phase.
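
To make the "Real on-disk size" row concrete, the sketch below contrasts the two quantities. It is an illustration under stated assumptions rather than NeuroQuant's implementation: the helper names are hypothetical, 1 MB is taken as 10^6 bytes (the project may use 2^20), and a zero-filled temporary file stands in for a real `.onnx` export.

```python
import os
import tempfile

BYTES_PER_MB = 1_000_000  # assumption; the project may use 1024 * 1024 instead


def theoretical_size_mb(layers: list[tuple[int, int]]) -> float:
    """Synthetic estimate: sum of numel * bitwidth / 8 over (numel, bitwidth) pairs."""
    return sum(numel * bw / 8 for numel, bw in layers) / BYTES_PER_MB


def on_disk_size_mb(path: str) -> float:
    """Literal filesystem size of an exported file."""
    return os.path.getsize(path) / BYTES_PER_MB


# Hypothetical mixed-precision model: two INT8 layers and one INT4 layer.
layers = [(1_000_000, 8), (2_000_000, 4), (500_000, 8)]
estimate = theoretical_size_mb(layers)

# Stand-in for a real .onnx file: weight payload plus some non-weight overhead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.onnx")
    with open(path, "wb") as fh:
        fh.write(b"\0" * 2_600_000)
    measured = on_disk_size_mb(path)

print(f"theoretical={estimate:.2f} MB, on disk={measured:.2f} MB")
```

An exported graph also stores its structure, scales, and zero points, so the file size generally differs from the `numel × bw / 8` estimate; that gap is why the two numbers are reported separately.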

Install

From the wheel

pip install neuroquant-2.0.0-py3-none-any.whl
neuroquant --help

From source

git clone https://github.com/AbdelazizElHelaly11/NeuroQuant
cd NeuroQuant
pip install -e ".[dev]"        # editable + dev extras

GPU users:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -e ".[dev]"

Run

The console-script neuroquant is installed by the wheel; it accepts the same flags as python main.py.

# Full pipeline on the bundled config (CIFAR-10 + MobileNetV2)
neuroquant --config config.yaml --epochs 20

# Fast smoke (CPU, no training, first three phases)
neuroquant --config config.yaml --epochs 0 --device cpu \
  --phases phase_0_preparation phase_1a_hessian_clustering phase_1b_fitcompress

# Resume after interruption
neuroquant --config config.yaml --epochs 20 --resume

# Hardware-aware mode (3-objective NSGA + ORT latency LUT)
# Set hardware_aware_search: true in config.yaml, then:
neuroquant --config config.yaml --epochs 20

The pipeline writes everything to output_dir (default ./artifacts/):

artifacts/
├── checkpoints/          # per-phase resume points
├── onnx/                 # FP32 + per-method INT8 .onnx files
├── pareto/               # Pareto plots + JSON
├── reports/              # pipeline_report.txt, pareto_summary.json
├── reproducibility_manifest.json
├── latency_lut.json      # only when hardware_aware_search=true
└── pipeline_report.txt

Configuration

All knobs live in config.yaml. Common overrides:

model:
  name: resnet18              # any torchvision name
  num_classes: 10
  input_shape: [3, 32, 32]

dataset:
  name: cifar10               # cifar10 | cifar100 | imagefolder | synthetic | custom
  class: null                 # optional "pkg.module.MyDataset"
  train_dir: null             # optional ImageFolder split dirs
  val_dir: null
  test_dir: null
  batch_size: 128

methods: [ptq, qat, gptq, smoothquant, awq]
bitwidths:
  supported: [4, 8]
  io_layer: 8                 # force first/last layers to INT8

hyperparams:
  hardware_aware_search: true     # Wave 4 J4: 3-obj NSGA
  onnx_export_enabled: true       # Wave 4 J1/J2/J3
  qat_distill_alpha: 0.5          # Wave 2 E5: KD with FP32 teacher
  smoothquant_per_layer_alpha: true  # Wave 3 F3
  hessian_estimator: fisher       # Wave 3 B2: 3× faster than diag

Pydantic field validators run at load time — invalid values surface immediately with the offending field path:

ValueError: Configuration validation failed:
  num_classes must be >= 2.
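
The fail-at-load behaviour can be illustrated with a small stand-in. This sketch deliberately uses a standard-library dataclass instead of Pydantic so that it runs without extra dependencies; the class name, the `input_shape` rule, and the error layout are assumptions modelled on the message above, not NeuroQuant's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical mirror of the `model:` block in config.yaml."""

    name: str
    num_classes: int
    input_shape: tuple[int, int, int]

    def __post_init__(self) -> None:
        # Collect every problem so the user sees all of them in one pass.
        errors = []
        if self.num_classes < 2:
            errors.append("num_classes must be >= 2.")
        if len(self.input_shape) != 3 or any(d <= 0 for d in self.input_shape):
            errors.append("input_shape must be three positive integers.")
        if errors:
            raise ValueError(
                "Configuration validation failed:\n  " + "\n  ".join(errors)
            )


# A valid configuration constructs without complaint.
ok = ModelConfig(name="resnet18", num_classes=10, input_shape=(3, 32, 32))

# An invalid one fails at construction time, before any phase runs.
try:
    ModelConfig(name="resnet18", num_classes=1, input_shape=(3, 32, 32))
except ValueError as exc:
    message = str(exc)
    print(message)
```

Validating in the constructor means a bad value can never reach a later phase, which is the property the Pydantic validators are described as providing.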

Architecture

The framework was built in seven waves, each ending with a strict-format report. Per-wave architecture notes live in docs/architecture/:

Wave	Theme	Notes
1	Foundation (security + leakage)	wave1.md
2	Real W+A QAT pipeline	wave2.md
3	Method audits + Fisher	wave3.md
4	ONNX + hardware-aware search	wave4.md
5	Reporting + MLflow	wave5.md
6	Config validation (Pydantic)	wave6.md
7	Packaging + docs	wave7.md

Quantization methods

Method	When to use	Module
PTQ	Fast baseline; INT8 with bitwidth-aware calibration.	`quantization/ptq.py`
QAT	Best accuracy at INT8; requires fine-tuning data.	`quantization/qat.py`
GPTQ	Best accuracy at INT4 weights; data-aware optimal rounding.	`quantization/gptq.py`
SmoothQuant	Activation-friendly INT8; per-layer α grid search.	`quantization/smoothquant.py`
AWQ	INT4 with salient-channel preservation; per-layer α + FP16 carve-out.	`quantization/awq.py`
SmoothQuant→GPTQ	Production recipe — strict-Pareto improvement over either method alone.	`quantization/smoothquant_gptq.py`
AdaRound	Post-PTQ refinement; canonical input→output traversal.	`quantization/adaround.py`
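
All of the methods above map floating-point values onto a small integer grid; they differ in how they choose scales and rounding. As a point of reference only, here is a minimal sketch of symmetric per-tensor round-to-nearest quantization at the two bitwidths from the sample config (4 and 8). It is a generic baseline, not the code in `quantization/ptq.py`, and the function names are hypothetical.

```python
def quantize_symmetric(values: list[float], bits: int) -> tuple[list[int], float]:
    """Round-to-nearest symmetric quantization with one scale per tensor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp to the symmetric range [-qmax, qmax] after rounding.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating-point values."""
    return [x * scale for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"INT{bits}: q={q} max_abs_error={worst:.4f}")
```

With round-to-nearest the reconstruction error per value is bounded by half the scale, so the INT4 grid is far coarser than the INT8 one. Methods such as GPTQ, AWQ, and AdaRound exist to spend that limited precision more carefully than this baseline does.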

License

MIT. See LICENSE for the full text.

Acknowledgements

The seven-wave production hardening was specified, implemented, and refined in collaboration with Claude Opus 4.7 (1M context). Per-wave architecture notes live under docs/architecture/.

This version

2.0.0

May 13, 2026

Download files

Source Distribution

neuroquant-2.0.0.tar.gz (222.1 kB)

Uploaded May 13, 2026 Source

Built Distribution

neuroquant-2.0.0-py3-none-any.whl (238.3 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file neuroquant-2.0.0.tar.gz.

File metadata

Download URL: neuroquant-2.0.0.tar.gz
Upload date: May 13, 2026
Size: 222.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for neuroquant-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ddc636645d3ed6e641ed84d5cec1eb39a7663c110f870d5f9a30da31ea4f2387`
MD5	`817ceae052e364cdb71e99ce9a4a2f9b`
BLAKE2b-256	`34e99bef2e7b218d0d0618f3212aeb51c2446fca27b97dc3edc9f9c7b0779251`

File details

Details for the file neuroquant-2.0.0-py3-none-any.whl.

File metadata

Download URL: neuroquant-2.0.0-py3-none-any.whl
Upload date: May 13, 2026
Size: 238.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for neuroquant-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57ac5a6d5abe4ca360346a44d2b6909aa077c73c9d4ae45982a8feb912dd5d01`
MD5	`712a0f25e7289439a4698fedb7eed9a7`
BLAKE2b-256	`8720f2396524f46872731efef5d4575e6c3e6b11f56d16ef135df747d71f46be`
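
A downloaded file can be checked against the SHA256 digests listed above before it is installed. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify` are illustrative, and the demonstration at the end hashes a throwaway file rather than a real download.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "neuroquant-2.0.0.tar.gz":
        "ddc636645d3ed6e641ed84d5cec1eb39a7663c110f870d5f9a30da31ea4f2387",
    "neuroquant-2.0.0-py3-none-any.whl":
        "57ac5a6d5abe4ca360346a44d2b6909aa077c73c9d4ae45982a8feb912dd5d01",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a file against the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


# Demonstration on a throwaway file; a real check would pass the downloaded path.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "neuroquant-2.0.0.tar.gz"
    fake.write_bytes(b"not the real archive")
    fake_digest = sha256_of(fake)
    fake_ok = verify(fake)  # contents differ from the release, so this is False

print(fake_ok)
```

For a real download, `verify(Path("neuroquant-2.0.0-py3-none-any.whl"))` should return `True` only if the file is byte-for-byte identical to the published wheel.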
