
Single-command EXL3 quantization + measurement + reporting pipeline

Project description

ezexl3

ezexl3 is a single-command quantization and measurement pipeline that generates high-quality, HuggingFace-ready exl3 repos automatically.

It wraps the exllamav3 quantization and evaluation workflow into a single tool that:

  • Runs batched quantization (multi-GPU supported)
  • Supports optimized BPWs (2.1 bpw, 3.5 bpw, etc.)
  • Measures KL divergence + PPL @ 200k tokens, recording data to CSV
  • Generates a HuggingFace-ready README.md with your measurements, using customizable templates
  • Embeds an SVG graph from the measurement CSV in the README
  • Optionally integrates catbench: generates SVG kitten drawings at each BPW and assembles them into a grid
  • Checkpoints and resumes intelligently, all from one command

Pipeline:

model → quantize → optimize → measure (KL + PPL + catbench) → graph → README


Installation

This tool requires a local installation of exllamav3.

# 1. Make sure you have exllamav3 installed.

# 2. Clone and install ezexl3
git clone https://github.com/UnstableLlama/ezexl3
cd ezexl3
pip install -e .

Usage

1. Quantize a full repository

Run the entire pipeline (quantize → measure → README):

ezexl3 repo -m /path/to/base_model -b 2,2.5,3,4,5,6 -d 0,1 -t basic

Then ezexl3 automatically:

  • Quantizes the model to all indicated bitrates, saving each under a subdirectory of the base model folder.

  • Measures PPL and KL divergence, saves the results to modelNameMeasured.csv in the base model folder, and renders a stylish dark-mode SVG graph from the data.

  • Generates a README.md for a HuggingFace repo in the base model folder (with optional customizable templates).
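The two metrics behind the measurement stage are standard: KL divergence compares the quantized model's token distribution against the full-precision baseline, and perplexity is the exponentiated mean negative log-likelihood. A minimal sketch of the math (not ezexl3's actual implementation, which runs over real model logits):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same vocab."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)
```

A lower KL divergence at a given BPW means the quant's output distribution stays closer to the original model, which is why it is plotted alongside PPL in the generated graph.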

2. Single-stage subcommands

If you only want to run specific stages:

# Quantize only
ezexl3 quantize -m /path/to/base_model -b 2,2.5,3,4,5,6 -d 0,1

# Quantize with optimized target (automatically ensures integer neighbors)
ezexl3 repo -m /path/to/base_model -b 4.07 -d 0

# Measure only
ezexl3 measure -m /path/to/base_model -b 2,3,4,5,6 -d 0,1

# Generate README only (from existing CSV)
ezexl3 readme -m /path/to/base_model -t fire

(Everything is checkpointed, so it usually doesn't hurt to just run the "repo" command every time.)

3. Template System

You can customize the generated README by providing a template name via --template or -t. Templates are stored in the /ezexl3/templates/ directory.

The system is flexible with naming. For example, -t fire will search for:

  • templates/fire.md
  • templates/fireTemplateREADME.md
  • templates/fireREADME.md
  • templates/fireTemplate.md

If no template is specified, it defaults to basicTemplateREADME.md.

Easily generate your own custom template with AI assistance!

Copy and paste any TemplateREADME.md into your favorite LLM (Gemini, Claude, ChatGPT) along with this example prompt, followed by your own description:

Take this template, keep the main layout and variables, and modify it aesthetically based on my following prompts. Preserve all of the labels and title strings, only change the aesthetic, not the words or numbers:

*Make it dark and understated, high contrast, professional, metallic.*

Then save the resulting output in /ezexl3/templates/ as mynewTemplateREADME.md

Use your template with:

ezexl3 repo -m /path/to/base_model -t mynew -b 2,3,4,5,6 -d 0,1

4. Catbench

SVG Catbench is available as a measurement option via the -cb flag. It runs catbench inference at every BPW level (including optimized fractionals), extracts SVGs, and assembles them into a grid in the final README.

ezexl3 repo -m /path/to/base_model -b 2,3,4,5,6,8 -d 0,1 -t punk -cb

  • -cb alone runs 3 samples per BPW (default); -cb 5 runs 5
  • Catbench runs as a third job category in the measurement-phase GPU queue, alongside PPL and KL
  • VRAM pre-flight check before each catbench load: skips gracefully if the model won't fit, and automatically uses multi-GPU for large models
  • Best valid SVG is selected from N samples for the grid
  • SVG extraction and grid assembly happen in a batch pass after all inference completes
  • Catbench results are checkpointed like everything else — rerunning skips completed samples
  • bf16 baseline included when VRAM allows

5. Advanced: Passthrough Flags

You can pass custom arguments directly to the underlying quantization (multiConvert) or measurement scripts using the --quant-args and --measure-args flags.

Important: These flags require a double-dash -- delimiter to separate the passthrough block from the rest of the arguments.

# Pass custom calibration dataset to quantization
ezexl3 repo -m /path/to/model -b 4.0 --quant-args -- -pm

# Pass custom rows/device settings to measurement
ezexl3 repo -m /path/to/model -b 4.0 --measure-args -- -r 200 -d 0

Common Use Cases:

  • Quantization: -pm (MoE speedup)
  • Measurement: -r / --rows (number of rows for PPL)

Note: passthrough blocks consume remaining args until another passthrough block starts, so keep normal CLI flags (like --no-readme) before --measure-args -- ...
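The "passthrough blocks consume remaining args" behavior is easy to see with a small argv-splitting sketch. This is a plausible re-creation of the parsing rule described above, not ezexl3's actual parser:

```python
def split_passthrough(argv: list[str],
                      flags: set[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Split argv into normal args and per-flag passthrough blocks.

    A passthrough block starts at `<flag> --` and consumes everything
    until the next passthrough flag (or the end of argv), which is why
    normal options must come before any passthrough block.
    """
    normal, blocks = [], {}
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok in flags and i + 1 < len(argv) and argv[i + 1] == "--":
            i += 2  # skip the flag and the `--` delimiter
            block = []
            while i < len(argv) and argv[i] not in flags:
                block.append(argv[i])
                i += 1
            blocks[tok] = block
        else:
            normal.append(tok)
            i += 1
    return normal, blocks
```

Note how a flag like --no-readme placed after --measure-args -- would be swallowed into the measurement block instead of being seen by ezexl3 itself.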

Optimized BPW workflow

If you request an optimized BPW (for example 4.07), ezexl3 executes the following steps in order:

  1. Detect optimized targets and remove them from the initial integer quant queue.
  2. Ensure required neighboring integers exist in the quant queue (4 and 5 for 4.07).
  3. Run normal integer quantization.
  4. Run exllamav3 util/measure.py in a dynamic multi-GPU queue for required integer pairs (resume-safe: skips if measurements/<low>-<high>_measurement.json exists), with terminal logs when jobs are assigned and completed per GPU.
  5. Run exllamav3 util/optimize.py to build the optimized output directory.
  6. Run the normal ezexl3 KL/PPL measurement over all produced targets (integers + optimized BPWs).
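Steps 1-2 and the resume check in step 4 can be sketched in a few lines. The function names are hypothetical; only the neighbor rule (4 and 5 for 4.07) and the measurements/<low>-<high>_measurement.json skip path come from the description above:

```python
import math
from pathlib import Path

def plan_bpw_queue(targets: list[float]) -> tuple[list[int], list[float]]:
    """Split requested BPWs into an integer quant queue and optimized targets,
    ensuring each fractional target's integer neighbors are queued."""
    integers = {int(b) for b in targets if float(b).is_integer()}
    optimized = [b for b in targets if not float(b).is_integer()]
    for b in optimized:
        integers.update((math.floor(b), math.ceil(b)))  # e.g. 4.07 -> 4 and 5
    return sorted(integers), optimized

def measurement_done(model_dir: Path, low: int, high: int) -> bool:
    """Resume-safe check mirroring step 4's skip condition (layout assumed)."""
    return (model_dir / "measurements" / f"{low}-{high}_measurement.json").exists()
```

So requesting -b 2,3,4.07 would queue integer quants at 2, 3, 4, and 5, then build the 4.07 output from the 4/5 measurement pair.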

To locate exllamav3 utility scripts robustly, ezexl3 attempts runtime package discovery and supports overriding with:

EXLLAMAV3_ROOT=/path/to/exllamav3 ezexl3 repo -m /path/to/model -b 4.07

6. Headless Mode

For automated pipelines, use the --no-prompt (or -np) flag to skip interactive metadata collection for the README. It will use sensible defaults based on the model directory name and your environment.

ezexl3 repo -m /path/to/model -b 4.0 --no-prompt
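"Sensible defaults based on the model directory name" could be derived along these lines. The field names here are illustrative, not ezexl3's exact metadata schema:

```python
from pathlib import Path

def default_metadata(model_dir: str) -> dict:
    """Derive README metadata defaults from the model directory name."""
    name = Path(model_dir).name
    return {
        "model_name": name,   # display name for the README title
        "repo_name": f"{name}-exl3",  # suggested HuggingFace repo name
        "base_model": name,   # link target for the original model
    }
```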

Download files


Source Distribution

ezexl3-0.0.7.tar.gz (443.8 kB view details)


Built Distribution


ezexl3-0.0.7-py3-none-any.whl (434.9 kB view details)


File details

Details for the file ezexl3-0.0.7.tar.gz.

File metadata

  • Download URL: ezexl3-0.0.7.tar.gz
  • Upload date:
  • Size: 443.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ezexl3-0.0.7.tar.gz:

  • SHA256: cbdb5498c1591cd64732d3db05c5fc2670922ab0480b2cf7463ea1d26fc96b05
  • MD5: ccf7dcad7962c2ad5ab2b2859a83193f
  • BLAKE2b-256: a2ff0031242c64700aae74f51b7e8490d0e098ee2515828475cda4672e3a478e


File details

Details for the file ezexl3-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: ezexl3-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 434.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ezexl3-0.0.7-py3-none-any.whl:

  • SHA256: d2764229b7471a73feedfdd4badadb7e189f8ba09621d33eb68e0b3271785479
  • MD5: 398fa00bdf9f60b322fe1e4c297592b0
  • BLAKE2b-256: 49f179ffba6f65edbd0cbe0ca6adf7878ae4b7c6c919aba4ece35377e7e909b9

