
Implementing auto adpq



auto_adpq

Adaptive Post-Training Quantization tooling (replicating AdpQ)

This repository implements tools and reference code to reproduce the ideas from AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs.

This README explains how to install, run tests, build documentation (including multi-version docs), and contribute.

Installation

Install from PyPI (recommended):

python -m pip install auto_adpq

Install the latest development version directly from GitHub:

python -m pip install "git+https://github.com/Tfloow/auto_adpq.git"

To develop locally (editable install):

git clone https://github.com/Tfloow/auto_adpq.git
cd auto_adpq
python -m pip install -e .
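After any of the install routes above, it can be handy to confirm which version ended up in the active environment. The following is a minimal sketch using only the standard library; it assumes the distribution is registered under the name `auto_adpq` (the name used in the `pip install` command), and the helper `installed_version` is a hypothetical name introduced for illustration.

```python
from importlib import metadata


def installed_version(dist_name="auto_adpq"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("auto_adpq is not installed in this environment")
else:
    print(f"auto_adpq version: {version}")
```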

Makefile helper:

# Run formatting, linting, coverage and docs targets as defined in Makefile
make

Quick usage

Import the package and use the public API. Example (replace with real API):

from auto_adpq import Auto_AdpQ

Add a short usage snippet here specific to the package functions you expect users to try first.

The simplest way to quantize a model is to follow a script similar to examples/simple_quantization.py.

Running tests & linters

Test coverage: 91%

  • Run tests with pytest:
pytest -q
  • Run full coverage report (Makefile target):
make coverage
  • Format & lint with ruff (Makefile target):
make ruff

Debug mode

To obtain logs from the package, enable its logging by setting the environment variable AUTO_ADPQ_DEBUG:

# Linux / macOS
export AUTO_ADPQ_DEBUG=1

# Windows (PowerShell)
$Env:AUTO_ADPQ_DEBUG = 1
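The same variable can also be set from Python, which may be convenient in notebooks or scripts. This is only a sketch: it assumes the package checks AUTO_ADPQ_DEBUG when it is imported (the README does not say when the variable is read), so the variable is set before the import. The helper name `enable_auto_adpq_debug` is hypothetical.

```python
import os


def enable_auto_adpq_debug():
    """Set AUTO_ADPQ_DEBUG for the current process and its children."""
    os.environ["AUTO_ADPQ_DEBUG"] = "1"


enable_auto_adpq_debug()

# Import the package only after the variable is set
# (assumption: the variable is read at import time).
# import auto_adpq
```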

Documentation

The documentation can be found here.

Building the documentation

This project uses Sphinx for documentation. There are two common workflows:

  • Build a single-version site (useful for local writing and previews):
python -m pip install -r docs/requirements.txt
python -m sphinx -b html docs docs/_build/html
  • Build a multi-version site using sphinx-multiversion (we configure this in docs/conf.py). This produces one static site containing each built branch and tag (useful for publishing versioned docs with a dropdown selector):
python -m pip install -r docs/requirements.txt
sphinx-multiversion docs docs/_build/html-mv

Notes about versions

  • The project includes a small template docs/_templates/versions.html which renders a versions dropdown when the site is built with sphinx-multiversion.
  • Adjust smv_tag_whitelist and smv_branch_whitelist in docs/conf.py to control which tags/branches are included in the build.
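As an illustration of the two settings above, here is a possible fragment for docs/conf.py. Both options take regular expressions that sphinx-multiversion matches against tag and branch names. The patterns are assumptions made for this sketch: they presume release tags of the form `v0.3.5` and a branch called `main`, neither of which is confirmed by this README.

```python
import re

# Fragment for docs/conf.py; both patterns are illustrative assumptions.
smv_tag_whitelist = r"^v\d+\.\d+\.\d+$"  # release tags such as v0.3.5
smv_branch_whitelist = r"^main$"         # build only the main branch

# Quick local check of which refs the patterns would accept.
for ref in ["v0.3.5", "v0.3.5rc1", "main", "feature/x"]:
    as_tag = bool(re.match(smv_tag_whitelist, ref))
    as_branch = bool(re.match(smv_branch_whitelist, ref))
    print(f"{ref}: tag={as_tag}, branch={as_branch}")
```

Keeping the patterns anchored with `^` and `$` avoids accidentally including pre-release tags or similarly named branches.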

Tasklist

  • Solve the data-packing issue (#1)
  • Support efficient inference (maybe wrap in SpQR?)
  • Optimize pydantic module AdpQQuantizedWeights
    • Currently, creating a new object incurs a major overhead because every field is validated. Since the class is only used internally, we could drop Pydantic, but we would then need to provide proper dump and load functions ourselves.
  • Support model and integrate with .safetensors
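To make the Pydantic item above more concrete, here is a rough sketch of what a Pydantic-free container with explicit dump and load functions could look like. It is not the package's `AdpQQuantizedWeights`: the class name `QuantizedWeightsSketch` and the fields `group_size`, `scales`, and `codes` are hypothetical, chosen only to illustrate the idea of validating once at load time instead of on every construction.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QuantizedWeightsSketch:
    """Hypothetical stand-in; fields are illustrative, not the real schema."""

    group_size: int
    scales: list
    codes: list

    def dump(self) -> str:
        """Serialize to a JSON string."""
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, payload: str) -> "QuantizedWeightsSketch":
        """Deserialize and validate only at this trust boundary."""
        data = json.loads(payload)
        obj = cls(
            group_size=int(data["group_size"]),
            scales=[float(s) for s in data["scales"]],
            codes=[int(c) for c in data["codes"]],
        )
        if obj.group_size <= 0:
            raise ValueError("group_size must be positive")
        return obj


original = QuantizedWeightsSketch(group_size=2, scales=[0.1, 0.2], codes=[3, -1, 0, 7])
restored = QuantizedWeightsSketch.load(original.dump())
print(restored == original)
```

Internal code paths construct the dataclass directly with no validation cost, while data read back from disk still goes through the checks in `load`.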

Quantized models

Pre-quantized models are available in this collection. They are simulated models, meaning the weights are stored as bf16 values rather than in a packed quantized format. If I stored them in the custom format, I would need either an algorithm to reconstruct the full weights at runtime or a custom CUDA kernel, which is quite tough.

Nonetheless, these models exhibit the same quality and rounding errors that a truly quantized model would.
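The idea of a simulated (sometimes called "fake") quantized model can be sketched in a few lines: weights are quantized and immediately dequantized, so the stored values are ordinary floats that already carry the rounding error. The example below uses plain symmetric round-to-nearest at 4 bits on a Python list. It is a generic illustration, not the AdpQ algorithm, and the function name `fake_quantize` is hypothetical.

```python
def fake_quantize(weights, bits=4):
    """Quantize to signed integers, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale
    scale = max_abs / qmax
    # Integer codes, clamped to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Stored values are floats again, but restricted to the quantization grid.
    return [c * scale for c in codes]


weights = [0.7, -0.33, 0.12, 0.0]
simulated = fake_quantize(weights)
errors = [abs(w - s) for w, s in zip(weights, simulated)]
print(simulated)
print(errors)
```

Evaluating such a model measures the accuracy impact of quantization without requiring a packed storage format or a dedicated kernel, at the cost of no memory savings.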

Performance

Current performance


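Model Variant                      Quantization Method   PPL (Perplexity)
meta-llama/Llama-3.1-8B            Baseline              4.8693
meta-llama/Llama-3.1-8B            BNB                   5.0733
meta-llama/Llama-3.1-8B            AdpQ                  5.3671
meta-llama/Llama-3.1-8B-Instruct   Baseline              4.9080
meta-llama/Llama-3.1-8B-Instruct   BNB                   4.9993
meta-llama/Llama-3.1-8B-Instruct   AdpQ                  5.0069
meta-llama/Llama-3.1-8B-Instruct   AWQ                   5.0440
meta-llama/Llama-3.1-8B-Instruct   GPTQ                  nan
meta-llama/Llama-3.2-1B            Baseline              6.5546
meta-llama/Llama-3.2-1B            AdpQ 9%               6.9491
meta-llama/Llama-3.2-1B            BNB                   6.9971
meta-llama/Llama-3.2-1B            AdpQ 2%               7.0380
meta-llama/Llama-3.2-3B-Instruct   Baseline              5.7864
meta-llama/Llama-3.2-3B-Instruct   AWQ                   5.8339
meta-llama/Llama-3.2-3B-Instruct   AdpQ                  5.9040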
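For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below shows the standard definition on a list of per-token log-probabilities. It does not reproduce the evaluation behind the table: the dataset, context length, and tooling used for those numbers are not stated here, and the function name `perplexity` is introduced only for illustration.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over all tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```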

Contributing

Contributions are welcome. A suggested workflow:

  1. Fork the repository and create a feature branch.
  2. Add tests for new functionality.
  3. Run ruff to format and lint.
  4. Open a pull request describing the change.

Please include unit tests and keep the public API stable when possible.

Development notes

  • Docs templates: docs/_templates/versions.html — version switcher used by sphinx-multiversion.
  • Makefile targets: make ruff, make coverage, make docs (runs single and multiversion builds).

License

This project is released under the Apache 2.0 License.

