
Production-grade model quantization SDK for enterprise custom models (AWQ, GGUF, and CoreML)


Qwodel - Production-Grade Model Quantization

Python 3.9+ | License: MIT | Code style: ruff

Qwodel is a production-ready Python package for model quantization across multiple backends (AWQ, GGUF, CoreML). It provides a unified, intuitive API for quantizing large language models with minimal code.

Features

  • Unified API - Simple interface across all quantization backends
  • Multiple Backends - AWQ (GPU), GGUF (CPU), CoreML (Apple devices)
  • Optional Dependencies - Install only what you need
  • CLI & Python API - Use via command line or programmatically
  • Type Safe - Full type hints and mypy validation
  • Well Documented - Comprehensive docs with examples

Quick Start

Installation

Quick Install (All Backends)

pip install "qwodel[all]"

This installs all backends (GGUF, AWQ, CoreML) with PyTorch 2.1.2 (CPU version). The quotes stop shells such as zsh from interpreting the square brackets as a glob pattern.

GPU Support (for AWQ only)

If you need GPU quantization with AWQ, install PyTorch with CUDA first:

# 1. Install PyTorch with CUDA 12.1
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121

# 2. Install qwodel
pip install "qwodel[all]"

Note: GGUF and CoreML do not need CUDA; the CPU-only PyTorch build is sufficient for them.
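Before choosing between AWQ and the CPU backends, it can help to check which PyTorch build is actually installed. A minimal sketch using standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`); the helper `torch_build_info` is hypothetical and not part of qwodel:

```python
import importlib.util


def torch_build_info() -> dict:
    """Report the installed PyTorch version and whether CUDA is usable.

    Hypothetical helper (not part of qwodel). Works even when PyTorch
    is not installed at all.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None,
                "cuda_build": None, "cuda_available": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only wheels, e.g. "12.1" for the cu121 wheels
        "cuda_build": torch.version.cuda,
        # True only if a CUDA build is installed AND a GPU/driver is present
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = torch_build_info()
    print(info)
    if not info["cuda_available"]:
        print("No usable CUDA device: GGUF and CoreML work, AWQ needs CUDA.")
```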

Individual Backends

# GGUF only (CPU quantization)
pip install "qwodel[gguf]"

# AWQ only (GPU quantization)
pip install "qwodel[awq]"

# CoreML only (Apple devices)
pip install "qwodel[coreml]"
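Because each backend is an optional extra, a script may want to check which ones are importable before picking one. A minimal sketch using `importlib.util.find_spec`; the backend-to-module mapping (`gguf`, `awq` from AutoAWQ, `coremltools`) is an assumption about what the extras install, so check qwodel's package metadata for the real dependency list:

```python
import importlib.util
from typing import Dict, List

# ASSUMPTION: import names each backend's extra is expected to provide.
# Adjust to match the dependencies declared in qwodel's packaging metadata.
BACKEND_MODULES: Dict[str, str] = {
    "gguf": "gguf",
    "awq": "awq",
    "coreml": "coremltools",
}


def available_backends(modules: Dict[str, str] = BACKEND_MODULES) -> List[str]:
    """Return the backends whose (assumed) dependency can be imported."""
    return sorted(
        name for name, module in modules.items()
        if importlib.util.find_spec(module) is not None
    )


if __name__ == "__main__":
    print("Installed backends:", available_backends() or "none")
```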

Local Development

# From the root of a local clone of the repository
cd /path/to/qwodel
pip install -e ".[all]"

Python API

from qwodel import Quantizer

# Create quantizer
quantizer = Quantizer(
    backend="gguf",
    model_path="meta-llama/Llama-2-7b-hf",
    output_dir="./quantized"
)

# Quantize model
output_path = quantizer.quantize(format="Q4_K_M")
print(f"Quantized model saved to: {output_path}")

CLI

# Quantize a model
qwodel quantize \
    --backend gguf \
    --format Q4_K_M \
    --model meta-llama/Llama-2-7b-hf \
    --output ./quantized

# List available formats
qwodel list-formats --backend gguf
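To drive the CLI from a Python script (for example in a CI job), one option is to build the argument list and pass it to `subprocess.run`. A minimal sketch that uses only the flags from the `qwodel quantize` example above; the helper names are hypothetical:

```python
import shutil
import subprocess
from typing import List


def build_quantize_command(backend: str, fmt: str, model: str,
                           output: str) -> List[str]:
    """Build the `qwodel quantize` argument list (hypothetical helper)."""
    return [
        "qwodel", "quantize",
        "--backend", backend,
        "--format", fmt,
        "--model", model,
        "--output", output,
    ]


def run_quantize(backend: str, fmt: str, model: str, output: str) -> int:
    """Run the CLI and return its exit code."""
    if shutil.which("qwodel") is None:
        raise FileNotFoundError("the `qwodel` command was not found on PATH")
    cmd = build_quantize_command(backend, fmt, model, output)
    # A list argument (no shell=True) avoids quoting issues in paths and model IDs
    return subprocess.run(cmd, check=False).returncode
```

With qwodel installed, `run_quantize("gguf", "Q4_K_M", "meta-llama/Llama-2-7b-hf", "./quantized")` runs the same command as the CLI example and returns its exit code.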

Supported Backends

GGUF (CPU Quantization)

  • Use Case: CPU inference, broad compatibility
  • Formats: Q4_K_M, Q8_0, Q2_K, Q5_K_M, and more
  • Best For: Most users, CPU-based deployment
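For intuition about what a GGUF format name like Q8_0 means: in llama.cpp's Q8_0 scheme, weights are split into blocks of 32, each block stores one scale d = max|x| / 127, and each weight is stored as the 8-bit integer round(x / d). The sketch below reproduces that arithmetic in pure Python for illustration only; it is not qwodel's code, and real GGUF files store the scale as float16 in a packed binary layout:

```python
import random
from typing import List, Tuple

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_q8_0(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Quantize floats into Q8_0-style blocks of (scale, int8 codes)."""
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(x) for x in block)
        d = amax / 127.0                 # one scale per block
        inv_d = 1.0 / d if d else 0.0    # an all-zero block encodes as zeros
        # Python's round() is round-half-to-even; llama.cpp uses C roundf()
        qs = [int(round(x * inv_d)) for x in block]  # codes in [-127, 127]
        blocks.append((d, qs))
    return blocks


def dequantize_q8_0(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Reconstruct approximate weights as scale * code."""
    return [d * q for d, qs in blocks for q in qs]


if __name__ == "__main__":
    random.seed(0)
    w = [random.uniform(-1.0, 1.0) for _ in range(64)]
    blocks = quantize_q8_0(w)
    w_hat = dequantize_q8_0(blocks)
    print("blocks:", len(blocks))
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Each reconstructed weight is off by at most half a quantization step (d / 2) of its block, which is why lower-bit formats such as Q2_K trade more accuracy for size.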

AWQ (GPU Quantization)

  • Use Case: NVIDIA GPU inference
  • Formats: INT4
  • Best For: GPU deployments, maximum speed
  • Requires: CUDA 12.1+
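AWQ-quantized checkpoints usually store weights as 4-bit integers in groups, each group with its own scale and zero point; AWQ's own contribution is choosing per-channel scaling factors from activation statistics before that rounding. The sketch below shows only the group-wise asymmetric INT4 rounding step, modeled on the pseudo-quantization in the reference AWQ code, in pure Python for illustration. It is not qwodel's implementation, and the group size of 128 is an assumption:

```python
import random
from typing import List, Tuple

GROUP_SIZE = 128  # ASSUMPTION: a common AWQ group size, not read from qwodel
MAX_CODE = 15     # 4 bits -> integer codes 0..15


def quantize_int4_groups(weights: List[float], group_size: int = GROUP_SIZE
                         ) -> List[Tuple[float, int, List[int]]]:
    """Asymmetric INT4 quantization: each group stores (scale, zero point, codes)."""
    groups = []
    for start in range(0, len(weights), group_size):
        g = weights[start:start + group_size]
        lo, hi = min(g), max(g)
        scale = max(hi - lo, 1e-5) / MAX_CODE             # 16 evenly spaced levels
        zero = min(max(-round(lo / scale), 0), MAX_CODE)  # code that represents 0.0
        codes = [min(max(round(x / scale) + zero, 0), MAX_CODE) for x in g]
        groups.append((scale, zero, codes))
    return groups


def dequantize_int4_groups(groups: List[Tuple[float, int, List[int]]]) -> List[float]:
    """Reconstruct approximate weights as scale * (code - zero)."""
    return [scale * (c - zero) for scale, zero, codes in groups for c in codes]


if __name__ == "__main__":
    random.seed(0)
    w = [random.uniform(-1.0, 1.0) for _ in range(256)]
    w_hat = dequantize_int4_groups(quantize_int4_groups(w))
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```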

CoreML (Apple Devices)

  • Use Case: iOS, macOS, iPadOS deployment
  • Formats: FLOAT16, INT8, INT4
  • Best For: Apple device deployment
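FLOAT16 halves weight storage relative to float32 at the cost of precision (an 11-bit significand, roughly three significant decimal digits). Python's `struct` module supports the IEEE 754 half-precision format code `'e'`, which makes the rounding easy to inspect. An illustrative sketch, not CoreML or qwodel code:

```python
import struct
from typing import List


def to_float16(values: List[float]) -> List[float]:
    """Round each value to the nearest IEEE 754 half-precision number."""
    packed = struct.pack(f"<{len(values)}e", *values)  # 2 bytes per value
    return list(struct.unpack(f"<{len(values)}e", packed))


if __name__ == "__main__":
    weights = [0.1, 1.0, 3.14159, 65504.0]  # 65504 is the largest finite float16
    print(to_float16(weights))
    print("bytes as float32:", len(struct.pack(f"<{len(weights)}f", *weights)))
    print("bytes as float16:", len(struct.pack(f"<{len(weights)}e", *weights)))
```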

Examples

Batch Processing

from qwodel import quantize

models = ["meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf"]

for model in models:
    quantize(
        model_path=model,
        backend="gguf",
        format="Q4_K_M",
        output_dir="./quantized"
    )
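The loop above sends every model to the same `output_dir` and stops at the first exception. A slightly more defensive sketch gives each model its own subdirectory and records failures instead of aborting. The quantization function is passed in as a parameter so the loop can be tested without qwodel; `run_batch` and the directory naming scheme are hypothetical:

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_batch(models: List[str], quantize_fn: Callable[..., object],
              backend: str = "gguf", fmt: str = "Q4_K_M",
              output_root: str = "./quantized") -> Dict[str, object]:
    """Quantize each model into its own folder; return {model: result or exception}."""
    results: Dict[str, object] = {}
    for model in models:
        # e.g. "meta-llama/Llama-2-7b-hf" -> quantized/Llama-2-7b-hf
        out_dir = Path(output_root) / model.rstrip("/").split("/")[-1]
        try:
            results[model] = quantize_fn(
                model_path=model,
                backend=backend,
                format=fmt,
                output_dir=str(out_dir),
            )
        except Exception as exc:  # record the failure and continue with the next model
            results[model] = exc
    return results
```

With qwodel installed, pass its function directly: `from qwodel import quantize`, then `results = run_batch(models, quantize)`. Any value in `results` that is an exception marks a model that failed.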

Custom Progress Callback

from qwodel import Quantizer

def progress_handler(progress: int, stage: str, message: str):
    print(f"[{progress}%] {stage}: {message}")

quantizer = Quantizer(
    backend="gguf",
    model_path="./my-model",
    output_dir="./output",
    progress_callback=progress_handler
)

quantizer.quantize(format="Q4_K_M")
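Because the callback receives plain `(progress, stage, message)` arguments, it can also be a callable object that keeps state. A sketch that uses the standard `logging` module and suppresses repetitive updates; the class name and the 10% reporting step are illustrative choices, not part of qwodel:

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger("quantization")


class ThrottledProgress:
    """Progress callback that logs on a stage change or every `step` percent."""

    def __init__(self, step: int = 10) -> None:
        self.step = step
        self.last_stage: Optional[str] = None
        self.last_logged = -step  # guarantees the first update is logged
        self.logged_count = 0
        self.history: List[Tuple[int, str, str]] = []  # every update received

    def __call__(self, progress: int, stage: str, message: str) -> None:
        self.history.append((progress, stage, message))
        if stage != self.last_stage or progress - self.last_logged >= self.step:
            logger.info("[%3d%%] %s: %s", progress, stage, message)
            self.last_stage = stage
            self.last_logged = progress
            self.logged_count += 1
```

To see the messages, configure logging first (`logging.basicConfig(level=logging.INFO)`) and pass an instance as `progress_callback=ThrottledProgress()` when creating the `Quantizer`.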

Documentation

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Qwodel builds upon the excellent work of:

Download files

Source Distribution

qwodel-0.0.6.tar.gz (208.4 kB)

Built Distribution

qwodel-0.0.6-py3-none-any.whl (208.1 kB)

File details

Details for the file qwodel-0.0.6.tar.gz.

File metadata

  • Download URL: qwodel-0.0.6.tar.gz
  • Upload date:
  • Size: 208.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for qwodel-0.0.6.tar.gz

  • SHA256: bc16f51c7340778e319f2004b6ee0b1fbd4653dc73326244428e3859eb251f55
  • MD5: 031ade0eb25639d2c1a744a16ec5ee47
  • BLAKE2b-256: cef14de9984c4cf7c21741e3c090cbc0be6cd1793936d13f6a1ab80148895edd


File details

Details for the file qwodel-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: qwodel-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 208.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for qwodel-0.0.6-py3-none-any.whl

  • SHA256: 6d8db326e59c321bdae6933b2ca64d581fded3a24757d5d6f1b4d1102c449a32
  • MD5: 843f91d33a30f24b2197bc23e6a37282
  • BLAKE2b-256: 48a0a63e0e07ce55a1cce5fe2e9ed56558d77045eb4078338c9e67f31609e3a1
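To check a downloaded release file against the published SHA256 digests, the standard `hashlib` module is enough. A minimal sketch; the helper names `sha256_of` and `verify` are illustrative:

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the qwodel 0.0.6 release files
EXPECTED = {
    "qwodel-0.0.6.tar.gz":
        "bc16f51c7340778e319f2004b6ee0b1fbd4653dc73326244428e3859eb251f55",
    "qwodel-0.0.6-py3-none-any.whl":
        "6d8db326e59c321bdae6933b2ca64d581fded3a24757d5d6f1b4d1102c449a32",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    if name not in EXPECTED:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED[name]
```

pip can enforce the same check at install time through its hash-checking mode, e.g. a requirements line `qwodel==0.0.6 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt` (in that mode every dependency needs a pinned hash as well).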

