Production-grade model quantization SDK for enterprise custom models (AWQ, GGUF, and CoreML)
Qwodel - Production-Grade Model Quantization
Qwodel is a production-ready Python package for model quantization across multiple backends (AWQ, GGUF, CoreML). It provides a unified, intuitive API for quantizing large language models with minimal code.
Features
- Unified API - Simple interface across all quantization backends
- Multiple Backends - AWQ (GPU), GGUF (CPU), CoreML (Apple devices)
- Optional Dependencies - Install only what you need
- CLI & Python API - Use via command line or programmatically
- Type Safe - Full type hints and mypy validation
- Well Documented - Comprehensive docs with examples
Quick Start
Installation
Quick Install (All Backends)
```bash
pip install qwodel[all]
```
This installs all backends (GGUF, AWQ, CoreML) together with the CPU-only build of PyTorch 2.1.2.
GPU Support (for AWQ only)
If you need GPU quantization with AWQ, install PyTorch with CUDA first:
```bash
# 1. Install PyTorch with CUDA 12.1
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121

# 2. Install qwodel
pip install qwodel[all]
```
Note: GGUF and CoreML work perfectly fine with CPU-only PyTorch!
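Since each backend targets different hardware, a small helper can pick a sensible default at runtime. This is a hypothetical sketch (`choose_backend` is not part of qwodel); it assumes AWQ needs an NVIDIA GPU and CoreML targets Apple devices, as described below:

```python
def choose_backend(cuda_available: bool, apple_device: bool = False) -> str:
    """Pick a default quantization backend for the current machine.

    Hypothetical helper: AWQ needs an NVIDIA GPU, CoreML targets
    Apple devices, and GGUF runs anywhere on CPU.
    """
    if cuda_available:
        return "awq"
    if apple_device:
        return "coreml"
    return "gguf"

# In practice you would pass torch.cuda.is_available() and
# platform.system() == "Darwin" for the two flags.
print(choose_backend(cuda_available=False, apple_device=False))  # → gguf
```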
Individual Backends
```bash
# GGUF only (CPU quantization - most popular!)
pip install qwodel[gguf]

# AWQ only (GPU quantization)
pip install qwodel[awq]

# CoreML only (Apple devices)
pip install qwodel[coreml]
```
Local Development
```bash
# Install an editable copy from a local clone
cd /path/to/qwodel
pip install -e .[all]
```
Python API
```python
from qwodel import Quantizer

# Create quantizer
quantizer = Quantizer(
    backend="gguf",
    model_path="meta-llama/Llama-2-7b-hf",
    output_dir="./quantized",
)

# Quantize model
output_path = quantizer.quantize(format="Q4_K_M")
print(f"Quantized model saved to: {output_path}")
```
CLI
```bash
# Quantize a model
qwodel quantize \
  --backend gguf \
  --format Q4_K_M \
  --model meta-llama/Llama-2-7b-hf \
  --output ./quantized

# List available formats
qwodel list-formats --backend gguf
```
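The CLI can also be driven from scripts. A sketch of a thin wrapper that builds the argument list for `subprocess` (the `build_quantize_cmd` helper is hypothetical; the flags mirror the CLI shown above):

```python
import subprocess

def build_quantize_cmd(backend: str, fmt: str, model: str, output: str) -> list[str]:
    """Build the argv for `qwodel quantize` (hypothetical wrapper)."""
    return [
        "qwodel", "quantize",
        "--backend", backend,
        "--format", fmt,
        "--model", model,
        "--output", output,
    ]

cmd = build_quantize_cmd("gguf", "Q4_K_M", "meta-llama/Llama-2-7b-hf", "./quantized")
# subprocess.run(cmd, check=True)  # uncomment to actually invoke the CLI
```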
Supported Backends
GGUF (CPU Quantization)
- Use Case: CPU inference, broad compatibility
- Formats: Q4_K_M, Q8_0, Q2_K, Q5_K_M, and more
- Best For: Most users, CPU-based deployment
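The GGUF formats trade file size for quality. The bits-per-weight figures below are rough approximations of the llama.cpp k-quant formats, shown here only to illustrate the trade-off (check llama.cpp for exact numbers on your model):

```python
# Approximate bits per weight for common GGUF formats (rough figures).
GGUF_BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}

def estimate_size_gb(n_params_billion: float, fmt: str) -> float:
    """Estimate quantized file size in GB for a given parameter count
    (ignores metadata and embedding overhead)."""
    bits = GGUF_BITS_PER_WEIGHT[fmt]
    return n_params_billion * 1e9 * bits / 8 / 1e9

print(round(estimate_size_gb(7.0, "Q4_K_M"), 1))  # → 4.2
```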
AWQ (GPU Quantization)
- Use Case: NVIDIA GPU inference
- Formats: INT4
- Best For: GPU deployments, maximum speed
- Requires: CUDA 12.1+
CoreML (Apple Devices)
- Use Case: iOS, macOS, iPadOS deployment
- Formats: FLOAT16, INT8, INT4
- Best For: Apple device deployment
Examples
Batch Processing
```python
from qwodel import quantize

models = ["meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf"]
for model in models:
    quantize(
        model_path=model,
        backend="gguf",
        format="Q4_K_M",
        output_dir="./quantized",
    )
```
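For larger batches it can help to keep going when one model fails and report results at the end. A sketch of that pattern, with the quantize function injected so it is easy to test (in practice you would pass `qwodel.quantize` wrapped with your fixed arguments):

```python
from typing import Callable

def quantize_batch(models: list[str], quantize_fn: Callable[[str], str]):
    """Quantize each model, collecting successes and failures
    instead of aborting the whole batch on the first error."""
    done: dict[str, str] = {}
    failed: dict[str, str] = {}
    for model in models:
        try:
            done[model] = quantize_fn(model)
        except Exception as exc:  # report and continue with the next model
            failed[model] = str(exc)
    return done, failed

# Example with a stub in place of the real quantize function:
done, failed = quantize_batch(
    ["model-a", "model-b"],
    lambda m: f"./quantized/{m}-Q4_K_M.gguf",
)
```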
Custom Progress Callback
```python
from qwodel import Quantizer

def progress_handler(progress: int, stage: str, message: str):
    print(f"[{progress}%] {stage}: {message}")

quantizer = Quantizer(
    backend="gguf",
    model_path="./my-model",
    output_dir="./output",
    progress_callback=progress_handler,
)

quantizer.quantize(format="Q4_K_M")
```
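A callback can also carry state, for example to record when each stage begins. A sketch in pure Python (independent of qwodel's internals) matching the `(progress, stage, message)` signature above:

```python
import time

class StageTimer:
    """Progress callback that records when each stage is first seen."""

    def __init__(self):
        self.start = time.monotonic()
        self.stage_times: dict[str, float] = {}

    def __call__(self, progress: int, stage: str, message: str) -> None:
        if stage not in self.stage_times:
            self.stage_times[stage] = time.monotonic() - self.start
        print(f"[{progress:3d}%] {stage}: {message}")

timer = StageTimer()
timer(0, "load", "loading model")
timer(50, "quantize", "writing tensors")
```

An instance can be passed directly as `progress_callback=timer`, since it is callable with the expected signature.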
Documentation
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
Qwodel builds upon the excellent work of:
- llama.cpp for GGUF quantization
- llm-compressor for AWQ quantization
- CoreMLTools for CoreML conversion
Download files
File details
Details for the file qwodel-0.0.12.tar.gz.
File metadata
- Download URL: qwodel-0.0.12.tar.gz
- Upload date:
- Size: 208.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `1943d31ea0c18fcb497c3469ba050a0d48d9e8a2caf8f9e40cd7ef2449a7586e` |
| MD5 | `d2843c13f56ab256ab0571fa54afe8c4` |
| BLAKE2b-256 | `9f222a74e9df7a72d6defa64f4ea9fd975200f7778b25cae06838dfef57b6b78` |
File details
Details for the file qwodel-0.0.12-py3-none-any.whl.
File metadata
- Download URL: qwodel-0.0.12-py3-none-any.whl
- Upload date:
- Size: 208.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2afac62e4343c47d016026138470156148d877008e67c8a28050526318a34dbe` |
| MD5 | `ea12938058af9581fa5d5a915b1f517b` |
| BLAKE2b-256 | `e12619afabb8766b639a14df51040087c6599db8650f400a0ff365c31c8fe371` |