edgerazor

EdgeRazor: Lightweight framework for Edge AI.

These details have not been verified by PyPI

Project links

Homepage

Project description

Lightweight Framework for Edge AI

✨ If you like our project, please give us a star ⭐️ for the latest update.

EdgeRazor is a unified and lightweight framework for edge AI, designed to train models that are smaller, faster, and deployable across diverse hardware, ranging from mobile and edge endpoints to latency-sensitive clouds. The EdgeRazor framework seamlessly integrates model compression techniques into existing full-precision training pipelines with minimal code modification, preserving promising task performance and enabling low-cost and high-efficiency computations.

EdgeRazor currently focuses on low-bit LLM compression via configurable quantization-aware distillation. In terms of quantization, EdgeRazor supports quantizing weights (including embedding and lm_head layers), activations, and KV cache. Quantized bit-widths include the uniform 1.58-bit and 4-bit, as well as matrix-wise mixed-precision, such as 2.79-bit (50% 4-bit + 50% 1.58-bit) and 1.88-bit (12.5% 4-bit + 87.5% 1.58-bit). In terms of distillation, EdgeRazor offers the logits, features, and attention distillation, all of which can be flexibly combined within a unified configuration interface.

EdgeRazor achieves the state-of-the-art performance across a range of models, including base LLMs, instruction-tuned LLMs, and multimodal LLMs. For W-A8-KV8 quantization, Qwen3-0.6B-EdgeRazor attains average scores of 47.80 / 44.10 / 41.76 / 39.81 at 4-bit / 2.79-bit / 1.88-bit / 1.58-bit, corresponding to compression ratios of 3.94× / 5.05× / 6.40× / 7.03×, respectively. In comparison, the best prior methods achieve 45.74 / 37.38 / 30.49 at 4-bit / 3-bit / 2-bit with compression ratios of 2.21× / 2.47× / 2.78×.

EdgeRazor Architecture
Figure: The EdgeRazor framework with lightweight model training pipeline.

News

🔥 [2026-04]: 🏆 Low-bit LLMs by EdgeRazor is released! Check our Hugging Face collection: zhangsq-nju/edgerazor-nbit.
🔥 [2026-04]: 🛠️ Open-sourced EdgeRazor-V1 is released! Now configurable on diverse models for seamless integration and customization!
🔥 [2025-10]: 📄 Paper-TernaryCLIP is available on arXiv: 2510.21879!

News
Contents
Getting Started
Main Techniques
Applications
Model Zoo
- LLMs
- MLLMs
Todo List
Citation
Contributor List

Getting Started

Installation

Download from PyPi

pip install edgerazor

Download from GitHub (latest version)

git clone https://github.com/zhangsq-nju/EdgeRazor.git && cd EdgeRazor
conda create -n edgerazor python=3.10.20 -y
conda activate edgerazor
pip install -e .[cu128]

Usage

After installation, you can integrate EdgeRazor into your existing training pipeline to build lightweight models.

Use unified configuration by yaml, json or dict.
Seamlessly integrate EdgeRazor into your FULL-PRECISION model training and enjoy your lightweight journey!

# Init EdgeRazor for lightweight model
edgerazor = EdgeRazor(config="/path/to/config.yaml")
student = edgerazor.quantize(student)
# Training loop
student_outputs = student(inputs)
teacher_outputs = teacher(inputs)
# Calculate loss
loss, loss_dict = edgerazor.compute_loss(student_outputs, teacher_outputs, labels)

Low-bit LLMs Serving on Docker

You can obtain lightweight models from checkpoints trained with EdgeRazor. For example, you can convert Qwen3-EdgeRazor-4bit checkpoints to Q4_0 GGUF models. We also provide ready-to-use quantized models in our collection, including Qwen3-0.6B-EdgeRazor-GGUF and Qwen3-1.7B-EdgeRazor-GGUF.

# Serve quantized LLMs under CPU-only environments:
docker pull ghcr.io/ggml-org/llama.cpp:server
hf download zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF Qwen3-1.7B-EdgeRazor-TQ2_0.gguf --local-dir /path/to/Qwen3-1.7B-EdgeRazor-GGUF
cd ./docker && bash local_server_tq2_0.sh

Main Techniques

Quantization-Aware Distillation (QAD):

Configurable mixed-precision quantization for weights
Configurable knowledge distillation pipelines between 16-bit and $n$-bit models
Knowledge distillation methods: Adaptive Feature Distillation (AFD), Entropy-Aware KL Divergence (EAKLD)

Applications

Lightweight ViT-S/16, check here.
Lightweight ResNet-18, check here.
Lightweight Qwen3-0.6B/1.7B, coming soon.
Lightweight MobileLLM-ParetoQ-350M-BF16, coming soon.
Lightweight Qwen2.5-Omni-7B, coming soon.

Model Zoo

LLMs

Average Performance (Avg.): average of performance scores in multiple tasks using lm-eval v0.4.9.1 with tasks.
- Tasks for instruct LLMs: arc_easy, arc_challenge, hellaswag, boolq, social_iqa, openbookqa, piqa, winogrande, hendrycks_ethics, truthfulqa_mc2, mmlu, gsm8k, humaneval_instruct, ifeval.
- Tasks for base LLMs: arc_easy, arc_challenge, hellaswag, boolq, social_iqa, openbookqa, piqa, winogrande, hendrycks_ethics, truthfulqa_mc2, mmlu, gsm8k, humaneval.
- Except for 5-shot gsm8k, all other tasks are 0-shot.
Hub Link: We provide the original quantized checkpoints. We also transfer the checkpoints into GGUF (llama.cpp) and GPTQ (GPTQModel, working in progress) formats if compatible.

Model	W-A-KV	Group Size	Avg.	Hub Link
Qwen3-0.6B	W16-A16-KV16	-	47.35	Base
Qwen3-0.6B	W4-A8-KV8	256	47.80	EdgeRazor, Q4_0.gguf
Qwen3-0.6B	W2.79-A8-KV8	256	44.10	EdgeRazor
Qwen3-0.6B	W1.88-A8-KV8	256	41.76	EdgeRazor
Qwen3-0.6B	W1.58-A8-KV8	256	39.81	EdgeRazor, TQ1_0.gguf, TQ2_0.gguf
Qwen3-1.7B	W16-A16-KV16	-	58.65	Base
Qwen3-1.7B	W4-A8-KV8	256	58.57	EdgeRazor, Q4_0.gguf
Qwen3-1.7B	W2.79-A8-KV8	256	53.00	EdgeRazor
Qwen3-1.7B	W1.88-A8-KV8	256	47.14	EdgeRazor
Qwen3-1.7B	W1.58-A8-KV8	256	43.91	EdgeRazor, TQ1_0.gguf, TQ2_0.gguf
MobileLLM-350M	W16-A16-KV16	-	41.18	Base
MobileLLM-350M	W4-A8-KV8	64	41.86	EdgeRazor
MobileLLM-350M	W2.79-A8-KV8	64	40.62	EdgeRazor
MobileLLM-350M	W1.88-A8-KV8	64	39.02	EdgeRazor
MobileLLM-350M	W1.58-A8-KV8	64	38.12	EdgeRazor

MLLMs

Video-MME and MLVU are video understanding tasks using lmms-eval v0.5.0 with tasks.

Model	W-A-KV	Group Size	Video-MME	MLVU	Hub Link
Qwen2.5-Omni-7B	W16-A16-KV16	-	62.81	48.01	Base
Qwen2.5-Omni-7B	W4-A16-KV16	32	62.22	48.82	EdgeRazor

Todo List

EdgeRazor is continuously evolving! Here's what's coming:

Support pypi installation
Lightweight MobileLLM, Qwen3, and Qwen2.5-Omni: training code
Upgrade to support the newest dependencies

Have ideas or suggestions? We welcome and appreciate any contributions and collaborations! Please feel free to submit issues or pull requests! 🚀

Citation

If you find our papar and code useful in your research, please consider kindly citing our papers ✏️:

@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}

@article{zhangsh-ternaryclip,
  title={{TernaryCLIP}: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge},
  author={Shu-Hao Zhang and Wei-Cheng Tang and Chen Wu and Peng Hu and Nan Li and Liang-Jie Zhang and Qi Zhang and Shao-Qun Zhang},
  year={2025},
  journal={arXiv preprint arXiv:2510.21879}
}

Contributor List

This project was supported by LAMDA Lab and Assistant Professor Shao-Qun Zhang. Shu-Hao Zhang is the core developer and maintainer of EdgeRazor-V1. Xiang-Sheng Deng and Le-Tong Huang jointly participated in the development of this project.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

1.3.4

May 26, 2026

1.3.3

May 21, 2026

1.3.2

May 21, 2026

1.3.1

May 12, 2026

1.3.0

May 11, 2026

This version

1.2.8

Apr 15, 2026

1.2.7

Apr 15, 2026

1.2.6

Apr 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edgerazor-1.2.8.tar.gz (74.4 kB view details)

Uploaded Apr 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

edgerazor-1.2.8-py3-none-any.whl (87.6 kB view details)

Uploaded Apr 15, 2026 Python 3

File details

Details for the file edgerazor-1.2.8.tar.gz.

File metadata

Download URL: edgerazor-1.2.8.tar.gz
Upload date: Apr 15, 2026
Size: 74.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for edgerazor-1.2.8.tar.gz
Algorithm	Hash digest
SHA256	`fa18af068121ea98200a47a62db0e7f141de9fef1904827f7a9fa64ea8d69bc0`
MD5	`4435e86a5baac90dc8d7ded880b8e54a`
BLAKE2b-256	`7442371b1ce40fd3f952020294fbdbcce14ee7500daa9798b1cb5e9de158d7ba`

See more details on using hashes here.

File details

Details for the file edgerazor-1.2.8-py3-none-any.whl.

File metadata

Download URL: edgerazor-1.2.8-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 87.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for edgerazor-1.2.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`169ab2de1c70b22458c172d6abf70b66d494c9314af1a6e8b9952bf168f545ac`
MD5	`e61cd1622a8407c51b30cfe2d45d5c58`
BLAKE2b-256	`fe33bab161cea9c0b22b8c22561bcd19a7adeef01cec4b0ef667abf3d442cc5d`

See more details on using hashes here.

edgerazor 1.2.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Lightweight Framework for Edge AI

✨ If you like our project, please give us a star ⭐️ for the latest update.

News

Contents

Getting Started

Installation

Usage

Low-bit LLMs Serving on Docker

Main Techniques

Applications

Model Zoo

LLMs

MLLMs

Todo List

Citation

Contributor List

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes