JANG — Adaptive Mixed-Precision Quantization for Apple Silicon. The GGUF equivalent for MLX.
Project description
MLX Studio — the only app that natively supports JANG models
Compatibility Notice
JANG is a new quantization format. The following apps do NOT support it yet:
- LM Studio
- Ollama
- oMLX
- Inferencer
MLX Studio is currently the only app with native JANG support. You can also use the jang Python package directly (pip install "jang[mlx]").
Want JANG support in your favorite app? Ask the developers to add it! JANG is open-source (GitHub) and the format spec is public (FORMAT.md).
Jang Adaptive N-bit Grading
Mixed-Precision Quantization for Apple Silicon
The GGUF equivalent for MLX — models stay quantized in GPU memory at full Metal speed.
What is JANG?
JANG redistributes quantization bits based on tensor sensitivity. Critical layers (attention) get more bits, bulk layers (MLP) compensate — same total size, smarter allocation.
Like GGUF K-quants for MLX.
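To make the allocation idea concrete, here is a toy sketch of budget-neutral, sensitivity-weighted bit assignment. This is not JANG's actual allocator; the layer names, parameter counts, and sensitivity scores are invented for illustration:

```python
# Illustrative sketch of budget-neutral bit allocation — NOT JANG's real
# algorithm. Layer shapes and sensitivities are invented for the example.
layers = {
    # name: (parameter count, relative sensitivity)
    "attn.q_proj": (4e6, 3.0),
    "attn.k_proj": (4e6, 3.0),
    "mlp.up_proj": (16e6, 1.0),
    "mlp.down_proj": (16e6, 1.0),
}
budget_bits = 4.0  # target average bits per weight, same as MLX 4-bit

total_params = sum(p for p, _ in layers.values())
weighted = sum(p * s for p, s in layers.values())

# Scale each layer's bit width by its sensitivity so that the
# parameter-weighted average still meets the overall budget.
for name, (params, sens) in layers.items():
    bits = budget_bits * sens * total_params / weighted
    print(f"{name}: {bits:.1f} bits")
```

Here the attention projections land well above 4 bits and the MLP layers well below, while the weighted average stays exactly at the 4.0-bit budget.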
Results
4-bit: JANG_4K beats MLX 4-bit on MoE
| Model | JANG_4K | MLX 4-bit | Size (JANG vs MLX) |
|---|---|---|---|
| Qwen3.5-122B MoE | 94% MMLU | 90% | 69 GB vs 64 GB |
| Qwen3.5-35B MoE | 84% MMLU | 82% | 16.7 GB vs 18 GB |
2-bit: JANG roughly doubles MLX's MMLU scores on every model
| Model | JANG_2S | MLX 2-bit | Size (JANG vs MLX) |
|---|---|---|---|
| Qwen3.5-122B MoE | 84% MMLU | 56% | 38 GB vs 36 GB |
| Qwen3.5-35B MoE | 62% MMLU | ~20% | 12 GB vs 10 GB |
| Qwen3.5-9B | 36% MMLU | 18% | 3.5 GB vs 2.6 GB |
| Qwen3.5-4B | 28% MMLU | 14% | 1.6 GB vs 1.3 GB |
Install

```
pip install jang
```

For inference on Apple Silicon:

```
pip install "jang[mlx]"
```
Quick Start
Convert any model
```
# K-quant 4-bit (budget-neutral, same size as MLX, smarter)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

# Specific profile
jang convert model -p JANG_2S
```
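The converted weights can then be loaded for local inference. The output directory below is hypothetical; point `load_jang_model` at whatever path `jang convert` actually wrote, assuming the loader accepts local directories as well as HuggingFace repo ids, as HF-style loaders typically do:

```python
# Load a locally converted model instead of a HuggingFace repo id.
# The directory name is hypothetical — use the path `jang convert` produced.
from jang_tools.loader import load_jang_model

model, tokenizer = load_jang_model("./Qwen3.5-35B-A3B-JANG_4K")
```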
Run inference
```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)

tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, 'item') else int(tok)
    print(tokenizer.decode([t]), end="", flush=True)
    if t == tokenizer.eos_token_id:
        break
```
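For instruction-tuned checkpoints you will usually want the model's chat template rather than a raw prompt. A sketch, assuming the returned tokenizer forwards HuggingFace's standard `apply_chat_template` method (as mlx_lm's tokenizer wrapper does):

```python
# Build a chat-formatted prompt before generation. Assumes the tokenizer
# forwards HuggingFace's apply_chat_template; the resulting token ids can
# be passed to generate_step via mx.array(tokens) as above.
messages = [{"role": "user", "content": "What is photosynthesis?"}]
tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```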
Profiles
| Profile | Type | Bits | Best for |
|---|---|---|---|
| JANG_4K | K-quant | 4.0 | Same size as MLX 4-bit, smarter |
| JANG_3K | K-quant | 3.0 | Same size as MLX 3-bit, smarter |
| JANG_2S | Profile | ~2.1 | Tightest 2-bit, near MLX 2-bit size |
| JANG_2M | Profile | ~2.1 | Balanced 2-bit |
| JANG_2L | Profile | ~2.3 | Quality 2-bit |
| JANG_1L | Profile | ~2.2 | Maximum quality 2-bit |
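As a sanity check on the Bits column: packed weight size is roughly parameters × bits / 8, and the gap to the published artifact sizes is overhead (quantization scales, embeddings, metadata). For the 122B model at the ~2.1-bit JANG_2S budget:

```python
# Rough size estimate for Qwen3.5-122B at the JANG_2S budget (~2.1 bits).
params = 122e9
bits = 2.1
weight_gb = params * bits / 8 / 1e9
print(f"~{weight_gb:.0f} GB of packed weights")
# ~32 GB; the published 38 GB artifact adds scales, embeddings, metadata.
```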
Pre-quantized Models
Available on HuggingFace:
| Model | Profile | MMLU | HumanEval | Size |
|---|---|---|---|---|
| Qwen3.5-122B-A10B | JANG_4K | 94% | 95% | 69 GB |
| Qwen3.5-122B-A10B | JANG_2S | 84% | 90% | 38 GB |
| Qwen3.5-35B-A3B | JANG_4K | 84% | 90% | 16.7 GB |
| Qwen3.5-35B-A3B | JANG_2S | 62% | — | 12 GB |
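To pre-fetch a checkpoint before loading (handy on unreliable connections), you can pull the repo into the local HuggingFace cache first; `load_jang_model` should then resolve from the cache:

```python
# Download the full repo into the local HF cache ahead of time.
from huggingface_hub import snapshot_download

snapshot_download("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
```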
Supported Architectures
Dense Transformer, Mixture of Experts, Hybrid SSM, Linear Attention, MLA, Vision-Language, Mamba, FP8 source models.
Links
- GitHub · HuggingFace · MLX Studio · PyPI
Created by Jinho Jang — jangq.ai
Project details
Download files
- Source distribution: jang-1.4.0.tar.gz
- Built distribution: jang-1.4.0-py3-none-any.whl
File details
Details for the file jang-1.4.0.tar.gz.
File metadata
- Download URL: jang-1.4.0.tar.gz
- Upload date:
- Size: 55.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b80a6808762b91e7d5bb42bef2a86805c6ebc68b0900e45359bb22e586fb881 |
| MD5 | a1cba617e86993d93ed5516cca20712c |
| BLAKE2b-256 | 500752a8ecabd7e4e5a24a9bc78c9750cdeb502e133aaf1c09b788497db8037c |
File details
Details for the file jang-1.4.0-py3-none-any.whl.
File metadata
- Download URL: jang-1.4.0-py3-none-any.whl
- Upload date:
- Size: 59.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 783001f2848112e28242d4b05034f79781c0f1f124abd0fb4bedd4e1e582fd6c |
| MD5 | 9a5a20243dc7e8075119a27c48723324 |
| BLAKE2b-256 | 96b09ef5765ad68cce5d6e99dbe7de6cdbb1f90a5fb7ef364970b9cd6414a5fb |
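To verify a downloaded artifact against the digests above, hash it locally and compare:

```python
# Compute the SHA256 of a downloaded file and compare it to the
# digest published above.
import hashlib

def sha256sum(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("jang-1.4.0.tar.gz"))
```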