MLX Studio — the only app that natively supports JANG models
Early Adoption: LM Studio, Ollama, oMLX, and Inferencer do not support JANG yet. Use MLX Studio or `pip install "jang[mlx]"`. Ask your favorite app's creators to add JANG support!
Jang Adaptive N-bit Grading
Mixed-Precision Quantization for Apple Silicon
The GGUF equivalent for MLX — models stay quantized in GPU memory at full Metal speed.
Website • Models • PyPI • Format Spec
Results (200-question MMLU)
MoE at 4-bit: JANG_4K beats MLX
| Model | JANG_4K | MLX 4-bit | JANG Size | MLX Size |
|---|---|---|---|---|
| Qwen3.5-122B | 86% | 85% | 69 GB | 64 GB |
| Qwen3.5-35B | 77.5% | 75.5% | 16.7 GB | 18 GB |
MoE at 2-bit: JANG dominates
| Model | JANG_2S | MLX 2-bit | JANG Size | MLX Size |
|---|---|---|---|---|
| Qwen3.5-122B | 79% | 56.5% | 38 GB | 36 GB |
| Qwen3.5-35B | 65.5% | ~20% | 12 GB | 10 GB |
MiniMax: JANG is the ONLY working option
| Model | JANG_2L | MLX 4-bit | MLX 3-bit | MLX 2-bit |
|---|---|---|---|---|
| MiniMax-M2.5 | 74% | 26.5% | 24.5% | 25% |
MLX is broken on MiniMax at ALL bit levels (~25% is chance accuracy on four-choice MMLU). JANG scores 74%.
Install

```bash
pip install jang
```

For inference on Apple Silicon:

```bash
pip install "jang[mlx]"
```

For Vision-Language models:

```bash
pip install "jang[vlm]"
```
Quick Start
Convert any model
```bash
# K-quant 4-bit (same size as MLX, smarter allocation)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

# Specific profile
jang convert model -p JANG_2S
```
Run inference
```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)
tokens = tokenizer.encode("What is photosynthesis?")

for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, "item") else int(tok)
    if t == tokenizer.eos_token_id:  # stop before echoing the end-of-sequence token
        break
    print(tokenizer.decode([t]), end="", flush=True)
```
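For instruct-tuned checkpoints such as the Qwen3.5 models above, wrapping the prompt in the model's chat template usually improves answers. A minimal sketch, assuming the tokenizer returned by `load_jang_model` exposes the standard Hugging Face `apply_chat_template` API (an assumption, not confirmed by the JANG docs):

```python
# Sketch: chat-template prompting, assuming an HF-style tokenizer.
messages = [{"role": "user", "content": "What is photosynthesis?"}]
tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers
)
# Feed `tokens` to generate_step exactly as in the loop above.
```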
Upgrade v1 models to v2 (instant loading)
```bash
jang upgrade /path/to/model
```
CLI Commands
| Command | Description |
|---|---|
| `jang convert <model> -p <profile>` | Convert a HuggingFace model to JANG |
| `jang upgrade <model>` | Upgrade a v1 model to v2 (instant load) |
| `jang inspect <model>` | Show bit allocation and model info |
| `jang validate <model>` | Validate a JANG model directory |
| `jang estimate <params>` | Estimate sizes (e.g., `jang estimate 122B`) |
v2 Format — Instant Loading
JANG v2 stores weights in MLX-native format. Like GGUF — the file IS the runtime format. No conversion at load time.
| | v2 (current) | v1 (legacy) |
|---|---|---|
| Load time | Seconds (mmap) | 5-10 minutes (repack) |
| File size | Same | Same |

New conversions automatically use v2. Existing v1 models can be upgraded with `jang upgrade`.
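The difference is easy to check locally. A minimal sketch, assuming you have v1 and v2 copies of the same model on disk (both paths below are hypothetical):

```python
import time
from jang_tools.loader import load_jang_model

# Hypothetical local paths: a v1 model and its `jang upgrade`d v2 copy.
for path in ("./Qwen3.5-35B-JANG_2S-v1", "./Qwen3.5-35B-JANG_2S-v2"):
    t0 = time.perf_counter()
    model, tokenizer = load_jang_model(path)
    print(f"{path}: loaded in {time.perf_counter() - t0:.1f}s")
```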
Profiles
| Profile | Type | Bits | Best for |
|---|---|---|---|
| `JANG_4K` | K-quant | 4.0 | Same size as MLX 4-bit, smarter |
| `JANG_3K` | K-quant | 3.0 | Same size as MLX 3-bit, smarter |
| `JANG_2S` | Profile | ~2.1 | Tightest 2-bit, near MLX 2-bit size |
| `JANG_2L` | Profile | ~2.3 | Quality 2-bit |
| `JANG_1L` | Profile | ~2.2 | Maximum quality 2-bit |
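The Bits column supports a back-of-envelope size check: parameters × bits per weight ÷ 8 bytes. The sketch below uses that formula; real files run somewhat larger because quantization scales and the high-bit critical tensors add overhead:

```python
def rough_size_gb(params_billions: float, avg_bits: float) -> float:
    """Lower-bound weight size in GB: params * bits-per-weight / 8 bytes."""
    return params_billions * 1e9 * avg_bits / 8 / 1e9

# Qwen3.5-122B at JANG_2S (~2.1 bits) -> ~32 GB of raw weights;
# the published model is 38 GB once overhead is included.
print(f"{rough_size_gb(122, 2.1):.0f} GB")
```

For real numbers, use `jang estimate 122B` from the CLI table above.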
Pre-quantized Models
| Model | Profile | MMLU (200q) | HumanEval | Size |
|---|---|---|---|---|
| Qwen3.5-122B-A10B | JANG_4K | 86% | 95% | 69 GB |
| Qwen3.5-122B-A10B | JANG_2S | 79% | 90% | 38 GB |
| Qwen3.5-35B-A3B | JANG_4K | 77.5% | 90% | 16.7 GB |
| Qwen3.5-35B-A3B | JANG_2S | 65.5% | — | 12 GB |
| MiniMax-M2.5 | JANG_2L | 74% | — | 89 GB |
Supported Architectures
Dense Transformer, Mixture of Experts, Hybrid SSM, Linear Attention (GatedDeltaNet), MLA (DeepSeek), Vision-Language, Mamba, FP8 source models (MiniMax, DeepSeek).
How It Works
JANG redistributes bits based on tensor sensitivity — same total size, smarter allocation:
- CRITICAL (attention, MoE routers) → 6-8 bit → controls coherence
- IMPORTANT (embeddings, linear attention) → 4-6 bit → moderate sensitivity
- COMPRESS (MLP, MoE experts) → 2-4 bit → 98% of parameters
K-quant profiles (JANG_4K, JANG_3K) redistribute within the same bit budget — boost attention, compensate with least-important MLP. Same size as MLX, smarter allocation. Like GGUF K-quants.
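To make the tiering concrete, here is a minimal sketch of sensitivity-based bit assignment. The name patterns, function name, and exact bit choices are illustrative assumptions, not JANG's actual implementation:

```python
# Illustrative tier rules mirroring the list above (patterns are assumptions).
CRITICAL = ("attn", "router")         # coherence-critical: attention, MoE routers
IMPORTANT = ("embed", "linear_attn")  # moderately sensitive tensors

def assign_bits(tensor_name: str, floor_bits: int = 2) -> int:
    """Return a bit width for a tensor based on its sensitivity tier."""
    if any(key in tensor_name for key in CRITICAL):
        return 6  # CRITICAL tier: 6-8 bit
    if any(key in tensor_name for key in IMPORTANT):
        return 4  # IMPORTANT tier: 4-6 bit
    return floor_bits  # COMPRESS tier (~98% of parameters): 2-4 bit

print(assign_bits("layers.0.attn.q_proj"))     # -> 6
print(assign_bits("layers.0.mlp.experts.w1"))  # -> 2
```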
Requirements
- Python: 3.11+
- Conversion: any platform (numpy + safetensors)
- Inference: Apple Silicon Mac (M1/M2/M3/M4) with MLX
- Dependencies: `safetensors>=0.4`, `numpy>=1.24`, `tqdm>=4.60`, `huggingface_hub>=0.20`
- Optional: `mlx>=0.22`, `mlx-lm>=0.20` (for inference), `mlx-vlm>=0.1` (for VLM)
Links
- GitHub | HuggingFace | MLX Studio | PyPI | Format Spec
What is JANG?
JANG is an open-source mixed-precision quantization format for Apple Silicon. It is the GGUF equivalent for MLX.
Results (200-question MMLU)
4-bit: JANG_4K beats MLX 4-bit (MoE models)
| Model | JANG_4K | MLX 4-bit | Size |
|---|---|---|---|
| Qwen3.5-122B | 86% | 85% | 69 vs 64 GB |
| Qwen3.5-35B | 77.5% | 75.5% | 16.7 vs 18 GB |
2-bit: JANG dominates MLX
| Model | JANG_2S | MLX 2-bit | Size |
|---|---|---|---|
| Qwen3.5-122B | 79% | 56.5% | 38 vs 36 GB |
| Qwen3.5-35B | 65.5% | ~20% | 12 vs 10 GB |
MiniMax: only JANG works
| Model | JANG_2L | MLX 4-bit | MLX 3-bit | MLX 2-bit |
|---|---|---|---|---|
| MiniMax-M2.5 | 74% | 26.5% | 24.5% | 25% |
Install

```bash
pip install "jang[mlx]"
```

Compatibility
Currently, only **MLX Studio** natively supports the JANG format. LM Studio, Ollama, oMLX, Inferencer, and others do not support it yet. Ask your favorite app's developers to add JANG support!
GitHub · HuggingFace · MLX Studio · PyPI
Created by Jinho Jang — jangq.ai