
JANG — Adaptive Mixed-Precision Quantization for Apple Silicon. The GGUF equivalent for MLX.

Project description

MLX Studio — the only app that natively supports JANG models


Early Adoption: LM Studio, Ollama, oMLX, Inferencer do not support JANG yet. Use MLX Studio or pip install "jang[mlx]". Ask your favorite app's creators to add JANG support!


JANG

Jang Adaptive N-bit Grading

Mixed-Precision Quantization for Apple Silicon

The GGUF equivalent for MLX — models stay quantized in GPU memory at full Metal speed.


Website · Models · PyPI · Format Spec

Results (200-question MMLU)

MoE at 4-bit: JANG_4K beats MLX

Model          JANG_4K   MLX 4-bit   JANG Size   MLX Size
Qwen3.5-122B   86%       85%         69 GB       64 GB
Qwen3.5-35B    77.5%     75.5%       16.7 GB     18 GB

MoE at 2-bit: JANG dominates

Model          JANG_2S   MLX 2-bit   JANG Size   MLX Size
Qwen3.5-122B   79%       56.5%       38 GB       36 GB
Qwen3.5-35B    65.5%     ~20%        12 GB       10 GB

MiniMax: JANG is the ONLY working option

Model          JANG_2L   MLX 4-bit   MLX 3-bit   MLX 2-bit
MiniMax-M2.5   74%       26.5%       24.5%       25%

MLX is broken on MiniMax at ALL bit levels: ~25% is random-chance accuracy on four-option MMLU. JANG scores 74%.

Dense/Hybrid at 2-bit: JANG saves what MLX destroys

Model        JANG_2S   MLX 2-bit   JANG Size   MLX Size
Qwen3.5-4B   28.5%     12.5%       1.5 GB      1.2 GB
Qwen3.5-9B   25.5%     22.0%       3.4 GB      2.7 GB

At 3-bit and 4-bit, MLX uniform is better on dense models — JANG's value is at 2-bit (where uniform fails) and on MoE (where attention is < 5% of params).

Install

pip install jang

For inference on Apple Silicon:

pip install "jang[mlx]"

For Vision-Language models:

pip install "jang[vlm]"

Quick Start

Convert any model

# K-quant 4-bit (same size as MLX, smarter allocation)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

# Specific profile
jang convert model -p JANG_2S

Run inference

from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

# Load a pre-quantized JANG model from the Hugging Face Hub
model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)

# Stream tokens as they are generated
tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, 'item') else int(tok)  # mx.array or int, depending on mlx-lm version
    print(tokenizer.decode([t]), end="", flush=True)
    if t == tokenizer.eos_token_id:
        break

Upgrade v1 models to v2 (instant loading)

jang upgrade /path/to/model

CLI Commands

Command                             Description
jang convert <model> -p <profile>   Convert a Hugging Face model to JANG
jang upgrade <model>                Upgrade a v1 model to v2 (instant load)
jang inspect <model>                Show bit allocation and model info
jang validate <model>               Validate a JANG model directory
jang estimate <params>              Estimate sizes (e.g., jang estimate 122B)
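
The arithmetic behind jang estimate can be approximated by hand. A back-of-envelope sketch (illustrative only; the real command accounts for mixed allocation, per-group scales, and metadata, which is why published sizes run higher than the raw product):

# Rough size: parameter count x effective bits per weight / 8.
def estimate_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"{estimate_gb(122, 2.1):.0f} GB")  # ~32 GB raw; published JANG_2S is 38 GB
print(f"{estimate_gb(122, 4.0):.0f} GB")  # ~61 GB raw; published JANG_4K is 69 GB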

v2 Format — Instant Loading

JANG v2 stores weights in MLX-native format. Like GGUF — the file IS the runtime format. No conversion at load time.

            v2 (current)     v1 (legacy)
Load time   Seconds (mmap)   5-10 minutes (repack)
File size   Same             Same

New conversions automatically use v2. Existing v1 models can be upgraded with jang upgrade.
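
The "no repack" claim is easy to picture, since v2 shards are MLX-loadable safetensors. A minimal sketch (the shard filename is hypothetical; load_jang_model handles sharding, config, and the tokenizer for you):

import mlx.core as mx

# mx.load reads a safetensors file directly into lazily evaluated MLX
# arrays, so a v2 shard becomes usable in seconds -- no repacking step.
weights = mx.load("model-00001-of-00004.safetensors")  # hypothetical shard name
name, array = next(iter(weights.items()))
print(name, array.dtype, array.shape)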

Profiles

Profile   Type      Bits   Best for
JANG_4K   K-quant   4.0    Same size as MLX 4-bit, smarter
JANG_3K   K-quant   3.0    Same size as MLX 3-bit, smarter
JANG_2S   Profile   ~2.1   Tightest 2-bit, near MLX 2-bit size
JANG_2L   Profile   ~2.3   Quality 2-bit
JANG_1L   Profile   ~2.2   Maximum quality 2-bit

Pre-quantized Models

Model               Profile   MMLU (200q)   Size      Best for
Qwen3.5-122B-A10B   JANG_4K   86%           69 GB     192+ GB Mac
Qwen3.5-122B-A10B   JANG_2S   79%           38 GB     64+ GB Mac
Qwen3.5-35B-A3B     JANG_4K   77.5%         16.7 GB   36+ GB Mac
Qwen3.5-35B-A3B     JANG_2S   65.5%         12 GB     24+ GB Mac
MiniMax-M2.5        JANG_2L   74%           89 GB     192+ GB Mac
Qwen3.5-9B          JANG_2S   25.5%         3.4 GB    8 GB MacBook
Qwen3.5-4B          JANG_2S   28.5%         1.5 GB    8 GB MacBook

Supported Architectures

Dense Transformer, Mixture of Experts, Hybrid SSM, Linear Attention (GatedDeltaNet), MLA (DeepSeek), Vision-Language, Mamba, FP8 source models (MiniMax, DeepSeek).

How It Works

JANG redistributes bits based on tensor sensitivity — same total size, smarter allocation:

CRITICAL  (attention, MoE routers)   →  6-8 bit  →  Controls coherence
IMPORTANT (embeddings, linear attn)  →  4-6 bit  →  Moderate sensitivity
COMPRESS  (MLP, MoE experts)         →  2-4 bit  →  98% of parameters

K-quant profiles (JANG_4K, JANG_3K) redistribute within the same bit budget — boost attention, compensate with least-important MLP. Same size as MLX, smarter allocation. Like GGUF K-quants.
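
A minimal sketch of the idea, using hypothetical tensor-name patterns (this is not JANG's actual classifier, which covers many more architectures and balances the total budget exactly):

# Hypothetical name-based allocator -- illustrative, not JANG's real code.
CRITICAL = ("self_attn", "router")           # controls coherence -> 6-8 bit
IMPORTANT = ("embed_tokens", "linear_attn")  # moderate sensitivity -> 4-6 bit

def assign_bits(tensor_name: str, base_bits: int = 4) -> int:
    if any(key in tensor_name for key in CRITICAL):
        return min(base_bits + 2, 8)   # boost the critical few
    if any(key in tensor_name for key in IMPORTANT):
        return base_bits
    return max(base_bits - 2, 2)       # squeeze MLP/experts (98% of params)

print(assign_bits("layers.0.self_attn.q_proj"))        # 6
print(assign_bits("layers.0.mlp.experts.12.up_proj"))  # 2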

Requirements

  • Python: 3.11+
  • Conversion: any platform (numpy + safetensors)
  • Inference: Apple Silicon Mac (M1/M2/M3/M4) with MLX (see the quick check after this list)
  • Dependencies: safetensors>=0.4, numpy>=1.24, tqdm>=4.60, huggingface_hub>=0.20
  • Optional: mlx>=0.22, mlx-lm>=0.20 (for inference), mlx-vlm>=0.1 (for VLM)
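
A quick sanity check before attempting inference, assuming the mlx extra is installed:

import mlx.core as mx

# Conversion runs on any platform; inference needs Metal (Apple Silicon).
assert mx.metal.is_available(), "JANG inference requires an Apple Silicon Mac"
print(mx.default_device())  # Device(gpu, 0) on M-series hardware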

Links

GitHub · HuggingFace · MLX Studio · PyPI


Korean

What is JANG?

JANG is an open-source mixed-precision quantization format for Apple Silicon. It plays the same role for MLX that GGUF plays elsewhere.

Results (200-question MMLU)

4-bit: JANG_4K beats MLX 4-bit (MoE models)

Model          JANG_4K   MLX 4-bit   Size
Qwen3.5-122B   86%       85%         69 vs 64 GB
Qwen3.5-35B    77.5%     75.5%       16.7 vs 18 GB

2-bit: JANG dominates MLX

Model          JANG_2S   MLX 2-bit   Size
Qwen3.5-122B   79%       56.5%       38 vs 36 GB
Qwen3.5-35B    65.5%     ~20%        12 vs 10 GB

MiniMax: only JANG works

Model          JANG_2L   MLX 4-bit   MLX 3-bit   MLX 2-bit
MiniMax-M2.5   74%       26.5%       24.5%       25%

Install

pip install "jang[mlx]"

Compatibility

Currently only MLX Studio natively supports the JANG format. LM Studio, Ollama, oMLX, Inferencer, and others do not support it yet. Ask your favorite app's developers to add JANG support!

GitHub · HuggingFace · MLX Studio · PyPI

Created by Jinho Jang — jangq.ai

Support on Ko-fi



Download files

Download the file for your platform.

Source Distribution

jang-2.1.1.tar.gz (62.1 kB)


Built Distribution


jang-2.1.1-py3-none-any.whl (66.1 kB)


File details

Details for the file jang-2.1.1.tar.gz.

File metadata

  • Download URL: jang-2.1.1.tar.gz
  • Size: 62.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for jang-2.1.1.tar.gz

Algorithm     Hash digest
SHA256        17bad46e22fbc4cab0ed78ceaeb61ec36dc7cf93a4372d0a9f9214abc89dc8dc
MD5           6370ba4dfcd87551a99d3bb3652332a6
BLAKE2b-256   c417e482816870dc4d0073f95be6ce704e6406e474c862f92dfea77dd583ac91


File details

Details for the file jang-2.1.1-py3-none-any.whl.

File metadata

  • Download URL: jang-2.1.1-py3-none-any.whl
  • Size: 66.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for jang-2.1.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        895c72b84661ebab93f4bac4278fa32c00207290c525b5e9a28a6f9a2234ced2
MD5           9682c40b7792049670ff9bc6335db899
BLAKE2b-256   6d4a37cb0c1861adf7a6311e8b66f13a7f7c1f800835c4dd7c02b856edf79e69

