
JANG — Adaptive Mixed-Precision Quantization for Apple Silicon. The GGUF equivalent for MLX.

Project description

MLX Studio — the only app that natively supports JANG models


Compatibility Notice

JANG is a new quantization format. The following apps do NOT support it yet:

  • LM Studio
  • Ollama
  • oMLX
  • Inferencer

MLX Studio is currently the only app with native JANG support. You can also use the jang Python package directly (pip install "jang[mlx]").

Want JANG support in your favorite app? Ask the developers to add it! JANG is open-source (GitHub) and the format spec is public (FORMAT.md).


JANG

Jang Adaptive N-bit Grading

Mixed-Precision Quantization for Apple Silicon

The GGUF equivalent for MLX — models stay quantized in GPU memory at full Metal speed.

What is JANG?

JANG redistributes quantization bits based on tensor sensitivity. Critical layers (attention) get more bits; bulk layers (MLP) give bits back to compensate — same total size, smarter allocation.

Like GGUF K-quants for MLX.
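
To make the idea concrete, here is a toy budget-neutral allocation in Python. The tensor names, sensitivity scores, and linear nudge-and-rescale rule are illustrative assumptions, not JANG's actual allocator; real profiles also snap to hardware-supported bit widths.

# Toy budget-neutral bit allocation. Hypothetical sensitivity scores and
# a linear nudge-and-rescale rule; NOT JANG's real algorithm.
sensitivity = {"attn.q_proj": 0.9, "attn.k_proj": 0.8,    # attention: fragile
               "mlp.up_proj": 0.3, "mlp.down_proj": 0.2}  # MLP bulk: robust
params = {"attn.q_proj": 4e6, "attn.k_proj": 4e6,
          "mlp.up_proj": 16e6, "mlp.down_proj": 16e6}     # weights per tensor

budget = 4.0                      # target average bits per weight (MLX 4-bit)
total = sum(params.values())

# Nudge each tensor above/below the budget by its sensitivity, then rescale
# so the weighted average lands exactly on budget: total size is unchanged.
raw = {n: budget + (s - 0.5) * 2 for n, s in sensitivity.items()}
scale = budget * total / sum(raw[n] * params[n] for n in params)
bits = {n: r * scale for n, r in raw.items()}

avg = sum(bits[n] * params[n] for n in params) / total
print({n: round(b, 2) for n, b in bits.items()}, "avg =", round(avg, 2))
# Attention tensors land above 4 bits, MLP below; weighted average stays 4.0.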

Results

4-bit: JANG_4K beats MLX 4-bit on MoE

Model              JANG_4K (MMLU)   MLX 4-bit (MMLU)   Size (JANG vs MLX)
Qwen3.5-122B MoE   94%              90%                69 GB vs 64 GB
Qwen3.5-35B MoE    84%              82%                16.7 GB vs 18 GB

2-bit: JANG outscores MLX by 1.5–3× on every model

Model              JANG_2S (MMLU)   MLX 2-bit (MMLU)   Size (JANG vs MLX)
Qwen3.5-122B MoE   84%              56%                38 GB vs 36 GB
Qwen3.5-35B MoE    62%              ~20%               12 GB vs 10 GB
Qwen3.5-9B         36%              18%                3.5 GB vs 2.6 GB
Qwen3.5-4B         28%              14%                1.6 GB vs 1.3 GB

Install

pip install jang

For inference on Apple Silicon:

pip install "jang[mlx]"

Quick Start

Convert any model

# K-quant 4-bit (budget-neutral, same size as MLX, smarter)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

# Specific profile
jang convert model -p JANG_2S

Run inference

from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

# Load a pre-quantized JANG model from the HuggingFace repo id.
model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)

tokens = tokenizer.encode("What is photosynthesis?")
# generate_step yields one (token, logprobs) pair per step; stream and stop at EOS.
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, 'item') else int(tok)
    print(tokenizer.decode([t]), end="", flush=True)
    if t == tokenizer.eos_token_id:
        break

Profiles

Profile   Type      Bits   Best for
JANG_4K   K-quant   4.0    Same size as MLX 4-bit, smarter
JANG_3K   K-quant   3.0    Same size as MLX 3-bit, smarter
JANG_2S   Profile   ~2.1   Tightest 2-bit, near MLX 2-bit size
JANG_2M   Profile   ~2.1   Balanced 2-bit
JANG_2L   Profile   ~2.3   Quality 2-bit
JANG_1L   Profile   ~2.2   Maximum quality 2-bit
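
A back-of-envelope sizing rule helps pick a profile: a quantized model needs roughly params × bits-per-weight / 8 bytes, plus overhead for quantization scales and metadata. The sketch below is hedged; the 5% overhead factor and the helper names are guesses, not part of the jang API.

# Illustrative profile picker; not part of jang. Overhead factor is a guess.
PROFILES = {"JANG_4K": 4.0, "JANG_3K": 3.0, "JANG_2L": 2.3,
            "JANG_2M": 2.1, "JANG_2S": 2.1}   # bits per weight, from the table

def estimate_gb(params_billions, bits_per_weight, overhead=1.05):
    # params * bits / 8 bytes, plus ~5% for scales and metadata (a guess)
    return params_billions * bits_per_weight / 8 * overhead

def pick_profile(params_billions, budget_gb):
    # Highest bits per weight (best quality) that still fits the budget.
    for name, bpw in sorted(PROFILES.items(), key=lambda kv: -kv[1]):
        if estimate_gb(params_billions, bpw) <= budget_gb:
            return name
    return None

print(pick_profile(122, 48))   # 122B parameters, ~48 GB free -> a 2-bit profile

Actual sizes in the tables above run a few GB higher than this estimate, so leave headroom.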

Pre-quantized Models

Available on HuggingFace:

Model               Profile   MMLU   HumanEval   Size
Qwen3.5-122B-A10B   JANG_4K   94%    95%         69 GB
Qwen3.5-122B-A10B   JANG_2S   84%    90%         38 GB
Qwen3.5-35B-A3B     JANG_4K   84%    90%         16.7 GB
Qwen3.5-35B-A3B     JANG_2S   62%    -           12 GB

Supported Architectures

Dense Transformer, Mixture of Experts, Hybrid SSM, Linear Attention, MLA, Vision-Language, Mamba, FP8 source models.

Links


Korean

JANG

Mixed-Precision Quantization for Apple Silicon

What is JANG?

JANG is an open-source quantization format that plays the same role for MLX that GGUF plays elsewhere. It redistributes bits according to tensor sensitivity — critical layers (attention) get more bits, bulk layers (MLP) compensate. Same size, smarter allocation.

Results

4-bit: JANG_4K beats MLX 4-bit (MoE models)

Model              JANG_4K (MMLU)   MLX 4-bit (MMLU)   Size (JANG vs MLX)
Qwen3.5-122B MoE   94%              90%                69 GB vs 64 GB
Qwen3.5-35B MoE    84%              82%                16.7 GB vs 18 GB

2-bit: JANG outscores MLX by 1.5–3× on every model

Model              JANG_2S (MMLU)   MLX 2-bit (MMLU)   Size (JANG vs MLX)
Qwen3.5-122B MoE   84%              56%                38 GB vs 36 GB
Qwen3.5-35B MoE    62%              ~20%               12 GB vs 10 GB
Qwen3.5-9B         36%              18%                3.5 GB vs 2.6 GB
Qwen3.5-4B         28%              14%                1.6 GB vs 1.3 GB

Install

pip install jang                # quantization tools
pip install "jang[mlx]"         # + Apple Silicon inference

Convert a model

# K-quant 4-bit (budget-neutral, same size as MLX, smarter)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

Run inference

from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)

tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, 'item') else int(tok)
    print(tokenizer.decode([t]), end="", flush=True)
    if t == tokenizer.eos_token_id:
        break

Profiles

Profile   Type      Bits   Best for
JANG_4K   K-quant   4.0    Same size as MLX 4-bit, smarter
JANG_3K   K-quant   3.0    Same size as MLX 3-bit, smarter
JANG_2S   Profile   ~2.1   Tightest 2-bit
JANG_1L   Profile   ~2.2   Maximum quality 2-bit

Pre-quantized Models

Download from HuggingFace:

Model               Profile   MMLU   HumanEval   Size
Qwen3.5-122B-A10B   JANG_4K   94%    95%         69 GB
Qwen3.5-122B-A10B   JANG_2S   84%    90%         38 GB
Qwen3.5-35B-A3B     JANG_4K   84%    90%         16.7 GB
Qwen3.5-35B-A3B     JANG_2S   62%    -           12 GB

Compatibility

Currently only MLX Studio natively supports the JANG format. LM Studio, Ollama, oMLX, and Inferencer do not support it yet. Ask the developers of your favorite app to add JANG support!

Links


Created by Jinho Jang — jangq.ai



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jang-1.3.0.tar.gz (51.8 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

jang-1.3.0-py3-none-any.whl (55.8 kB)

Uploaded Python 3

File details

Details for the file jang-1.3.0.tar.gz.

File metadata

  • Download URL: jang-1.3.0.tar.gz
  • Upload date:
  • Size: 51.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for jang-1.3.0.tar.gz
Algorithm Hash digest
SHA256 b89a02ba4591b3787cf5dbabfd093a01d427f58bb1b374a60ec68079eae16eae
MD5 a3177f978e4381f4928c9f743ee1b819
BLAKE2b-256 c028aba8f2f1c202e41b544be5c21ce0e712069d76c92fa4e1e2e15f6fa545e8

See more details on using hashes here.
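
To check a download against the SHA256 digest above, here is a standard-library sketch (it assumes the sdist sits in the current directory):

import hashlib

EXPECTED = "b89a02ba4591b3787cf5dbabfd093a01d427f58bb1b374a60ec68079eae16eae"

h = hashlib.sha256()
with open("jang-1.3.0.tar.gz", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):   # hash in 1 MiB chunks
        h.update(chunk)

assert h.hexdigest() == EXPECTED, "SHA256 mismatch: do not install this file"
print("OK:", h.hexdigest())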

File details

Details for the file jang-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: jang-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 55.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for jang-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 291d7e9d3c6f07eafcfb96eeb345d080de05ed5eeb81e8e3497e1d61f958acd1
MD5 7206bb0bf93bed9f15a5388700ec6991
BLAKE2b-256 f180cb4b0c8f62991d6692cde14b1470ad8973213f621f9e9cc1448587a1a50a

See more details on using hashes here.
