
JANG — Adaptive Mixed-Precision Quantization for Apple Silicon. The GGUF equivalent for MLX.

Project description

MLX Studio

Run JANG models with MLX Studio — the easiest way to run LLMs on Apple Silicon

Early Adoption: JANG is a new quantization format. LM Studio, Ollama, oMLX, and other MLX inference apps do not support JANG yet. Use MLX Studio (native JANG support) or the jang Python package for inference. Ask your favorite app's creators to add JANG support!


JANG

Jang Adaptive N-bit Grading

Mixed-Precision Quantization for Apple Silicon

The GGUF equivalent for MLX — models stay quantized in GPU memory at full Metal speed.

What is JANG?

JANG redistributes quantization bits based on tensor sensitivity. Critical layers (attention) get more bits, bulk layers (MLP) compensate — same total size, smarter allocation.

Like GGUF K-quants for MLX.
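
As a rough sketch of the idea (not JANG's actual allocator; the allocate_bits helper, tensor names, and bit values below are illustrative assumptions), a budget-neutral allocation spends a fixed average bit budget unevenly across tensors:

# Illustrative sketch only -- not JANG's real algorithm.
# Attention tensors get more bits, MLP tensors fewer, while the
# size-weighted average stays at (or below) the uniform target.
def allocate_bits(tensors, target_bits=4.0, hi_bits=6, lo_bits=3):
    """tensors: {name: param_count}. Returns {name: bits}."""
    attn = {n for n in tensors if "attn" in n}
    plan = {n: (hi_bits if n in attn else lo_bits) for n in tensors}

    total = sum(tensors.values())
    avg = sum(plan[n] * p for n, p in tensors.items()) / total
    # If still over budget, demote the largest non-attention tensors first.
    for n in sorted((n for n in tensors if n not in attn), key=tensors.get, reverse=True):
        if avg <= target_bits:
            break
        plan[n] = lo_bits - 1
        avg = sum(plan[k] * p for k, p in tensors.items()) / total
    return plan

# Attention lands at 6 bits, MLP at 3 bits: ~3.8-bit average, no larger than uniform 4-bit
print(allocate_bits({"attn.q_proj": 4e6, "attn.o_proj": 4e6,
                     "mlp.up_proj": 11e6, "mlp.down_proj": 11e6}))

JANG's actual profiles (listed under Profiles below) follow the same principle, but drive the split from measured per-tensor sensitivity rather than simple name matching.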

Results

2-bit: JANG_2S beats MLX 2-bit on every model tested, roughly doubling MMLU on most

Model | JANG_2S MMLU | MLX 2-bit MMLU | Size (JANG vs MLX)
Qwen3.5-122B MoE | 84% | 56% | 38 GB vs 36 GB
Qwen3.5-35B MoE | 62% | ~20% | 12 GB vs 10 GB
Qwen3.5-9B | 36% | 18% | 3.5 GB vs 2.6 GB
Qwen3.5-4B | 28% | 14% | 1.6 GB vs 1.3 GB

4-bit: JANG_4K — smaller than MLX, higher MMLU

Model | JANG_4K MMLU | MLX 4-bit MMLU | Size (JANG vs MLX)
Qwen3.5-35B MoE | 84% | 82% | 16.7 GB vs 18 GB

Install

pip install jang

For inference on Apple Silicon:

pip install "jang[mlx]"

Quick Start

Convert any model

# K-quant 4-bit (budget-neutral, same size as MLX, smarter)
jang convert Qwen/Qwen3.5-35B-A3B -p 4

# 2-bit for extreme compression
jang convert Qwen/Qwen3.5-122B-A10B -p 2

# Specific profile
jang convert model -p JANG_2S

Run inference

from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

# Load a pre-quantized JANG model from the Hugging Face Hub
model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_2S")
sampler = make_sampler(temp=0.7)

# Stream tokens with mlx_lm's generate_step
tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, 'item') else int(tok)  # token may be an mx scalar or a plain int
    if t == tokenizer.eos_token_id:  # stop before printing the end-of-sequence token
        break
    print(tokenizer.decode([t]), end="", flush=True)

Profiles

Profile | Type | Bits (avg) | Best for
JANG_4K | K-quant | 4.0 | Same size as MLX 4-bit, smarter
JANG_3K | K-quant | 3.0 | Same size as MLX 3-bit, smarter
JANG_2S | Profile | ~2.1 | Tightest 2-bit, near MLX 2-bit size
JANG_2M | Profile | ~2.1 | Balanced 2-bit
JANG_2L | Profile | ~2.3 | Quality 2-bit
JANG_1L | Profile | ~2.2 | Maximum quality 2-bit

Pre-quantized Models

Available on HuggingFace:

Model | Profile | MMLU | Size
Qwen3.5-122B-A10B | JANG_2S | 84% | 38 GB
Qwen3.5-35B-A3B | JANG_4K | 84% | 16.7 GB
Qwen3.5-35B-A3B | JANG_2S | 62% | 12 GB
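
These load directly with the loader shown in Quick Start. For example (the repo id below is assumed to follow the JANGQ-AI/<model>-<profile> naming used in Quick Start):

from jang_tools.loader import load_jang_model

# Repo id assumed from the JANGQ-AI/<model>-<profile> pattern shown above
model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-35B-A3B-JANG_4K")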

Supported Architectures

Dense Transformer, Mixture of Experts, Hybrid SSM, Linear Attention, MLA, Vision-Language, Mamba, FP8 source models.

Created by Jinho Jang — jangq.ai

