
Kandiga

Run 35B AI models in 1.5GB of RAM. Any Mac.

Kandiga is an open-source MoE inference engine that uses Selective Expert Materialization to run models that would normally require 20GB+ of memory in under 2GB on any Apple Silicon Mac.

How it works

Large Mixture-of-Experts (MoE) models like Qwen3.5-35B-A3B have 256 experts per layer but activate only 8 per token. Kandiga exploits this sparsity:

  1. Shared layers (attention, norms, embeddings) load to GPU memory (~1.5GB)
  2. Expert MLP weights stay on disk in packed binary files (~17GB SSD)
  3. Per token: the router selects 8 experts, which are read from SSD via pread
  4. CPU computes expert MLP with NEON-vectorized 4-bit dequant + GCD parallelism
  5. GPU computes attention simultaneously via MLX (unified memory, zero copy)

This is the KTransformers architecture adapted for Apple Silicon's unified memory.
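A minimal sketch of step 3 above in Python, assuming a packed file in which each expert occupies one fixed-size record; the record layout, constants, and helper names here are illustrative, not Kandiga's actual on-disk format:

import os
import numpy as np

K = 8                                 # experts activated per token
NUM_EXPERTS = 256                     # experts per layer
EXPERT_BYTES = 3 * (512 * 2048 // 2)  # hypothetical: three 4-bit matrices per expert

def route(router_logits: np.ndarray, k: int = K) -> np.ndarray:
    # The router picks the top-k scoring experts for this token.
    return np.argpartition(router_logits, -k)[-k:]

def read_expert(fd: int, layer: int, expert: int) -> bytes:
    # pread fetches the expert's packed weights at a computed offset
    # without seeking; the OS page cache keeps hot experts in RAM.
    offset = (layer * NUM_EXPERTS + expert) * EXPERT_BYTES
    return os.pread(fd, EXPERT_BYTES, offset)

Because only the selected experts' records are touched each step, resident memory stays bounded by the shared layers; the page cache, not the process, decides which expert weights stay hot.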

Install

pip install kandiga

Requirements: macOS with Apple Silicon (M1/M2/M3/M4), Python 3.10+

Quick start

# One-time setup: download model + prepare expert files (~20 min)
kandiga setup

# Interactive chat
kandiga chat

# Fast mode (K=4 experts instead of 8, ~2x speed, slightly less quality)
kandiga chat --fast

# One-shot prompt
kandiga "What is the capital of France?"

# Start an OpenAI-compatible API server
kandiga serve

# Run benchmarks
kandiga bench

Benchmarks

Measured on M4 Mac Mini (16GB), Qwen3.5-35B-A3B-4bit:

Mode           Experts          Speed       RAM    Quality
Quality (K=8)  8/256 per layer  ~3.5 tok/s  1.5GB  Full
Fast (K=4)     4/256 per layer  ~6.5 tok/s  1.5GB  Near-equal

For comparison, loading the full model requires 20.4GB of RAM, and MLX alone achieves ~25 tok/s when the model fits in memory. Kandiga trades speed for accessibility: on an 8-16GB Mac you can now run a 35B model that previously needed a 24GB+ machine.
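A rough back-of-envelope on per-token SSD traffic, using the 4-bit weights and the matrix shapes from the Architecture diagram below (per layer only; the total scales with the layer count, which isn't stated here):

bytes_per_matrix = 512 * 2048 // 2            # 4-bit weights: 0.5 bytes each -> 512 KiB
bytes_per_expert = 3 * bytes_per_matrix       # gate + up + down -> ~1.5 MiB
bytes_per_layer_token = 8 * bytes_per_expert  # K=8 -> ~12 MiB of reads per layer per token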

Architecture

User prompt
    |
    v
[Tokenizer + Chat Template]
    |
    v
[MLX Forward Pass]
    |
    +---> GPU: Attention + Norms + Router + Shared Expert + Blending
    |
    +---> CPU: Routed Expert MLP (NEON 4-bit dequant + GCD parallel)
    |         |
    |         +-- pread expert weights from SSD (OS page cache)
    |         +-- gate_proj matvec (512x2048)
    |         +-- up_proj matvec (512x2048)
    |         +-- SwiGLU activation
    |         +-- down_proj matvec (2048x512)
    |
    v
[Token Output]

Both CPU and GPU operate on the same physical DRAM (Apple Silicon unified memory), so there is zero data transfer overhead between them.
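A minimal NumPy sketch of the CPU branch above; the real path runs in C with NEON intrinsics directly on the 4-bit weights and fans the selected experts out across GCD queues, so treat this as the math, not the implementation (shapes follow the diagram; dequantization is assumed already done):

import numpy as np

def expert_mlp(x: np.ndarray, gate: np.ndarray, up: np.ndarray, down: np.ndarray) -> np.ndarray:
    # One routed expert applied to a single token vector.
    # x: (2048,)   gate, up: (512, 2048)   down: (2048, 512)
    g = gate @ x                      # gate_proj matvec -> (512,)
    u = up @ x                        # up_proj matvec   -> (512,)
    h = (g / (1.0 + np.exp(-g))) * u  # SwiGLU: SiLU(gate) * up
    return down @ h                   # down_proj matvec -> (2048,)

The K expert outputs are then blended with the router weights and the shared expert on the GPU side, as in the diagram.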

API Server

Kandiga includes an OpenAI-compatible HTTP API:

kandiga serve --port 8340

Then call it from any OpenAI client:

import openai

client = openai.OpenAI(base_url="http://localhost:8340/v1", api_key="unused")
response = client.chat.completions.create(
    model="mlx-community/Qwen3.5-35B-A3B-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Project structure

kandiga/
  __init__.py          # Package version
  cli.py               # CLI entry point (argparse)
  engine.py            # Core inference engine (SEM)
  chat.py              # Interactive chat (Rich terminal UI)
  serve.py             # OpenAI-compatible HTTP API (FastAPI)
  bench.py             # Benchmarking suite
  setup.py             # Model download + expert splitting + packing
  _split_experts.py    # Split stacked weights into per-expert files
  _pack_experts.py     # Pack per-expert files into binary format
  _build.py            # Compile CPU expert dylib from source
  metal/
    kandiga_cpu_expert.h   # C API header
    kandiga_cpu_expert.m   # NEON + GCD implementation
    Makefile               # Build the dylib
  tools/
    __init__.py            # Future: web search, file access
scripts/
  install.sh           # Quick install script
tests/
  ...

Development

# Clone
git clone https://github.com/kantheon/kandiga.git
cd kandiga

# Install in development mode
pip install -e ".[serve]"

# Build the CPU expert library
cd kandiga/metal && make && cd ../..

# Run tests
pytest tests/ -v

License

MIT

Download files

Download the file for your platform.

Source Distribution

kandiga-0.3.0.tar.gz (37.6 kB)

Built Distribution


kandiga-0.3.0-py3-none-any.whl (37.1 kB)

File details

Details for the file kandiga-0.3.0.tar.gz.

File metadata

  • Download URL: kandiga-0.3.0.tar.gz
  • Upload date:
  • Size: 37.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for kandiga-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       5d69f50803b743ca062d354b33c3d1d5301a24d043e0a3c21038f82ec1c1981e
MD5          c5a5780d20bcc020aabda938b49cca6b
BLAKE2b-256  b8d077ba7d5adf990fa25e12f8f0080f137d765afc1c466e4bc17d5ab4c5d6d0


File details

Details for the file kandiga-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: kandiga-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 37.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for kandiga-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       8538b6167d92b5534553d5bdf04f7ef26d51c603c62ada6164c491298270e46b
MD5          81afa452f6f59c14fa99d3d1d0ee5192
BLAKE2b-256  f68f1bb20d0d38042b3bece50c5cefd24f3fde301e8f631b9712a46166c5c138

