Local AI inference for Apple Silicon — Text, Image, Video & Audio generation on Mac


vMLX

vMLX — Swift (dev branch)

For a production Swift engine with a more streamlined, user-ready feature set, please use osaurus.ai.
This tree is where the more experimental engine paths live — DDFlash, JANG smelt, JANGTQ, Flash MoE, hybrid-SSM cache work, and other research-y directions that aren't ready for a shipping product yet.

⚠️ EXPERIMENTAL SANDBOX — NOT A REAL APP ⚠️

This dev branch is NOT a product. It is not meant to be used. It is not meant to be "tested" the way you'd test an app.

This tree is an experimental workbench for trying new low-level research directions that have no place in the stable user app yet. Specifically:

DDFlash / JANG-DFlash speculative decoding — block-diffusion drafter + DDTree beam + coordinator-aware verify. See Sources/vMLXLMCommon/DFlash/.

JANG smelt — partial-expert streaming / smelting. The UI flag is live; the Swift-side loader is not — Stream emits an honest per-request warning when the toggle is on.

JANGTQ / MXTQ quant formats — native Swift repack, TurboQuant KV cache, MXTQ packed dequant (PRNG parity still WIP). See Sources/vMLXLMCommon/JangMXTQDequant.swift, JangSpecBundleLoader.swift, TurboQuantKVCache in Sources/vMLXLMCommon/TurboQuant/.

Flash MoE expert streaming from SSD, cache coordinator work, hybrid-SSM + sliding-window disk round-trip, Llama 4 iRope, etc.
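Every speculative decoder ends with a verify step that keeps drafted tokens only while the target model agrees with them. The sketch below shows the simplest form, greedy linear verification. It assumes integer token IDs and that one batched target forward pass has already produced the argmax at each drafted position plus one extra position. It does not model the DDTree beam, block-diffusion drafting, or the coordinator-aware logic listed above, and `verifyGreedy` is a hypothetical name, not a vMLX API.

```swift
/// Greedy speculative verification (hypothetical helper, not vMLX's API).
/// - draft: tokens proposed by the drafter for positions 0..<k
/// - targetArgmax: the target model's argmax at positions 0...k
///   (k + 1 entries), computed in one forward pass over prompt + draft
/// Returns the tokens to commit this step.
func verifyGreedy(draft: [Int], targetArgmax: [Int]) -> [Int] {
    precondition(targetArgmax.count == draft.count + 1,
                 "target must score every drafted position plus one bonus position")
    var committed: [Int] = []
    for (i, token) in draft.enumerated() {
        if token == targetArgmax[i] {
            committed.append(token)            // draft agrees with target: accept
        } else {
            committed.append(targetArgmax[i])  // first mismatch: take target's token, stop
            return committed
        }
    }
    // Every drafted token was accepted, so the target's prediction for the
    // position after the block is committed as a free extra token.
    committed.append(targetArgmax[draft.count])
    return committed
}

// The drafter proposed 4 tokens; the target agrees on the first two.
let step = verifyGreedy(draft: [11, 12, 99, 14], targetArgmax: [11, 12, 13, 14, 15])
print(step) // [11, 12, 13]
```

Because the target scores all drafted positions in one pass, each step commits at least one token and at most `draft.count + 1`.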

If you want to use vMLX, run the stable Electron + bundled-Python app at jjang-ai/mlxstudio (current release v1.3.54). That is the shipping product.

If you want to use this Swift dev branch: don't. It's not for that. It exists so a handful of people can try experimental engine paths end-to-end and break things in isolation without putting user installs at risk.

Do not treat dev DMGs as app beta builds. vdev-* tags on mlxstudio releases are engineering snapshots — prereleased so the five people touching this tree can grab a notarized binary quickly. They are never promoted to user channels. Do not file "this is broken" bug reports against them as if they were products; reproductions welcome, expectations of stability are not.

Things break. APIs shift. Whole model families don't work yet. Cache tiers have known correctness gaps. Multi-turn reuse is incomplete on some hybrid paths. Do not deploy this. Do not integrate it into anything that matters. Do not rely on it for any production workload.

What this is

The entire MLX inference stack (Metal kernels, quant paths, attention paths, scheduler, tokenizer, chat loop, HTTP routes, desktop UI) lives in a single SwiftPM package. There are no external `path:` dependencies and no upstream drift risk: we control every layer, and when we need a change we commit it here directly instead of coordinating with the mlx-swift, mlx-swift-examples, or vmlx-swift-lm upstreams.

As of 2026-04-13 the vendoring is complete: Cmlx plus the seven MLX Swift targets (8 vendored MLX targets in all) live next to the vMLX* targets under one Package.swift (23 targets, only 5 external deps).

Binaries produced:

vmlxctl — CLI (serve, chat, pull, ls, dflash-smoke)
vMLX — SwiftUI macOS app (5 modes: Chat, Server, Image, Terminal, API)

Status & limitations

Honest snapshot of the dev branch. Anything not listed here should be assumed broken or absent.

✅ Works (live-verified today)

Area	State
Model load + text generation	✅ Llama 3.2 1B, Qwen3 0.6B, Gemma 4 e2b (4-bit MLX)
Full `swift build`	✅ clean, all 23 targets
Dev DMG build	✅ notarized + stapled, `vdev-20260415-0612` shipped
`vmlxctl chat / serve / pull / ls`	✅ basic paths
HTTP server: OpenAI `/v1/chat/completions`, `/v1/models`	✅ streaming + non-streaming
HTTP server: Ollama `/api/{chat,generate,tags,...}`	✅
HTTP server: Anthropic `/v1/messages`	✅
Admin: `/admin/{soft,deep}-sleep`, `/admin/wake`, `/admin/cache/*`, `/admin/dflash/*`	✅
Model library auto-scan (HuggingFace cache)	✅
Model-family auto-detection (4-tier: JANG gold → silver → bronze → fallback)	✅
Cache stack: paged (L1) + memory (L1.5) + disk (L2) + SSM companion	✅ live, with known gaps below
Prefix cache multi-turn reuse	✅ standard path; ⚠️ partial on DFlash
Sliding-window attention disk round-trip (Gemma 3/4, Mistral 4 maxKVSize)	✅ via `.rotating` LayerKind
BaichuanM1 `CacheList` disk serialization	✅ via `.cachelist` LayerKind (F-G2)
JANG-DFlash speculative decoding	✅ text-only, MiniMax / Mistral 4 / DeepSeek V3 targets, coordinator-aware
Flash MoE expert streaming (Phase 2b on some families)	✅ opt-in per model
TurboQuant KV cache integration	✅ via `make_cache` patch
Reasoning/tool parsers	✅ 13 reasoning + 15 tool parsers registered
§15 reasoning-off → content reroute	✅
Logs: `LogStore` ring + SwiftUI `LogsPanel` + per-request `RequestLogger`	✅
Settings: 4-tier merge (global → session → chat → request)	✅
Download manager: HF auth, byte resume, progress bar	✅
Embeddings endpoint	✅
Whisper ASR `/v1/audio/transcriptions`	✅
TTS `/v1/audio/speech`	⚠️ placeholder tone backend only (Kokoro scaffolded)
MCP stdio JSON-RPC + tool dispatch	✅
Terminal mode with `bash` tool auto-inject	✅
`/metrics` Prometheus endpoint	✅
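The 4-tier settings merge (global → session → chat → request) reduces to "the most specific tier that sets a field wins." A minimal sketch, assuming each tier is a struct of optionals where `nil` means "inherit from the tier below"; `SettingsLayer` and its three fields are hypothetical and do not mirror vMLX's real settings types.

```swift
/// One settings tier; `nil` means "not set at this tier, inherit from below".
/// Hypothetical fields for illustration only.
struct SettingsLayer {
    var temperature: Double? = nil
    var maxTokens: Int? = nil
    var systemPrompt: String? = nil

    /// Returns a layer where every field set in `override` replaces ours.
    func merged(with override: SettingsLayer) -> SettingsLayer {
        SettingsLayer(
            temperature: override.temperature ?? temperature,
            maxTokens: override.maxTokens ?? maxTokens,
            systemPrompt: override.systemPrompt ?? systemPrompt
        )
    }
}

/// Folds tiers from least to most specific: global, session, chat, request.
func resolve(_ tiers: [SettingsLayer]) -> SettingsLayer {
    tiers.reduce(SettingsLayer()) { acc, tier in acc.merged(with: tier) }
}

let global = SettingsLayer(temperature: 0.7, maxTokens: 2048, systemPrompt: "You are helpful.")
let session = SettingsLayer(maxTokens: 4096)
let chat = SettingsLayer(systemPrompt: "Answer in French.")
let request = SettingsLayer(temperature: 0.0)

// temperature comes from request, maxTokens from session, systemPrompt from chat.
let effective = resolve([global, session, chat, request])
```

Optionals keep "not set at this tier" distinct from "explicitly set", so a request can override a chat-level value even when it sets it back to the global default.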

⚠️ Partial / flaky

MXTQ PRNG mismatch — Swift JangMXTQDequant uses POSIX srand48; the Python writer uses NumPy PCG64. The sign sequences diverge, so some MXTQ bundles decode to garbage weights. Known blocker for several JANGTQ checkpoints. Fix: port PCG64 to Swift, or re-seed the Python writer with a drand48-compatible generator (re-seeding PCG64 alone cannot reproduce the drand48 sequence).
Engine.LoadOptions default drift — stricter cache defaults (cacheMemoryPercent=0.10, maxCacheBlocks=500) than GlobalSettings (0.30 / 1000). Code paths that construct LoadOptions() directly get smaller caches than configured.
Smelt mode — UI toggle + settings field exist, but the Swift engine has no partial-expert-loader equivalent to Python's smelt_loader.py. Labelled honestly as "Python engine only"; Stream emits a per-request warning when smelt=true.
DFlash streaming reasoning split — DFlash emits N-token blocks; v1 routes all decoded text as .content. A client-side reasoning extractor still works after the fact, but live `<think>...</think>` delta routing is not implemented on the DFlash path.
DFlash tools/images — falls back to the standard path (logged) when the request includes tools or images; v1 is text-only.
xcodegen + SwiftPM app bundling — the xcodebuild archive path produces a bare executable instead of a .app. The ship-DMG script assembles the bundle manually from DerivedData products. Root cause not yet diagnosed.
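Because DFlash emits multi-token blocks, a `<think>` or `</think>` tag can straddle two blocks, which is what makes live reasoning routing harder there than on a token-by-token path. One way to handle it, sketched below as an assumption rather than vMLX's actual parser, is a small state machine that holds back any suffix that could be the start of the tag it is waiting for. `ThinkSplitter` is a hypothetical name.

```swift
import Foundation

/// Routes streamed text into reasoning vs. content deltas, tolerating
/// `<think>` / `</think>` tags split across chunk boundaries.
/// Hypothetical helper; not vMLX's reasoning parser.
struct ThinkSplitter {
    enum Delta: Equatable { case reasoning(String), content(String) }

    private var inThink = false
    private var pending = ""   // text held back because it may start a tag

    mutating func feed(_ chunk: String) -> [Delta] {
        pending += chunk
        var out: [Delta] = []
        while true {
            let tag = inThink ? "</think>" : "<think>"
            if let range = pending.range(of: tag) {
                // Text before the tag belongs to the current mode.
                emit(String(pending[..<range.lowerBound]), into: &out)
                pending = String(pending[range.upperBound...])
                inThink.toggle()
                continue
            }
            // No complete tag: flush everything except a suffix that could be
            // the beginning of the tag we are waiting for.
            let keep = longestTagPrefixSuffix(of: pending, tag: tag)
            emit(String(pending.dropLast(keep)), into: &out)
            pending = String(pending.suffix(keep))
            return out
        }
    }

    /// Flushes any held-back text at end of stream.
    mutating func finish() -> [Delta] {
        var out: [Delta] = []
        emit(pending, into: &out)
        pending = ""
        return out
    }

    private func emit(_ text: String, into out: inout [Delta]) {
        guard !text.isEmpty else { return }
        out.append(inThink ? .reasoning(text) : .content(text))
    }

    /// Length of the longest suffix of `s` that is a proper prefix of `tag`.
    private func longestTagPrefixSuffix(of s: String, tag: String) -> Int {
        var n = min(s.count, tag.count - 1)
        while n > 0 {
            if tag.hasPrefix(String(s.suffix(n))) { return n }
            n -= 1
        }
        return 0
    }
}

// Blocks as a block-wise decoder might emit them, with tags split mid-block.
var splitter = ThinkSplitter()
var deltas: [ThinkSplitter.Delta] = []
for block in ["Hi <thi", "nk>plan", " A</th", "ink>Answer"] {
    deltas += splitter.feed(block)
}
deltas += splitter.finish()
// [.content("Hi "), .reasoning("plan"), .reasoning(" A"), .content("Answer")]
```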

❌ Broken or not yet implemented

Image generation .generate() bodies — Flux / Qwen-Image / Z-Image / SeedVR2 / FIBO DiT forward passes are still scaffolded. Biggest user-visible gap versus the Electron app.
Several model families are stamped in the silver table but have no Swift class — CogVLM, Molmo, InternVL, Florence, GOT-OCR (F-G22), DeepSeek VL (F-G10). Silver rows resolve, but Engine.load falls through.
FlashMoE family conformance gaps — DeepSeek V3 (F-G6), GLM-5 glm_moe_dsa (F-G7), Jamba (F-G8), Granite3 MoE (F-G9). Protocol is ready, per-model FlashMoEReplaceable conformance not wired.
Tool-parser silver rows missing — XLaM, Functionary (F-G11); OlmoE, BailingMoe (F-G12). Fall through to native.
Mistral 4 VLM dedicated text config (F-G13), Phi-4 reasoning parser verification (F-G14), Llama 4 tool format verification (F-G15), RWKVCache dedicated class (F-G16), MiniMax Lightning Attention cache verification (F-G17).
CacheList multi-sub-cache walker (F-G18) — partially addressed by F-G2 for the (Mamba, KV) case; hypothetical (Mamba, Rotating, Full) wrappers would still lose the second KV.
Step-3.5 reasoning + tool-call interleave (F-G19) — no test coverage.
GPT-OSS tool parser verification (F-G20) — silver row stamps glm47 parser; unverified against real checkpoint.
Gemma 4 image-token defensive assertion (F-G21).
PaliGemma template probe edge case (F-G23).
Kokoro neural TTS backend — scaffolded in vMLXTTS/Kokoro/KokoroBackend.swift; /v1/audio/speech returns deterministic placeholder-tone WAV until the 9-step port lands.
Whisper sliding-window for audio > 30s — single-pass only; no temperature fallback, no beam search, no word-level timestamps.
Universal binary — arm64-only. HasDType gates Float16 on #if !arch(x86_64) upstream, so x86_64 slices won't type-check.
App target bundling via xcodegen — see above; manual assembly works but needs proper fix.
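The F-G18 gap is easiest to see with a toy cache model. The sketch below uses a hypothetical `LayerCache` enum, not vMLX's cache classes: a walker that keeps the SSM state plus the first KV entry is enough for (Mamba, KV), but silently drops the second KV of a (Mamba, Rotating, Full) wrapper, while a recursive walker visits every leaf.

```swift
/// Toy cache model for illustration; vMLX's real cache types differ.
enum LayerCache {
    case kv(String)           // full-attention KV cache
    case rotating(String)     // sliding-window KV cache
    case mamba(String)        // SSM state
    case list([LayerCache])   // CacheList wrapper
}

/// Recursive walker: every leaf is serialized, so nothing is lost on save.
func flatten(_ cache: LayerCache) -> [String] {
    switch cache {
    case .kv(let id), .rotating(let id), .mamba(let id):
        return [id]
    case .list(let children):
        return children.flatMap(flatten)
    }
}

/// Hypothetical model of a walker built for the (Mamba, KV) case only:
/// keeps SSM states and the first KV-like child, ignores the rest.
func walkFirstKVOnly(_ cache: LayerCache) -> [String] {
    guard case .list(let children) = cache else { return flatten(cache) }
    var out: [String] = []
    var tookKV = false
    for child in children {
        switch child {
        case .mamba(let id):
            out.append(id)
        case .kv(let id), .rotating(let id):
            if !tookKV { out.append(id); tookKV = true }  // later KVs are dropped
        case .list:
            break                                        // nested lists are ignored
        }
    }
    return out
}

let layer = LayerCache.list([.mamba("ssm0"), .rotating("swa0"), .kv("full0")])
// walkFirstKVOnly(layer) == ["ssm0", "swa0"]        (full0 lost)
// flatten(layer)         == ["ssm0", "swa0", "full0"]
```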

Stack layout

Package.swift               23 local targets + 5 external deps
project.yml                 XcodeGen spec for the .app wrapper
scripts/
  stage-metallib.sh         stages mlx.metallib next to SwiftPM binaries

Sources/
  Cmlx/                     vendored MLX + mlx-c C++ (metal kernels inline)
                            + default.metallib (prebuilt, for SwiftPM)
  MLX/ MLXNN/ MLXFast/      MLX Swift runtime (vendored from mlx-swift)
  MLXFFT/ MLXLinalg/
  MLXOptimizers/ MLXRandom/

  vMLXLMCommon/             caches, paged cache, SSM companion, TQ,
                            Flash MoE, DFlash, evaluate loop, JANG loader,
                            MXTQ dequant
  vMLXLLM/                  ~50 LLM model classes
  vMLXVLM/                  ~15 vision-language model classes
  vMLXEmbedders/            embedding model classes
  vMLXFluxKit/ vMLXFluxModels/ vMLXFluxVideo/ vMLXFlux/
                            image / video generation stack (WIP)
  vMLXWhisper/ vMLXTTS/     audio IO
  vMLXEngine/               Engine actor: load, stream, cache, MCP,
                            Flash MoE, DFlash, ModelCapabilities,
                            CapabilityDetector, settings, metrics
  vMLXServer/               Hummingbird routes:
                              OpenAI / Ollama / Anthropic / Admin / MCP
                              / Metrics / Gateway
  vMLXApp/                  SwiftUI app (5 modes)
  vMLXTheme/                Linear-inspired tokens
  vMLXCLI/                  vmlxctl

vMLX/
  Assets.xcassets/          app icon set
  Info.plist                template (xcodegen fills $(...))
  vMLX.entitlements         App Sandbox disabled (Terminal mode)

Build

# Requires: macOS 14+, Xcode 15.4+ (Swift 5.10), xcodegen (brew install xcodegen)

git clone -b dev https://github.com/jjang-ai/vmlx.git
cd vmlx

# --- CLI (SwiftPM) ---
swift build -c release
# CRITICAL: colocate mlx.metallib next to the binary. Without it,
# every model load fails with "Failed to load the default metallib."
# (The SwiftPM flat bundle layout doesn't match what
# `load_swiftpm_library` in device.cpp expects, so we rely on
# `load_colocated_library`, which MLX tries first, instead.)
./scripts/stage-metallib.sh release

./.build/release/vmlxctl serve --model /path/to/model
./.build/release/vmlxctl chat  --model /path/to/model
./.build/release/vmlxctl pull  mlx-community/Qwen3-32B-4bit
./.build/release/vmlxctl ls

# --- SwiftUI app (Xcode archive path) ---
# xcodegen produces vMLX.xcodeproj; current archive path builds a bare
# executable, not a .app bundle (known bundling quirk). The ship-DMG
# script manually assembles the .app from DerivedData products — see
# PROGRESS.md for the exact steps until we fix the xcodegen config.
xcodegen
open vMLX.xcodeproj

arm64 only.
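Since a missing `mlx.metallib` only surfaces at model-load time, a preflight check can fail earlier with a clearer message. A sketch, assuming the colocated lookup described in the build comments (the file must sit in the same directory as the executable); `metallibIsColocated` is a hypothetical helper, not part of vmlxctl.

```swift
import Foundation

/// Hypothetical preflight check (not a vmlxctl feature): true when
/// `mlx.metallib` sits in the same directory as `executable`.
func metallibIsColocated(executable: URL, fileManager: FileManager = .default) -> Bool {
    let metallib = executable.deletingLastPathComponent()
        .appendingPathComponent("mlx.metallib")
    return fileManager.fileExists(atPath: metallib.path)
}

// In a running binary, Bundle.main.executableURL gives the path to check.
if let exe = Bundle.main.executableURL, !metallibIsColocated(executable: exe) {
    print("mlx.metallib not found next to \(exe.path); run ./scripts/stage-metallib.sh release")
}
```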

HTTP surfaces

Family	Endpoints
OpenAI	`/v1/{chat/completions, completions, responses, embeddings, models, rerank, images/generations, images/edits, audio/transcriptions, audio/speech}`
Ollama	`/api/{chat, generate, embeddings, embed, tags, show, ps, version, pull}`
Anthropic	`/v1/messages` (streaming, vision, `document`, `server_tool_use`)
Admin	`/health`, `/admin/{soft-sleep, deep-sleep, wake, cache/stats, dflash, dflash/load, dflash/unload, models/:id}`
MCP	`/v1/mcp/{tools, servers, execute}`, `/mcp/:server/:method`
Metrics	`/metrics` (Prometheus text)
Gateway	multiplexes single base-URL across N model sessions
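For clients, the OpenAI-compatible route takes the standard chat-completions body. A minimal Swift sketch that builds such a request with Foundation: the field names follow OpenAI's public schema, while the host, port, and model name are placeholders, and which optional fields vMLX honours beyond these is not stated here.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

struct ChatMessage: Codable {
    let role: String      // "system", "user", or "assistant"
    let content: String
}

struct ChatCompletionRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    var stream: Bool = false
    var maxTokens: Int? = nil   // omitted from the JSON when nil

    enum CodingKeys: String, CodingKey {
        case model, messages, stream
        case maxTokens = "max_tokens"
    }
}

func makeChatRequest(baseURL: URL, body: ChatCompletionRequest) throws -> URLRequest {
    var req = URLRequest(url: baseURL.appendingPathComponent("v1/chat/completions"))
    req.httpMethod = "POST"
    req.setValue("application/json", forHTTPHeaderField: "Content-Type")
    req.httpBody = try JSONEncoder().encode(body)
    return req
}

// Placeholder host/port and model name; use what `vmlxctl serve` reports.
let request = try makeChatRequest(
    baseURL: URL(string: "http://127.0.0.1:8000")!,
    body: ChatCompletionRequest(model: "local-model",
                                messages: [ChatMessage(role: "user", content: "Hello")]))
```

The resulting `URLRequest` can be sent with `URLSession`; setting `stream: true` asks for a streamed reply, whose parsing is outside this sketch.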

Responses API (/v1/responses) covers both string and structured input shapes (message / function_call / function_call_output / input_text / input_image), tools, tool_choice, reasoning.effort bucketing, and streams the full Responses event family (response.created, output_item.added, output_text.delta, reasoning_summary_text.delta, function_call_arguments.delta, output_item.done, response.completed, [DONE]).
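A client consuming the streamed Responses events mostly needs to concatenate the text deltas and stop at completion. The sketch assumes OpenAI-style SSE framing, where each `data:` line carries JSON with a `type` field (text-delta events also carry a `delta` string) and the stream ends with `data: [DONE]`. It matches on the `output_text.delta` suffix so either the short or the `response.`-prefixed event name works; `ResponsesTextAccumulator` is hypothetical.

```swift
import Foundation

/// Accumulates visible text from a streamed /v1/responses reply.
/// Hypothetical client helper; assumes OpenAI-style SSE `data:` lines.
struct ResponsesTextAccumulator {
    private(set) var text = ""
    private(set) var finished = false

    /// Only the fields this sketch needs; unknown keys are ignored.
    private struct Event: Decodable {
        let type: String
        let delta: String?
    }

    mutating func consume(line: String) {
        // `event:` lines and blank separators are skipped because the JSON
        // payload repeats the event type.
        guard line.hasPrefix("data:") else { return }
        let payload = line.dropFirst(5).trimmingCharacters(in: .whitespaces)
        if payload == "[DONE]" { finished = true; return }
        guard let data = payload.data(using: .utf8),
              let event = try? JSONDecoder().decode(Event.self, from: data) else { return }
        if event.type.hasSuffix("output_text.delta") {
            text += event.delta ?? ""
        } else if event.type == "response.completed" {
            finished = true
        }
        // reasoning_summary_text.delta, function_call_arguments.delta, etc.
        // would be routed here in a fuller client.
    }
}

var acc = ResponsesTextAccumulator()
let sample = [
    "event: response.output_text.delta",
    #"data: {"type":"response.output_text.delta","delta":"Hel"}"#,
    "",
    #"data: {"type":"response.output_text.delta","delta":"lo"}"#,
    #"data: {"type":"response.completed"}"#,
    "data: [DONE]",
]
for line in sample { acc.consume(line: line) }
// acc.text == "Hello", acc.finished == true
```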

Roadmap / TODO

Tracked day-to-day in PROGRESS.md. High-level headline items in priority order:

Blockers for a first user-visible release:

MXTQ PRNG parity — currently produces garbage for some JANGTQ bundles.
Image generation .generate() bodies — Flux/Qwen-Image/Z-Image forward passes still scaffolded.
xcodegen .app bundling — manual assembly works; fix so xcodebuild archive produces a proper .app.
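The MXTQ blocker comes down to two different generators. POSIX defines `drand48` as a 48-bit linear congruential generator (multiplier `0x5DEECE66D`, increment `0xB`), and `srand48(seed)` sets the high 32 bits of the state from the seed and the low 16 bits to `0x330E`. NumPy's PCG64 is an unrelated generator with 128-bit state, so no choice of seed makes the two agree. A pure-Swift `drand48` like the sketch below gives a portable reference for parity tests. The `randomSigns` mapping (draws below 0.5 become -1) is a hypothetical illustration; how JangMXTQDequant actually turns random draws into signs is not described here.

```swift
/// Pure-Swift re-implementation of POSIX srand48/drand48, for parity tests.
struct Drand48 {
    private var state: UInt64   // only the low 48 bits are used

    /// Mirrors srand48(seed): the high 32 bits of the state come from the
    /// low 32 bits of `seed`; the low 16 bits are fixed to 0x330E.
    init(seed: Int) {
        state = ((UInt64(truncatingIfNeeded: seed) & 0xFFFF_FFFF) << 16) | 0x330E
    }

    /// Mirrors drand48(): X = (a*X + c) mod 2^48, returned as X / 2^48 in [0, 1).
    mutating func next() -> Double {
        let a: UInt64 = 0x5_DEEC_E66D
        let c: UInt64 = 0xB
        let mask: UInt64 = (1 << 48) - 1
        // Wrapping 64-bit arithmetic is safe: reducing mod 2^64 and then
        // mod 2^48 gives the same result as reducing mod 2^48 directly.
        state = (a &* state &+ c) & mask
        return Double(state) / Double(UInt64(1) << 48)
    }
}

/// Hypothetical sign mapping: draws below 0.5 become -1, others +1.
func randomSigns(count: Int, seed: Int) -> [Float] {
    var rng = Drand48(seed: seed)
    return (0..<count).map { _ in rng.next() < 0.5 ? -1 : 1 }
}

let signs = randomSigns(count: 8, seed: 42)
```

On Darwin or Glibc, the first value from `Drand48(seed: 0)` should match `srand48(0); drand48()`, about 0.170828.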

Model-family coverage (F-G matrix — 23 items tracked):

F-G1 ✅ SSMStateCache mediaSalt
F-G2 ✅ BaichuanM1 CacheList disk walker
F-G3 ✅ Llama 4 dedicated model class (iRope + MoE + ChunkedKVCache)
F-G4 ✅ Gemma 3 tool parser hermes → gemma4
F-G5 ✅ Gemma 3 mixed SWA+full detection
F-G6..F-G23 🟡 pending — see PROGRESS.md + SWIFT-PER-FAMILY-MATRIX-2026-04-15.md
- FlashMoE conformance: DeepSeek V3, GLM-5, Jamba, Granite3 MoE
- Missing Swift classes: DeepSeek VL, CogVLM, Molmo, InternVL, Florence, GOT-OCR
- Tool parser silver rows: XLaM, Functionary, OlmoE, BailingMoe
- Verification: Phi-4 reasoning, Llama 4 tool format, MiniMax Lightning Attention, GPT-OSS tool parser
- Cache correctness: RWKVCache dedicated class, CacheList multi-sub walker
- Defensive: Gemma 4 image token assertion, PaliGemma probe
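The "silver row resolves but Engine.load falls through" failure mode separates two questions: which family a checkpoint belongs to, and whether a Swift class exists for that family. A sketch of that split, assuming the tiers are simple `model_type`-to-family dictionaries tried in gold → silver → bronze → fallback order; `FamilyResolver`, `loadableClass`, and the table contents are hypothetical.

```swift
/// Hypothetical 4-tier family lookup; table contents are illustrative only.
enum Tier: String { case gold, silver, bronze, fallback }

struct FamilyResolver {
    var gold: [String: String] = [:]     // model_type -> family
    var silver: [String: String] = [:]
    var bronze: [String: String] = [:]
    var fallbackFamily = "generic"

    func resolve(modelType: String) -> (family: String, tier: Tier) {
        if let f = gold[modelType] { return (f, .gold) }
        if let f = silver[modelType] { return (f, .silver) }
        if let f = bronze[modelType] { return (f, .bronze) }
        return (fallbackFamily, .fallback)
    }
}

enum LoadError: Error, Equatable { case noSwiftClass(family: String, tier: Tier) }

/// Resolving a family is not the same as being able to load it: the family
/// must also map to an implemented Swift model class.
func loadableClass(for modelType: String, resolver: FamilyResolver,
                   implemented: Set<String>) -> Result<String, LoadError> {
    let (family, tier) = resolver.resolve(modelType: modelType)
    return implemented.contains(family)
        ? .success(family)
        : .failure(.noSwiftClass(family: family, tier: tier))
}

let resolver = FamilyResolver(gold: ["llama": "llama"], silver: ["molmo": "molmo"])
let implemented: Set<String> = ["llama"]
let ok = loadableClass(for: "llama", resolver: resolver, implemented: implemented)
let gap = loadableClass(for: "molmo", resolver: resolver, implemented: implemented)
// ok  == .success("llama")
// gap == .failure(.noSwiftClass(family: "molmo", tier: .silver))
```

Returning the tier in the error makes it obvious in logs that a silver row matched but no class backs it.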

Cache correctness tail:

Engine.LoadOptions ↔ GlobalSettings default drift
Smelt partial-expert loader in Swift (or honestly strip the toggle)
DFlash streaming reasoning parser
DFlash tool-call path
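One way to close the Engine.LoadOptions ↔ GlobalSettings drift is to have both read their defaults from a single place and to build load options from the user's settings rather than from `init()`. A sketch with simplified, hypothetical stand-in types; only the two field names and the 0.30 / 1000 values come from the notes above.

```swift
/// Single owner of cache defaults (hypothetical; values taken from the
/// GlobalSettings numbers quoted in the status section).
enum CacheDefaults {
    static let memoryPercent = 0.30
    static let maxBlocks = 1000
}

/// Simplified stand-ins for GlobalSettings and Engine.LoadOptions.
struct GlobalSettingsSketch {
    var cacheMemoryPercent = CacheDefaults.memoryPercent
    var maxCacheBlocks = CacheDefaults.maxBlocks
}

struct LoadOptionsSketch {
    var cacheMemoryPercent = CacheDefaults.memoryPercent
    var maxCacheBlocks = CacheDefaults.maxBlocks

    init() {}

    /// Preferred construction path: derive from the user's settings.
    init(from settings: GlobalSettingsSketch) {
        cacheMemoryPercent = settings.cacheMemoryPercent
        maxCacheBlocks = settings.maxCacheBlocks
    }
}

var settings = GlobalSettingsSketch()
settings.maxCacheBlocks = 2000
let options = LoadOptionsSketch(from: settings)
```

A unit test asserting that `LoadOptions()` and `GlobalSettings()` agree on these fields would also catch future drift.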

Audio:

Kokoro neural TTS backend (9-step port plan in vMLXTTS/Kokoro/KokoroBackend.swift)
Whisper long-form (sliding window, temperature fallback, word timestamps)
TTS mp3 / opus / flac transcoding
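Whisper's encoder consumes 30 seconds of 16 kHz audio per pass, so long-form transcription starts by cutting the input into windows. The sketch below uses fixed-stride windows with overlap, a simpler strategy than the timestamp-driven seeking in OpenAI's reference decoder; `whisperWindows` and its 5-second default overlap are assumptions, not vMLXWhisper's design.

```swift
/// Splits `sampleCount` samples into windows of at most `windowSeconds`,
/// each starting `windowSeconds - overlapSeconds` after the previous one.
/// Fixed-stride chunking only; timestamp-driven seeking is not modelled.
func whisperWindows(sampleCount: Int, sampleRate: Int = 16_000,
                    windowSeconds: Int = 30, overlapSeconds: Int = 5) -> [Range<Int>] {
    let window = windowSeconds * sampleRate
    let hop = (windowSeconds - overlapSeconds) * sampleRate
    precondition(hop > 0, "overlap must be shorter than the window")
    var windows: [Range<Int>] = []
    var start = 0
    while start < sampleCount {
        let end = min(start + window, sampleCount)
        windows.append(start..<end)
        if end == sampleCount { break }   // last window reaches the end of audio
        start += hop
    }
    return windows
}

// 70 s of audio at 16 kHz.
let windows = whisperWindows(sampleCount: 70 * 16_000)
// [0..<480000, 400000..<880000, 800000..<1120000]
```

Overlapping windows need a merge step that de-duplicates text in the overlap; that step, along with temperature fallback and word timestamps, is outside this sketch.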

Build + shipping:

xcodegen/SwiftPM application-bundle wrapping (see quirk above)
Automated ship-DMG script for future dev releases

Related docs in-tree

PROGRESS.md — per-session changelog, newest at top. Read this first if you want to know what moved recently.
NO-REGRESSION-CHECKLIST.md — release regression matrix.
Sources/vMLXLMCommon/FlashMoE/README.md — Flash MoE phase architecture.
SWIFT-PER-FAMILY-MATRIX-2026-04-15.md (local only, not pushed) — per-family F-G1..F-G23 audit with file:line citations.

Not in this tree

Production Electron app — panel/ subtree kept for reference but never built in this pipeline. Real v1.3.x releases come out of jjang-ai/vmlx main branch via electron-builder.
User documentation — this README is a dev log, not a user manual. User docs will live on jjang-ai/mlxstudio once the Swift path is mature enough to document for users.

Legal

Release history Release notifications | RSS feed

1.4.0

Apr 29, 2026

1.3.99

Apr 27, 2026

1.3.98

Apr 27, 2026

1.3.97

Apr 26, 2026

1.3.96

Apr 26, 2026

1.3.95

Apr 26, 2026

1.3.94

Apr 26, 2026

This version

1.3.93

Apr 26, 2026

1.3.92

Apr 26, 2026

1.3.86

Apr 24, 2026

1.3.85

Apr 24, 2026

1.3.84

Apr 23, 2026

1.3.83

Apr 23, 2026

1.3.82

Apr 23, 2026

1.3.81

Apr 23, 2026

1.3.80

Apr 22, 2026

1.3.79

Apr 22, 2026

1.3.78

Apr 22, 2026

1.3.77

Apr 22, 2026

1.3.76

Apr 21, 2026

1.3.75

Apr 21, 2026

1.3.74

Apr 21, 2026

1.3.73

Apr 21, 2026

1.3.72

Apr 21, 2026

1.3.71

Apr 21, 2026

1.3.70

Apr 20, 2026

1.3.69

Apr 20, 2026

1.3.68

Apr 20, 2026

1.3.67

Apr 20, 2026

1.3.66

Apr 20, 2026

1.3.65

Apr 20, 2026

1.3.64

Apr 20, 2026

1.3.63

Apr 20, 2026

1.3.61

Apr 17, 2026

1.3.59

Apr 17, 2026

1.3.58

Apr 16, 2026

1.3.57

Apr 16, 2026

1.3.56

Apr 16, 2026

1.3.55

Apr 15, 2026

1.3.54

Apr 15, 2026

1.3.53

Apr 14, 2026

1.3.52

Apr 14, 2026

1.3.51

Apr 14, 2026

1.3.50

Apr 14, 2026

1.3.49

Apr 14, 2026

1.3.35

Apr 9, 2026

1.3.34

Apr 9, 2026

1.3.33

Apr 9, 2026

1.3.30

Apr 7, 2026

1.3.29

Apr 6, 2026

1.3.28

Apr 5, 2026

1.3.27

Apr 4, 2026

1.3.26

Apr 3, 2026

1.3.14

Mar 26, 2026

1.3.11

Mar 25, 2026

1.3.5

Mar 21, 2026

1.3.4

Mar 21, 2026

1.3.3

Mar 21, 2026

1.3.0

Mar 20, 2026

1.0.10

Mar 20, 2026

1.0.9

Mar 19, 2026

1.0.8

Mar 18, 2026

1.0.7

Mar 18, 2026

1.0.6

Mar 17, 2026

1.0.5

Mar 17, 2026

1.0.4

Mar 17, 2026

1.0.3

Mar 17, 2026

1.0.2

Mar 16, 2026

1.0.1

Mar 16, 2026

1.0.0

Mar 16, 2026

Download files

Source distribution: vmlx-1.3.93.tar.gz (727.4 kB), uploaded Apr 26, 2026 via twine/6.2.0 CPython/3.14.2 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`abed8583bcb081e0d28e20bb10915f956a00bddd472757932d81f590d6ce3e2d`
MD5	`141e0fd52197f229851464061421910a`
BLAKE2b-256	`6da786e76dc97775bf1ff1537cea38a36204f592cc6b10544bf86b4614d3cf95`

Built distribution: vmlx-1.3.93-py3-none-any.whl (820.0 kB, Python 3), uploaded Apr 26, 2026 via twine/6.2.0 CPython/3.14.2 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`ef3347ea03e36792f20d0e60a7ce74bf730bd83dfafa88c432603d9b08ff037e`
MD5	`97fb5882d7b3d51639fd62a4889c7615`
BLAKE2b-256	`6a9168b8e189a72a11e8b33eac26d1ea1b72e508765bb24be0a104aebde277a3`
