Flyto MLX — Apple Silicon LLM server with audio chat, DFlash, and Chinese model presets (based on oMLX)
Reason this release was yanked:
0.4.0 install regression; use brew tap or pip git+install instead, see project README
Project description
Flyto MLX
Apple Silicon LLM server · Audio chat · DFlash dual engine · Chinese model presets
Based on oMLX by @jundot.
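Chinese | English
Introduction
Flyto MLX is a local LLM server for Apple Silicon, optimized for Mac users in China and the Chinese model ecosystem, and forked from @jundot/oMLX. It keeps all upstream oMLX capabilities (OpenAI-compatible API, multi-model LRU scheduling, paged KV cache, Mac menubar GUI) and adds features that upstream has not yet merged or does not support: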
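| Capability | Description |
|---|---|
| Gemma 4 audio chat | End-to-end support for the OpenAI input_audio content type: call gemma4-e2b / gemma4-e4b to answer directly from audio (end-to-end audio understanding, not an ASR replacement) |
| DFlash dual engine (Path A) | Dual Qwen / Gemma 4 backends, with optimized drafter co-loading |
| Tahoe compatibility | Fix for the macOS 26 NSStatusItem occlusion bit |
| Backports of upstream fixes not yet released | 5 fixes, including tokenizer lm_head, TokenBuffer cache hit seed, and health-check Session reuse |
| Chinese model presets | Ready-to-use aliases for Qwen 3.5 MoE/Dense, DeepSeek V4, Gemma 4, and others |
| Gitee mirror + ModelScope model source | Optimized access from mainland China |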
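To illustrate what multi-model LRU scheduling means in practice, here is a minimal sketch of a least-recently-used model pool. It is not the actual oMLX or Flyto MLX scheduler: the `ModelPool` class, the `max_loaded` limit, the `loader` callback, and the `other-model` name are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class ModelPool:
    """Hypothetical LRU pool that keeps at most `max_loaded` models resident."""

    def __init__(self, loader, max_loaded=2):
        self._loader = loader            # callable: model name -> loaded model
        self._max_loaded = max_loaded
        self._models = OrderedDict()     # ordered from least to most recently used
        self.evicted = []                # names evicted so far, oldest first

    def get(self, name):
        if name in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load the model, then evict the least recently used ones.
        self._models[name] = self._loader(name)
        while len(self._models) > self._max_loaded:
            old_name, _ = self._models.popitem(last=False)
            self.evicted.append(old_name)
        return self._models[name]

    def loaded(self):
        return list(self._models)


# Stand-in loader; a real server would load weights into memory here.
pool = ModelPool(loader=lambda name: f"<weights of {name}>", max_loaded=2)
pool.get("gemma4-e2b")
pool.get("gemma4-e4b")
pool.get("gemma4-e2b")    # refreshes gemma4-e2b, so gemma4-e4b is now the oldest
pool.get("other-model")   # hypothetical third model; forces one eviction
print(pool.loaded(), pool.evicted)
```

In this sketch the model that was used least recently is unloaded first, which is the general idea behind keeping several models available on a memory-constrained Mac.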
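Installation
# pip
pip install flyto-mlx
# Start the server (the CLI is compatible with upstream omlx; the primary command is fmlx)
fmlx serve --port 8000
# or
omlx serve --port 8000 # alias, compatible with upstream
DMG and brew tap builds will be provided with later releases.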
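Once the server is running, a quick way to confirm it responds is to query the model list. The sketch below assumes the server exposes the OpenAI-style `GET /v1/models` endpoint and accepts a bearer API key; neither is confirmed by this README, and the helper names `build_models_request` and `list_model_ids` are hypothetical.

```python
import json
import urllib.request


def build_models_request(base_url="http://localhost:8000", api_key="mykey"):
    """Build a GET request for the (assumed) OpenAI-style /v1/models endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_model_ids(base_url="http://localhost:8000", api_key="mykey", timeout=5.0):
    """Send the request and return model ids, assuming an OpenAI-style response body."""
    req = build_models_request(base_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]


# Building the request does not touch the network, so it is safe to inspect.
req = build_models_request()
print(req.get_method(), req.full_url)
```

Calling `list_model_ids()` after `fmlx serve --port 8000` has started should return the aliases the server knows about, if the endpoint assumption holds.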
Quick audio chat test
# Assumes the server is already running on :8000 with the API key set to mykey
python3 <<'PY'
import base64
import requests

# Read the recording and base64-encode it for the JSON payload
with open("recording.wav", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer mykey"},
    json={
        "model": "gemma4-e2b",
        "max_tokens": 400,
        "temperature": 0.3,
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Summarize the key information in this phone call"},
            {"type": "input_audio", "input_audio": {"data": b64, "format": "wav"}},
        ]}],
    },
)
r.raise_for_status()  # stop here on an HTTP error instead of failing on the JSON lookup
print(r.json()["choices"][0]["message"]["content"])
PY
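For repeated use, the request body from the example above can be assembled by a small helper, which keeps payload construction separate from the HTTP call. This is a sketch only: `build_audio_chat_payload` is a hypothetical name, and the default values simply mirror the ones in the example.

```python
import base64


def build_audio_chat_payload(audio_bytes, prompt, model="gemma4-e2b",
                             audio_format="wav", max_tokens=400, temperature=0.3):
    """Return a chat completions body with one text part and one input_audio part."""
    b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "input_audio",
                 "input_audio": {"data": b64, "format": audio_format}},
            ],
        }],
    }


# Placeholder bytes stand in for the contents of a real WAV file.
payload = build_audio_chat_payload(b"abc", "Summarize this recording")
print(payload["messages"][0]["content"][1]["input_audio"]["format"])
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, exactly as in the heredoc example.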
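Relationship to upstream oMLX
Flyto MLX is a downstream fork of oMLX, licensed under Apache 2.0. We regularly cherry-pick upstream bug fixes and new model support, but we no longer submit our own features (audio chat, DFlash, etc.) to upstream as PRs. If you only want the pure upstream experience, use @jundot/oMLX.
See NOTICE and LICENSE for detailed attribution and copyright notices.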
License
Apache License 2.0. Based on oMLX by @jundot. See LICENSE and NOTICE for details.
English
Flyto MLX is a fork of @jundot/oMLX optimized for the Chinese Mac LLM community and sovereign-AI model ecosystem (Qwen, DeepSeek, Gemma 4). It preserves all upstream oMLX capabilities (OpenAI-compatible API, multi-model LRU scheduling, KV paged cache, menubar GUI) and adds:
- Audio chat via OpenAI input_audio: end-to-end Gemma 4 nano audio LLM through /v1/chat/completions
- DFlash Path A dual engine: Qwen and Gemma 4 backends with optimized drafter co-loading
- macOS 26 Tahoe compatibility: NSStatusItem occlusion bit fix
- 5 upstream-fixed-but-unreleased patches backported: tokenizer lm_head, TokenBuffer cache hit seed, health-check session reuse, and more
- Chinese model presets: Qwen 3.5 MoE/Dense, DeepSeek V4, Gemma 4 aliases ready out of the box
- Gitee mirror + ModelScope model registry: for users in mainland China
Install: pip install flyto-mlx. CLI: fmlx serve (or omlx serve alias for upstream compatibility).
We periodically cherry-pick upstream fixes. We do not upstream our own features back. For pure upstream behaviour, please use @jundot/oMLX directly.
License
File details
Details for the file flyto_mlx-0.4.0.tar.gz.
File metadata
- Download URL: flyto_mlx-0.4.0.tar.gz
- Upload date:
- Size: 30.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.11 {"installer":{"name":"uv","version":"0.11.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91eecf09011f0bb969f561c3f66af4269e014f71cf3d96f5a74a07095d2f07a5 |
| MD5 | 0a7318715132d3fdebca3aaea4632efd |
| BLAKE2b-256 | e058dc602541a4002f517290e699528fe931919b671b19cc1cd2ebe987a097a3 |
File details
Details for the file flyto_mlx-0.4.0-py3-none-any.whl.
File metadata
- Download URL: flyto_mlx-0.4.0-py3-none-any.whl
- Upload date:
- Size: 30.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.11 {"installer":{"name":"uv","version":"0.11.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 634245a285ca7aa1addad9505b5ab23a58a0a8b02daf0d7638ef37b251313005 |
| MD5 | 3010861190f3d911ab8aef98acd11f3d |
| BLAKE2b-256 | 1adfbc7a9a89ddce54ac6550c05b8e829cc15b41cab796ceb5a8a7aba0a1fa9a |
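The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the expected values are the SHA256 digests listed above.

```python
import hashlib

# SHA256 digests copied from the file details listed above.
EXPECTED_SHA256 = {
    "flyto_mlx-0.4.0.tar.gz":
        "91eecf09011f0bb969f561c3f66af4269e014f71cf3d96f5a74a07095d2f07a5",
    "flyto_mlx-0.4.0-py3-none-any.whl":
        "634245a285ca7aa1addad9505b5ab23a58a0a8b02daf0d7638ef37b251313005",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so a 30 MB archive is not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("flyto_mlx-0.4.0.tar.gz", EXPECTED_SHA256["flyto_mlx-0.4.0.tar.gz"])` should return `True` for an intact download of the source distribution.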