
Flyto MLX — Apple Silicon LLM server with audio chat, DFlash, and Chinese model presets (based on oMLX)

Reason this release was yanked:

0.4.0 install regression; use brew tap or pip git+install instead, see project README

Project description

Flyto MLX


Apple Silicon LLM server · Audio chat · DFlash dual engine · Chinese model presets
Based on oMLX by @jundot.

License Python 3.10+ Apple Silicon Gitee mirror



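Introduction

Flyto MLX is a local LLM server for Apple Silicon, optimized for Mac users in China and the Chinese model ecosystem. It is a fork of @jundot/oMLX. It keeps all upstream oMLX capabilities (OpenAI-compatible API, multi-model LRU scheduling, paged KV cache, Mac menubar GUI) and adds features that upstream has not yet merged or does not support: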

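| Capability | Description |
| --- | --- |
| Gemma 4 audio chat | End-to-end support for the OpenAI input_audio content type: call gemma4-e2b / gemma4-e4b and the model listens to the audio and answers directly (end-to-end audio understanding, not an ASR substitute) |
| DFlash dual engine (Path A) | Qwen and Gemma 4 backends, with optimized drafter co-loading |
| Tahoe compatibility | Fix for the macOS 26 NSStatusItem occlusion bit |
| Backports of upstream fixes not yet released | Five fixes, including tokenizer lm_head, TokenBuffer cache hit seed, and health-check Session reuse |
| Chinese model presets | Aliases for Qwen 3.5 MoE/Dense, DeepSeek V4, Gemma 4, and others, ready to use on install |
| Gitee mirror + ModelScope model source | Optimized access from mainland China |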

Installation

# pip
pip install flyto-mlx

# Start the server (the CLI is compatible with upstream omlx; the primary command is fmlx)
fmlx serve --port 8000
# or
omlx serve --port 8000     # alias, compatible with upstream

DMG and brew tap distributions will be provided with later releases.
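Before sending requests, it can help to confirm that the server has finished starting. The following is a minimal sketch, not part of Flyto MLX: it assumes the server exposes the standard OpenAI-style `GET /v1/models` endpoint (the README only states that the API is OpenAI-compatible), and the helper names `list_models` and `wait_for_server` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def list_models(base_url, api_key=None, timeout=2.0):
    """Return model ids from GET {base_url}/v1/models, or None on any failure.

    Assumption: the server implements the OpenAI-style /v1/models route.
    HTTP errors (for example a rejected API key) are also reported as None.
    """
    req = urllib.request.Request(f"{base_url}/v1/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    if not isinstance(body, dict):
        return None
    return [m.get("id") for m in body.get("data", []) if isinstance(m, dict)]


def wait_for_server(base_url, api_key=None, attempts=10, delay=1.0):
    """Poll list_models until it succeeds; return the ids, or None if it never does."""
    for _ in range(attempts):
        models = list_models(base_url, api_key)
        if models is not None:
            return models
        time.sleep(delay)
    return None


if __name__ == "__main__":
    # Port 8000 and the key "mykey" mirror the examples in this README.
    print(wait_for_server("http://localhost:8000", api_key="mykey"))
```

If the call prints `None`, the server is either not running, not reachable on that port, or rejecting the key.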

Quick audio chat test

# Assumes the server is already running on :8000 and the API key is set to mykey
python3 <<'PY'
import base64
import requests

# Read the recording and base64-encode it for the input_audio content part
with open("recording.wav", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
r = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": "Bearer mykey"},
    json={
        "model": "gemma4-e2b",
        "max_tokens": 400,
        "temperature": 0.3,
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Summarize the key information in this phone call"},
            {"type": "input_audio", "input_audio": {"data": b64, "format": "wav"}}
        ]}]
    },
)
r.raise_for_status()  # stop here on an HTTP error instead of failing on a missing key
print(r.json()["choices"][0]["message"]["content"])
PY
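The request body above can be factored into a small helper so the same payload shape is reused for different recordings and prompts. This is only a sketch: the function name `build_audio_request` and its defaults are hypothetical, and the field layout simply mirrors the example in this README.

```python
import base64


def build_audio_request(audio_bytes, prompt, model="gemma4-e2b", audio_format="wav"):
    """Build a /v1/chat/completions body with one text part and one input_audio part.

    The structure copies the README example; max_tokens and temperature are
    the same illustrative values used there, not recommended settings.
    """
    b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": 400,
        "temperature": 0.3,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "input_audio",
                 "input_audio": {"data": b64, "format": audio_format}},
            ],
        }],
    }


if __name__ == "__main__":
    # recording.wav is the same placeholder file name used in the example above.
    with open("recording.wav", "rb") as f:
        body = build_audio_request(f.read(), "Summarize this recording")
    print(list(body.keys()))
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.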

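Relationship to upstream oMLX

Flyto MLX is a downstream fork of oMLX, licensed under Apache 2.0. We regularly cherry-pick upstream bug fixes and new model support, but we no longer submit our own features (audio chat, DFlash, etc.) as PRs to upstream. If you only want the pure upstream experience, use @jundot/oMLX.

For detailed attribution and copyright notices, see NOTICE and LICENSE.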

License

Apache License 2.0. Based on oMLX by @jundot. See LICENSE and NOTICE for details.


English

Flyto MLX is a fork of @jundot/oMLX optimized for the Chinese Mac LLM community and sovereign-AI model ecosystem (Qwen, DeepSeek, Gemma 4). It preserves all upstream oMLX capabilities (OpenAI-compatible API, multi-model LRU scheduling, KV paged cache, menubar GUI) and adds:

  • Audio chat via OpenAI input_audio — end-to-end Gemma 4 nano audio LLM through /v1/chat/completions
  • DFlash Path A double-engine — Qwen and Gemma 4 backends with optimized drafter co-loading
  • macOS 26 Tahoe compatibility — NSStatusItem occlusion bit fix
  • 5 upstream-fixed-but-unreleased patches backported — tokenizer lm_head, TokenBuffer cache hit seed, health-check session reuse, and more
  • Chinese model presets — Qwen 3.5 MoE/Dense, DeepSeek V4, Gemma 4 aliases ready out of the box
  • Gitee mirror + ModelScope model registry — for users in mainland China

Install: pip install flyto-mlx. CLI: fmlx serve (or omlx serve alias for upstream compatibility).

We periodically cherry-pick upstream fixes. We do not upstream our own features back. For pure upstream behaviour, please use @jundot/oMLX directly.

License

Apache 2.0. See LICENSE and NOTICE.


