mlx-gui

A lightweight RESTful wrapper around Apple's MLX engine for dynamically loading and serving MLX-compatible models


███╗   ███╗██╗     ██╗  ██╗      ██████╗ ██╗   ██╗██╗
████╗ ████║██║     ╚██╗██╔╝     ██╔════╝ ██║   ██║██║
██╔████╔██║██║      ╚███╔╝█████╗██║  ███╗██║   ██║██║
██║╚██╔╝██║██║      ██╔██╗╚════╝██║   ██║██║   ██║██║
██║ ╚═╝ ██║███████╗██╔╝ ██╗     ╚██████╔╝╚██████╔╝██║
╚═╝     ╚═╝╚══════╝╚═╝  ╚═╝      ╚═════╝  ╚═════╝ ╚═╝

A lightweight Inference Server for Apple's MLX engine with a GUI.

TL;DR - OpenRouter-style v1 API interface for MLX with Ollama-like model management, featuring auto-queuing, on-demand model loading, and multi-user serving from a single Mac app.

📦 Latest Release

v1.2.3 - Real-Time Model Status & Model Support (July 19 2025)

🚀 Real-Time Status Monitoring

✅ Live Model Status - Added real-time status tracking for model loading, including download progress.
📊 Detailed Status View - See download percentage, speed, and ETA directly in the UI.
🐛 Fixed Status Endpoint - Resolved a critical bug causing the server to crash when checking model status.

🧪 New API Test Console

✅ Built-in API Testing - Added dedicated API Test tab in the admin interface for single-turn testing
🎯 Model Selection - Test any loaded model with customizable parameters (temperature, max tokens, system messages)
📊 Response Analytics - View response time, token count, and detailed statistics
📝 Test History - Keep track of recent API tests with timestamps and performance metrics
⚡ Quick Validation - Perfect for testing model responses and API functionality

🚀 Comprehensive Model Ecosystem

✅ 15+ New Verified Models - Added support for trending MLX models including SmolLM3, Kimi-K2, Gemma-3n, and more
🧠 Trillion-Parameter Support - Added detection for ultra-large models like Kimi-K2-Instruct (1.02T parameters)
🎯 Enhanced Model Discovery - Improved trending models endpoint with curated high-performance models
🔍 Smart Multimodal Detection - Fixed classification for models like Gemma-3n to properly show as "Multimodal"

🎨 New Verified and Tested Models

SmolLM3-3B-4bit - Multilingual 3B parameter model with 8-language support (reported as 481M, which likely reflects the packed 4-bit weights)
Kimi-Dev-72B-4bit-DWQ - Large reasoning model with advanced capabilities
Kimi-K2-Instruct-4bit - Ultra-large 1.02T parameter instruction-tuned model
Llama-3.2-3B-Instruct-4bit - Meta's 3B parameter instruction-following model (reported as 502M, which likely reflects the packed 4-bit weights)
Gemma-2-9B-it-4bit - Google's 9B parameter advanced reasoning model (reported as 1.44B, which likely reflects the packed 4-bit weights)
Qwen3-30B-A3B-4bit-DWQ - MoE model with 30B total/3B active parameters
Gemma-3n-E4B-it-MLX-4bit - Advanced multimodal model with image/audio/text capabilities

🔧 Technical Improvements

🎯 Improved Model Type Classification - Enhanced detection for multimodal models with image-text-to-text capabilities
📊 Expanded Parameter Patterns - Added support for trillion-scale model memory estimation
🧪 Comprehensive Test Suite - Added dedicated test scripts for all new models with streaming/non-streaming validation
🔄 Install-Load Workflow - Updated all tests to follow proper MLX-GUI model lifecycle (install → load → use)
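
To make the memory estimation idea concrete, here is a rough sketch that turns a size tag such as 72B or 1.02T into a parameter count and multiplies by the quantization width. The function names, the regex, and the 20% overhead factor are assumptions for illustration, not MLX-GUI's actual estimator.

```python
import re

# Multipliers for the size suffixes used in the model descriptions above.
_UNITS = {"M": 1e6, "B": 1e9, "T": 1e12}


def parse_param_count(size_tag: str) -> float:
    """Convert a tag such as '72B' or '1.02T' into a parameter count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([MBT])", size_tag.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size tag: {size_tag!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def estimate_memory_gb(size_tag: str, bits: int = 4, overhead: float = 1.2) -> float:
    """Estimate memory as params * bits / 8 bytes, padded by an assumed overhead."""
    weight_bytes = parse_param_count(size_tag) * bits / 8
    return weight_bytes * overhead / 1e9
```

Under these assumptions a 72B model at 4 bits comes to about 36 GB of weights before overhead, and a 1.02T model to about 510 GB, which is why trillion-scale models need their own handling.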

v1.2.2 - Complete CyberAI Image Compatibility (July 15 2025)

🔧 Critical CyberAI Fix

✅ Raw Base64 Image Support - Fixed CyberAI images by adding support for raw base64 data (no data:image/ prefix)
🔍 Automatic Format Detection - Detects PNG, JPEG, GIF, and WebP from binary headers
🛠️ Enhanced Image Processing - Improved raw base64 validation and error handling
✅ Verified Fix - Tested and confirmed working with actual CyberAI client requests
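
Detecting a format from binary headers can be sketched as follows. This is a minimal illustration that checks the well-known leading bytes of each format; the function names are hypothetical and the code is not taken from MLX-GUI.

```python
import base64
from typing import Optional

# Leading "magic" bytes for the formats named above.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def detect_image_format(raw_b64: str) -> Optional[str]:
    """Return 'png', 'jpeg', 'gif', or 'webp' for raw base64 data, else None."""
    try:
        data = base64.b64decode(raw_b64, validate=True)
    except ValueError:  # binascii.Error is a ValueError subclass
        return None
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    # WebP is a RIFF container: 'RIFF', four size bytes, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def to_data_url(raw_b64: str) -> str:
    """Add the data:image/ prefix that a raw base64 payload lacks."""
    fmt = detect_image_format(raw_b64)
    if fmt is None:
        raise ValueError("unsupported or invalid image data")
    return f"data:image/{fmt};base64,{raw_b64}"
```

For example, `to_data_url("/9j/4AAQ")` returns `data:image/jpeg;base64,/9j/4AAQ`, because those characters decode to the JPEG start-of-image marker.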

v1.2.1 - Enhanced Image Compatibility (July 15 2025)

🖼️ Improved Vision Model Compatibility

✅ Enhanced Image Format Support - Fixed vision models not seeing images from certain OpenAI-compatible clients
🔧 Multiple Image URL Formats - Now supports various ways clients send images (image_url.url, image_url.image, direct image fields)
🤖 CyberAI Compatibility - Resolved image processing issues with CyberAI and other third-party clients
🛠️ Robust Image Parsing - Added fallback handling for different OpenAI API image formats
🎯 Better Error Handling - Improved debugging and error messages for image processing failures
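
The fallback handling described above could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and the lookup order (image_url.url, then image_url.image, then a direct image field) is one plausible reading of the release note rather than the server's confirmed logic.

```python
from typing import Any, Optional


def extract_image_ref(part: dict[str, Any]) -> Optional[str]:
    """Return the image URL or base64 string from one content part, else None."""
    image_url = part.get("image_url")
    if isinstance(image_url, dict):
        # Standard OpenAI shape first, then the image_url.image variant.
        for key in ("url", "image"):
            value = image_url.get(key)
            if isinstance(value, str) and value:
                return value
    # Fall back to a direct image field on the content part.
    direct = part.get("image")
    if isinstance(direct, str) and direct:
        return direct
    return None
```

Returning None instead of raising lets the caller decide whether a part without an image is an error or simply a text part.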

v1.2.0 - Advanced Memory Management & VLM Stability (July 13 2025)

🧠 Revolutionary Memory Management

✅ Intelligent Auto-Unload System - Automatically unloads oldest models when memory limits are reached
🔄 Three-Layer Memory Protection - Proactive cleanup, concurrent limits, and emergency memory recovery
⚡ Memory Error Detection - Detects MLX memory errors and retries with automatic model unloading
📊 Smart LRU Eviction - Least Recently Used models are automatically freed to make room for new ones; this is transparent to users.
🛡️ Memory Overload Recovery - Up to 3 retry attempts with intelligent memory cleanup between attempts
⚡️ Core Updates - Updated to MLX-LM 0.26.0 and MLX-VLM 0.3.1
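
As an illustration of LRU eviction against a memory budget, here is a small self-contained sketch. The class name, the per-model sizes in GB, and the single fixed budget are assumptions made for the example; MLX-GUI's real memory manager is not shown in this README and may work differently.

```python
from collections import OrderedDict


class LRUModelCache:
    """Hypothetical registry that evicts least recently used models to fit a budget."""

    def __init__(self, budget_gb: float) -> None:
        self.budget_gb = budget_gb
        self._models = OrderedDict()  # name -> size in GB, oldest first

    @property
    def used_gb(self) -> float:
        return sum(self._models.values())

    def touch(self, name: str) -> None:
        """Mark a loaded model as most recently used."""
        self._models.move_to_end(name)

    def load(self, name: str, size_gb: float) -> list:
        """Load a model, unloading the oldest ones first if needed; return evicted names."""
        if size_gb > self.budget_gb:
            raise MemoryError(f"{name} ({size_gb} GB) exceeds the {self.budget_gb} GB budget")
        if name in self._models:
            self.touch(name)
            return []
        evicted = []
        while self._models and self.used_gb + size_gb > self.budget_gb:
            oldest, _ = self._models.popitem(last=False)
            evicted.append(oldest)
        self._models[name] = size_gb
        return evicted
```

With a 16 GB budget, loading an 8 GB model and a 6 GB model, touching the first, and then loading another 6 GB model evicts only the untouched 6 GB model.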

🔧 Enhanced VLM Stability

🖼️ Fixed Vision Model Queue Issues - Resolved concurrent loading problems with MLX-VLM models
🎯 Improved Image Token Handling - Better processing of vision inputs in queue system
🔄 Robust Multimodal Support - Enhanced stability for Gemma-3n, Qwen2-VL, and LLaVA models
📸 Optimized Memory Usage - Better memory management for large vision models

🚀 Performance Improvements

⚡ Faster Model Loading - Optimized queue processing with better error handling
🏗️ Enhanced Concurrent Processing - Improved handling of multiple simultaneous requests
📈 Better Resource Utilization - Smarter memory allocation and cleanup strategies
🔍 Comprehensive Testing - Added memory overload tests and queue verification

🛠️ Technical Enhancements

🧪 Advanced Testing Suite - New memory management tests and VLM stability verification
📝 Improved Logging - Better visibility into memory management and model lifecycle
🔧 Enhanced Error Recovery - More robust handling of edge cases and memory constraints

Download: Latest Release

Why?

✅ Why MLX? Llama.cpp and Ollama are great, but they are slower than MLX on Apple Silicon. MLX is Apple's native framework, optimized for Apple Silicon. Plus, it's free and open source, and this project adds a nice GUI.
⚡️ I wanted to turn my Mac mini and Mac Studio into more useful multi-user inference servers that I don't have to manage.
🏗️ I just want to build AI things while keeping sovereignty over my data, without managing inference servers or paying for expensive services.

🚀 Features

🧠 MLX Engine Integration - Native Apple Silicon acceleration via MLX
🔄 Intelligent Memory Management - Advanced auto-unload system with LRU eviction and memory error recovery
🛡️ Three-Layer Memory Protection - Proactive cleanup, concurrent limits, and emergency memory recovery
🌐 REST API Server - Complete API for model management and inference
🎨 Beautiful Admin Interface - Modern web GUI for model management
📊 System Monitoring - Real-time memory usage and system status with memory warnings
🔍 HuggingFace Integration - Discover and install MLX-compatible models
🎙️ Audio Support - Speech-to-text with Whisper and Parakeet models
🖼️ Vision Models - Image understanding with Gemma-3n, Qwen2-VL, LLaVA models (enhanced stability)
🔢 Embeddings Support - Text embeddings with OpenAI-compatible API
🍎 macOS System Tray - Native menu bar integration
⚡ OpenAI Compatibility - Drop-in replacement for OpenAI API
📱 Standalone App - Packaged macOS app bundle (no Python required)

📋 Requirements

macOS (Apple Silicon M1/M2/M3/M4 required)
Python 3.11+ (for development)
8GB+ RAM (16GB+ recommended for larger models)

🏃‍♂️ Quick Start

Option 1: Download Standalone App

Download the latest .app from Releases
Drag to /Applications
Launch - no Python installation required!
From the menu bar, click the MLX icon to open the admin interface.
Discover and install models from HuggingFace.
Connect your AI app to the API endpoint.

📝 Models may take a few minutes to load. They are gigabytes in size and will download at your internet speed.

Option 2: Install from Source

# Clone the repository
git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui

# Install dependencies
pip install -e ".[app]"

# Launch with system tray
mlx-gui tray

# Or launch server only
mlx-gui start --port 8000

🎮 Usage

An API Endpoint for Jan or any other AI app

Simply configure the API endpoint in the app settings to point to your MLX-GUI server. This works with any AI app that supports the OpenAI API. Enter anything for the API key.

System Tray (Recommended)

Launch the app and look for MLX in your menu bar:

Open Admin Interface - Web GUI for model management
System Status - Real-time monitoring
Unload All Models - Free up memory
Network Settings - Configure binding options

Web Admin Interface

Navigate to http://localhost:8000/admin for:

🔍 Discover Tab - Browse and install MLX models from HuggingFace
🧠 Models Tab - Manage installed models (load/unload/remove)
📊 Monitor Tab - System statistics and performance
⚙️ Settings Tab - Configure server and model options

API Usage

OpenAI-Compatible Chat

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-6bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
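
The same request can be made from Python using only the standard library. This is a sketch: the helper names are hypothetical, and the model name and port simply mirror the curl example above.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "qwen3-8b-6bit",
                       base_url: str = "http://localhost:8000",
                       max_tokens: int = 100) -> urllib.request.Request:
    """Build the same POST as the curl example without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running and the model loaded, `send(build_chat_request("Hello!"))` returns the parsed reply. If the reply follows the OpenAI chat schema, the text should be under `["choices"][0]["message"]["content"]`.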

Vision Models with Images

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2-vl-2b-instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
      ]
    }],
    "max_tokens": 200
  }'
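
The base64 string in the example is truncated, so here is a sketch of how a client might produce a full data URL from image bytes and wrap it in the message shape shown above. The helper names are hypothetical; the MIME type is guessed from the file name with Python's mimetypes module.

```python
import base64
import mimetypes


def image_to_data_url(path: str, data: bytes) -> str:
    """Encode image bytes as a data URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def vision_message(question: str, data_url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

In practice the bytes would come from something like `open("photo.jpg", "rb").read()`, and the resulting message goes into the `messages` list of a chat completion request.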

Audio Transcription

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=parakeet-tdt-0-6b-v2"

Text Embeddings

curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world", "How are you?"],
    "model": "qwen3-embedding-0-6b-4bit"
  }'
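
A common next step is comparing the returned vectors. The sketch below assumes the response follows the OpenAI embeddings schema (a `data` list whose items carry `index` and `embedding`); that shape is an assumption based on the API being described as OpenAI-compatible, and the helper names are hypothetical.

```python
import math


def extract_embeddings(response: dict) -> list:
    """Pull vectors out of an OpenAI-style embeddings response, ordered by index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Values near 1 mean the two inputs point in nearly the same direction in embedding space, and values near 0 mean they are unrelated.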

Install Models

# Install text model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "name": "qwen-7b-4bit"
  }'

# Install audio model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/parakeet-tdt-0.6b-v2",
    "name": "parakeet-tdt-0-6b-v2"
  }'

# Install vision model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "name": "qwen2-vl-2b-instruct"
  }'

# Install embedding model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ",
    "name": "qwen3-embedding-0-6b-4bit"
  }'

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  System Tray    │    │   Web Admin GUI  │    │   REST API      │
│  (macOS)        │◄──►│  (localhost:8000)│◄──►│  (/v1/*)        │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
                       ┌─────────────────┐              │
                       │ Model Manager   │◄─────────────┘
                       │ (Queue/Memory)  │
                       └─────────────────┘
                                │
                       ┌─────────────────┐
                       │  MLX Engine     │
                       │ (Apple Silicon) │
                       └─────────────────┘

📚 API Documentation

Full API documentation is available at /v1/docs when the server is running, or see API.md for complete endpoint reference.

Key Endpoints

GET /v1/models - List installed models
POST /v1/models/install - Install from HuggingFace
POST /v1/models/{name}/load - Load model into memory
POST /v1/chat/completions - OpenAI-compatible chat (text + images)
POST /v1/embeddings - Generate text embeddings
POST /v1/audio/transcriptions - Audio transcription (Whisper/Parakeet)
GET /v1/discover/models - Search HuggingFace for MLX models
GET /v1/discover/embeddings - Search for embedding models
GET /v1/system/status - System and memory status
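
The install, load, use lifecycle maps onto three of these endpoints. The sketch below only builds the calls in order and does not send them; the function name is hypothetical, and the empty JSON body for the load call is an assumption, since this README does not document that endpoint's request body.

```python
from urllib.parse import quote


def lifecycle_steps(model_id: str, name: str, prompt: str,
                    base_url: str = "http://localhost:8000") -> list:
    """Return (method, url, json_body) tuples for install, load, and a first chat call."""
    return [
        # Install from HuggingFace under a local name.
        ("POST", f"{base_url}/v1/models/install", {"model_id": model_id, "name": name}),
        # Load by local name in the path; the empty body is an assumption.
        ("POST", f"{base_url}/v1/models/{quote(name, safe='')}/load", {}),
        # Use the loaded model through the chat endpoint.
        ("POST", f"{base_url}/v1/chat/completions",
         {"model": name, "messages": [{"role": "user", "content": prompt}]}),
    ]
```

Because installs download gigabytes, a real client would wait for the install to finish before sending the load call, for example by checking GET /v1/models until the model appears.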

🛠️ Development

Setup Development Environment

git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install in development mode with audio and vision support
pip install -e ".[dev,audio,vision]"

# Run tests
pytest

# Start development server
mlx-gui start --reload

Build Standalone App

# Install build dependencies with audio and vision support
pip install rumps pyinstaller mlx-whisper parakeet-mlx mlx-vlm

# Build macOS app bundle
./build_app.sh

# Result: dist/MLX-GUI.app

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

⚖️ License

MLX-GUI is licensed under the GNU General Public License v3.0 (GPLv3).
Free Software

🙏 Acknowledgments

Apple MLX Team - For the incredible MLX framework
MLX-LM - MLX language model implementations
HuggingFace - For the model hub and transformers library

Made with ❤️ for the Apple Silicon community

Release history

1.3.0 - Dec 16, 2025
1.2.4 - Jul 22, 2025
1.2.3 - Jul 20, 2025 (this version)

Download files

Source Distribution

mlx_gui-1.2.3.tar.gz (101.9 kB)

Uploaded Jul 20, 2025 Source

Built Distribution

mlx_gui-1.2.3-py3-none-any.whl (109.1 kB)

Uploaded Jul 20, 2025 Python 3

File details

Details for the file mlx_gui-1.2.3.tar.gz.

File metadata

Download URL: mlx_gui-1.2.3.tar.gz
Upload date: Jul 20, 2025
Size: 101.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for mlx_gui-1.2.3.tar.gz
Algorithm	Hash digest
SHA256	`66e62836178d766831cfe25f669fd1174e3f76195ec27bea463576d832fdf8df`
MD5	`39c07e4780a3feab430dd6702798d12b`
BLAKE2b-256	`56e6fea4de469252bc32a12db9d0aba43cb25b5fd310f617f3f617b02ca0f7ac`

File details

Details for the file mlx_gui-1.2.3-py3-none-any.whl.

File metadata

Download URL: mlx_gui-1.2.3-py3-none-any.whl
Upload date: Jul 20, 2025
Size: 109.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for mlx_gui-1.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c6fb0fcf6c889c3a11ff54e22c2902c9dce9d3674b6490a32ec742e097376f65`
MD5	`17a5cd5096c9d515a71410ffcbd7cae7`
BLAKE2b-256	`b50ea9ce85d59fd481715cb2198f3df32c580fbd68377e23c4633318808a10f6`
