A lightweight RESTful wrapper around Apple's MLX engine for dynamically loading and serving MLX-compatible models
███╗ ███╗██╗ ██╗ ██╗ ██████╗ ██╗ ██╗██╗
████╗ ████║██║ ╚██╗██╔╝ ██╔════╝ ██║ ██║██║
██╔████╔██║██║ ╚███╔╝█████╗██║ ███╗██║ ██║██║
██║╚██╔╝██║██║ ██╔██╗╚════╝██║ ██║██║ ██║██║
██║ ╚═╝ ██║███████╗██╔╝ ██╗ ╚██████╔╝╚██████╔╝██║
╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝
A lightweight Inference Server for Apple's MLX engine with a GUI.
TLDR - An OpenRouter-style v1 API for MLX with Ollama-like model management, featuring auto-queuing, on-demand model loading, and multi-user serving from a single Mac app.
📦 Latest Release
v1.2.3 - Real-Time Model Status & Model Support (July 19 2025)
🚀 Real-Time Status Monitoring
- ✅ Live Model Status - Added real-time status tracking for model loading, including download progress.
- 📊 Detailed Status View - See download percentage, speed, and ETA directly in the UI.
- 🐛 Fixed Status Endpoint - Resolved a critical bug causing the server to crash when checking model status.
🧪 New API Test Console
- ✅ Built-in API Testing - Added dedicated API Test tab in the admin interface for single-turn testing
- 🎯 Model Selection - Test any loaded model with customizable parameters (temperature, max tokens, system messages)
- 📊 Response Analytics - View response time, token count, and detailed statistics
- 📝 Test History - Keep track of recent API tests with timestamps and performance metrics
- ⚡ Quick Validation - Perfect for testing model responses and API functionality
🚀 Comprehensive Model Ecosystem
- ✅ 15+ New Verified Models - Added support for trending MLX models including SmolLM3, Kimi-K2, Gemma-3n, and more
- 🧠 Trillion-Parameter Support - Added detection for ultra-large models like Kimi-K2-Instruct (1.02T parameters)
- 🎯 Enhanced Model Discovery - Improved trending models endpoint with curated high-performance models
- 🔍 Smart Multimodal Detection - Fixed classification for models like Gemma-3n to properly show as "Multimodal"
🎨 New Verified Tested Models
- SmolLM3-3B-4bit - Multilingual 481M parameter model with 8-language support
- Kimi-Dev-72B-4bit-DWQ - Large reasoning model with advanced capabilities
- Kimi-K2-Instruct-4bit - Ultra-large 1.02T parameter instruction-tuned model
- Llama-3.2-3B-Instruct-4bit - Meta's instruction-following model with 502M parameters
- Gemma-2-9B-it-4bit - Google's advanced reasoning model with 1.44B parameters
- Qwen3-30B-A3B-4bit-DWQ - MoE model with 30B total/3B active parameters
- Gemma-3n-E4B-it-MLX-4bit - Advanced multimodal model with image/audio/text capabilities
🔧 Technical Improvements
- 🎯 Improved Model Type Classification - Enhanced detection for multimodal models with image-text-to-text capabilities
- 📊 Expanded Parameter Patterns - Added support for trillion-scale model memory estimation
- 🧪 Comprehensive Test Suite - Added dedicated test scripts for all new models with streaming/non-streaming validation
- 🔄 Install-Load Workflow - Updated all tests to follow proper MLX-GUI model lifecycle (install → load → use)
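To make the "parameter patterns" idea concrete, here is a minimal sketch of how a size label such as 30B or 1.02T could be turned into a rough memory estimate. This is not MLX-GUI's actual code: the function names, the regular expressions, and the 1.2 overhead factor are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns; MLX-GUI's real detection rules may differ.
_UNITS = {"M": 1e6, "B": 1e9, "T": 1e12}
_PARAMS = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)([MBT])(?![A-Za-z0-9])", re.IGNORECASE)
_BITS = re.compile(r"(\d+)-?bit", re.IGNORECASE)


def parse_param_count(label: str) -> Optional[float]:
    """Return the first parameter count found in a label, e.g. '30B' -> 30e9."""
    match = _PARAMS.search(label)
    if match is None:
        return None
    return float(match.group(1)) * _UNITS[match.group(2).upper()]


def parse_bits(label: str, default: int = 16) -> int:
    """Return the quantization width from a label such as '4bit'."""
    match = _BITS.search(label)
    return int(match.group(1)) if match else default


def estimate_memory_gb(params: float, bits: int, overhead: float = 1.2) -> float:
    """Weights take params * bits / 8 bytes; overhead is an assumed fudge factor."""
    return params * bits / 8 / 1e9 * overhead
```

Names without a size label, such as Kimi-K2-Instruct-4bit, return None from parse_param_count, so a real implementation would need another source for the parameter count (for example model metadata).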
v1.2.2 - Complete CyberAI Image Compatibility (July 15 2025)
🔧 Critical CyberAI Fix
- ✅ Raw Base64 Image Support - Fixed CyberAI images by adding support for raw base64 data (no data:image/ prefix)
- 🔍 Automatic Format Detection - Detects PNG, JPEG, GIF, and WebP from binary headers
- 🛠️ Enhanced Image Processing - Improved raw base64 validation and error handling
- ✅ Verified Fix - Tested and confirmed working with actual CyberAI client requests
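Detecting a format from binary headers can be sketched as follows. This is an illustration of the general technique, not the project's implementation; the function names are hypothetical, and only the well-known file signatures for the four formats listed above are checked.

```python
import base64
import binascii
from typing import Optional

# Well-known magic-byte prefixes for PNG, JPEG, and GIF.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def detect_image_mime(raw_b64: str) -> Optional[str]:
    """Return a MIME type for raw base64 image data, or None if unrecognized."""
    try:
        data = base64.b64decode(raw_b64, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid base64
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def to_data_url(raw_b64: str) -> Optional[str]:
    """Wrap raw base64 in a data URL once the format is known."""
    mime = detect_image_mime(raw_b64)
    return f"data:{mime};base64,{raw_b64}" if mime else None
```

With a helper like this, a client that sends bare base64 and one that sends a full data URL can be handled by the same downstream code.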
v1.2.1 - Enhanced Image Compatibility (July 15 2025)
🖼️ Improved Vision Model Compatibility
- ✅ Enhanced Image Format Support - Fixed vision models not seeing images from certain OpenAI-compatible clients
- 🔧 Multiple Image URL Formats - Now supports various ways clients send images (image_url.url, image_url.image, direct image fields)
- 🤖 CyberAI Compatibility - Resolved image processing issues with CyberAI and other third-party clients
- 🛠️ Robust Image Parsing - Added fallback handling for different OpenAI API image formats
- 🎯 Better Error Handling - Improved debugging and error messages for image processing failures
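The fallback order described above (image_url.url, then image_url.image, then a direct image field) might look like the sketch below. The helper name is hypothetical, and the bare-string branch is an extra assumption about what some clients send; the server's real parsing code may differ.

```python
from typing import Any, Optional


def extract_image_ref(part: dict[str, Any]) -> Optional[str]:
    """Pull an image reference out of one message content part.

    Tries image_url.url, then image_url.image, then a direct image field.
    """
    image_url = part.get("image_url")
    if isinstance(image_url, str) and image_url:
        # Assumption: some clients send image_url as a bare string.
        return image_url
    if isinstance(image_url, dict):
        for key in ("url", "image"):
            value = image_url.get(key)
            if isinstance(value, str) and value:
                return value
    direct = part.get("image")
    if isinstance(direct, str) and direct:
        return direct
    return None
```

Returning None rather than raising lets the caller decide whether a missing image is an error or just a text-only part.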
v1.2.0 - Advanced Memory Management & VLM Stability (July 13 2025)
🧠 Revolutionary Memory Management
- ✅ Intelligent Auto-Unload System - Automatically unloads oldest models when memory limits are reached
- 🔄 Three-Layer Memory Protection - Proactive cleanup, concurrent limits, and emergency memory recovery
- ⚡ Memory Error Detection - Detects MLX memory errors and retries with automatic model unloading
- 📊 Smart LRU Eviction - Least Recently Used models are automatically freed to make room for new ones. This is 🪟 transparent to users.
- 🛡️ Memory Overload Recovery - Up to 3 retry attempts with intelligent memory cleanup between attempts
- ⚡️ Core Updates - Updated to MLX-LM 0.26.0 and MLX-VLM 0.3.1
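As a rough illustration of LRU eviction combined with retry-on-memory-error, consider the sketch below. It is not the server's code: the class and function names, the use of a size budget in GB, and the use of Python's MemoryError to stand in for an MLX memory error are all assumptions.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUModelCache:
    """Tracks loaded models and evicts the least recently used when over budget."""

    def __init__(self, budget_gb: float) -> None:
        self.budget_gb = budget_gb
        self._models: "OrderedDict[str, float]" = OrderedDict()  # name -> size in GB

    def used_gb(self) -> float:
        return sum(self._models.values())

    def touch(self, name: str) -> None:
        """Mark a model as most recently used."""
        self._models.move_to_end(name)

    def evict_oldest(self) -> Optional[str]:
        """Drop the least recently used model and return its name, if any."""
        if not self._models:
            return None
        name, _ = self._models.popitem(last=False)
        return name

    def load(self, name: str, size_gb: float) -> list[str]:
        """Register a model, evicting old ones first. Returns evicted names."""
        if name in self._models:
            self.touch(name)
            return []
        evicted = []
        while self._models and self.used_gb() + size_gb > self.budget_gb:
            evicted.append(self.evict_oldest())
        self._models[name] = size_gb
        return evicted


def run_with_recovery(task: Callable[[], str], cache: LRUModelCache, attempts: int = 3) -> str:
    """Retry a task after a memory error, freeing one model between attempts.

    attempts must be at least 1; the final failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return task()
        except MemoryError:
            if attempt == attempts - 1:
                raise
            cache.evict_oldest()
    raise RuntimeError("attempts must be >= 1")
```

The OrderedDict keeps insertion order, so moving a model to the end on each use makes the front of the dict the eviction candidate.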
🔧 Enhanced VLM Stability
- 🖼️ Fixed Vision Model Queue Issues - Resolved concurrent loading problems with MLX-VLM models
- 🎯 Improved Image Token Handling - Better processing of vision inputs in queue system
- 🔄 Robust Multimodal Support - Enhanced stability for Gemma-3n, Qwen2-VL, and LLaVA models
- 📸 Optimized Memory Usage - Better memory management for large vision models
🚀 Performance Improvements
- ⚡ Faster Model Loading - Optimized queue processing with better error handling
- 🏗️ Enhanced Concurrent Processing - Improved handling of multiple simultaneous requests
- 📈 Better Resource Utilization - Smarter memory allocation and cleanup strategies
- 🔍 Comprehensive Testing - Added memory overload tests and queue verification
🛠️ Technical Enhancements
- 🧪 Advanced Testing Suite - New memory management tests and VLM stability verification
- 📝 Improved Logging - Better visibility into memory management and model lifecycle
- 🔧 Enhanced Error Recovery - More robust handling of edge cases and memory constraints
Download: Latest Release
Why?
- ✅ Why MLX? Llama.cpp and Ollama are great, but they are slower than MLX. MLX is Apple's native framework, optimized for Apple Silicon. Plus, it's free and open source, and this project has a nice GUI.
- ⚡️ I wanted to turn my Mac mini and Mac Studio into more useful multi-user inference servers that I don't have to manage.
- 🏗️ I just want to build AI things without managing inference servers or paying for expensive services, while keeping sovereignty over my data.
🚀 Features
- 🧠 MLX Engine Integration - Native Apple Silicon acceleration via MLX
- 🔄 Intelligent Memory Management - Advanced auto-unload system with LRU eviction and memory error recovery
- 🛡️ Three-Layer Memory Protection - Proactive cleanup, concurrent limits, and emergency memory recovery
- 🌐 REST API Server - Complete API for model management and inference
- 🎨 Beautiful Admin Interface - Modern web GUI for model management
- 📊 System Monitoring - Real-time memory usage and system status with memory warnings
- 🔍 HuggingFace Integration - Discover and install MLX-compatible models
- 🎙️ Audio Support - Speech-to-text with Whisper and Parakeet models
- 🖼️ Vision Models - Image understanding with Gemma-3n, Qwen2-VL, LLaVA models (enhanced stability)
- 🔢 Embeddings Support - Text embeddings with OpenAI-compatible API
- 🍎 macOS System Tray - Native menu bar integration
- ⚡ OpenAI Compatibility - Drop-in replacement for OpenAI API
- 📱 Standalone App - Packaged macOS app bundle (no Python required)
📋 Requirements
- macOS (Apple Silicon M1/M2/M3/M4 required)
- Python 3.11+ (for development)
- 8GB+ RAM (16GB+ recommended for larger models)
🏃♂️ Quick Start
Option 1: Download Standalone App
- Download the latest .app from Releases
- Drag to /Applications
- Launch - no Python installation required!
- From the menu bar, click the MLX icon to open the admin interface.
- Discover and install models from HuggingFace.
- Connect your AI app to the API endpoint.
📝 Models may take a few minutes to load. They are gigabytes in size and will download at your internet speed.
Option 2: Install from Source
# Clone the repository
git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui
# Install dependencies
pip install -e ".[app]"
# Launch with system tray
mlx-gui tray
# Or launch server only
mlx-gui start --port 8000
🎮 Usage
An API Endpoint for Jan or any other AI app
Simply configure the API endpoint in the app settings to point to your MLX-GUI server. This works with any AI app that supports the OpenAI API. Enter anything for the API key.
System Tray (Recommended)
Launch the app and look for MLX in your menu bar:
- Open Admin Interface - Web GUI for model management
- System Status - Real-time monitoring
- Unload All Models - Free up memory
- Network Settings - Configure binding options
Web Admin Interface
Navigate to http://localhost:8000/admin for:
- 🔍 Discover Tab - Browse and install MLX models from HuggingFace
- 🧠 Models Tab - Manage installed models (load/unload/remove)
- 📊 Monitor Tab - System statistics and performance
- ⚙️ Settings Tab - Configure server and model options
API Usage
OpenAI-Compatible Chat
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-8b-6bit",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 100
}'
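The same chat request can be sent from Python using only the standard library. This is a minimal sketch: the helper names are hypothetical, the Authorization value is a placeholder (any key is accepted, per the note above), and the response parsing assumes the standard OpenAI chat completion shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same address as the curl example


def build_chat_request(model: str, prompt: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer anything",  # placeholder key
        },
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running and a model installed, chat("qwen3-8b-6bit", "Hello!") would return the reply as a string.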
Vision Models with Images
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2-vl-2b-instruct",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What do you see in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
]
}],
"max_tokens": 200
}'
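Building the data URL by hand is error-prone, so here is a small sketch that produces the same payload as the curl example from raw image bytes. The function names are hypothetical, and the MIME type defaults to JPEG as an assumption.

```python
import base64


def image_part(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Encode image bytes as an OpenAI-style image_url content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def vision_payload(model: str, question: str, image_bytes: bytes, max_tokens: int = 200) -> dict:
    """Build the JSON body for a single-image question."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                image_part(image_bytes),
            ],
        }],
        "max_tokens": max_tokens,
    }
```

The resulting dict can be serialized with json.dumps and posted to /v1/chat/completions in the same way as a text-only request.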
Audio Transcription
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.wav" \
-F "model=parakeet-tdt-0-6b-v2"
Text Embeddings
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": ["Hello world", "How are you?"],
"model": "qwen3-embedding-0-6b-4bit"
}'
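A common next step is comparing the returned vectors. The sketch below assumes the response follows the OpenAI embeddings shape (a data list whose items carry embedding and index fields); check the actual response from your server before relying on it. The helper names are hypothetical.

```python
import math


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embeddings in input order, assuming an OpenAI-style response."""
    items = sorted(response["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For the two-input request above, cosine_similarity(*extract_vectors(response)) would give a single score for how close "Hello world" and "How are you?" are in embedding space.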
Install Models
# Install text model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/Qwen2.5-7B-Instruct-4bit",
"name": "qwen-7b-4bit"
}'
# Install audio model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/parakeet-tdt-0.6b-v2",
"name": "parakeet-tdt-0-6b-v2"
}'
# Install vision model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
"name": "qwen2-vl-2b-instruct"
}'
# Install embedding model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ",
"name": "qwen3-embedding-0-6b-4bit"
}'
🏗️ Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ System Tray │ │ Web Admin GUI │ │ REST API │
│ (macOS) │◄──►│ (localhost:8000)│◄──►│ (/v1/*) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
┌─────────────────┐ │
│ Model Manager │◄─────────────┘
│ (Queue/Memory) │
└─────────────────┘
│
┌─────────────────┐
│ MLX Engine │
│ (Apple Silicon) │
└─────────────────┘
📚 API Documentation
Full API documentation is available at /v1/docs when the server is running, or see API.md for complete endpoint reference.
Key Endpoints
- GET /v1/models - List installed models
- POST /v1/models/install - Install from HuggingFace
- POST /v1/models/{name}/load - Load model into memory
- POST /v1/chat/completions - OpenAI-compatible chat (text + images)
- POST /v1/embeddings - Generate text embeddings
- POST /v1/audio/transcriptions - Audio transcription (Whisper/Parakeet)
- GET /v1/discover/models - Search HuggingFace for MLX models
- GET /v1/discover/embeddings - Search for embedding models
- GET /v1/system/status - System and memory status
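The install, load, use lifecycle can be expressed as three requests against these endpoints. The sketch below only builds the request sequence; the function names are hypothetical, and sending the load call with no body is an assumption. A real client would also need to wait for the install (download) to finish before loading, for example by polling the server, which is omitted here. See API.md for the exact contracts.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default address used throughout this README

Step = tuple[str, str, Optional[dict]]


def lifecycle_requests(model_id: str, name: str, prompt: str) -> list[Step]:
    """Return (method, path, body) for install -> load -> use."""
    return [
        ("POST", "/v1/models/install", {"model_id": model_id, "name": name}),
        # Assumption: the load endpoint needs no request body.
        ("POST", f"/v1/models/{name}/load", None),
        ("POST", "/v1/chat/completions", {
            "model": name,
            "messages": [{"role": "user", "content": prompt}],
        }),
    ]


def to_request(step: Step) -> urllib.request.Request:
    """Turn one step into a urllib request without sending it."""
    method, path, body = step
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
```

Each request can then be sent with urllib.request.urlopen(to_request(step)), one step at a time.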
🛠️ Development
Setup Development Environment
git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install in development mode with audio and vision support
pip install -e ".[dev,audio,vision]"
# Run tests
pytest
# Start development server
mlx-gui start --reload
Build Standalone App
# Install build dependencies with audio and vision support
pip install rumps pyinstaller mlx-whisper parakeet-mlx mlx-vlm
# Build macOS app bundle
./build_app.sh
# Result: dist/MLX-GUI.app
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
⚖️ License
MLX-GUI is licensed under the GNU General Public License v3.0 (GPLv3).
🙏 Acknowledgments
- Apple MLX Team - For the incredible MLX framework
- MLX-LM - MLX language model implementations
- HuggingFace - For the model hub and transformers library