Web-based MLX model manager for Apple Silicon Macs

MLX Manager

Run and serve local LLMs on your Mac with one command. MLX Manager provides a web UI for managing MLX-optimized models on Apple Silicon, with an embedded high-performance inference server that exposes both OpenAI-compatible and Anthropic-compatible APIs.

Why MLX Manager?

Running local LLMs typically requires juggling multiple tools, config files, and terminal commands. Tools like Ollama and LM Studio make model management easier, but they are built largely around llama.cpp, a cross-platform C++ runtime that treats Apple Silicon as one target among many. MLX Manager takes a different approach: it includes a purpose-built inference server that calls the MLX framework directly, so inference runs natively on the Metal GPU without translation layers, format conversions, or cross-platform abstractions.

- One-click model downloads: fetch MLX models from HuggingFace (mlx-community, lmstudio-community, and more)
- Smart model discovery: filter by architecture (Llama, Qwen, Mistral), quantization (4-bit, 8-bit), and capabilities (multimodal, tool use)
- Purpose-built inference server: direct MLX framework integration with OpenAI and Anthropic API compatibility, not a wrapper around llama.cpp
- Multi-model, multi-type: load text, vision, embeddings, and audio models simultaneously with LRU eviction, rather than one model at a time
- Two-phase model lifecycle: models are probed at load time to discover capabilities (tool calling, thinking, streaming), so no capability checks run while serving requests
- Visual server management: start, stop, and monitor models with real-time CPU/memory metrics
- Rich chat interface: test models with image/video/text attachments, thinking model support, and MCP tool integration
- Cloud routing: route requests to OpenAI/Anthropic APIs when local models can't handle them
- User authentication: secure multi-user access with JWT auth and admin controls
- Background service: models auto-start on login via macOS launchd
- Menubar app: quick access from your Mac's status bar

Quick Start

Install

# Homebrew (recommended)
brew tap tumma72/mlx-manager https://github.com/tumma72/mlx-manager
brew install mlx-manager

# Or via pip
pip install mlx-manager

Run

mlx-manager serve

Open http://localhost:10242 and you're ready to:

1. Register: Create your account (first user becomes admin)
2. Browse: Search HuggingFace for MLX-optimized models
3. Filter: Find models by architecture, quantization, or capabilities
4. Download: One-click download with progress tracking
5. Configure: Create a server profile with custom settings
6. Run: Start serving and chat with your model

Use as an API

Once a model is loaded, use it with any OpenAI or Anthropic client:

# OpenAI client: the SDK appends /chat/completions to base_url
from openai import OpenAI

client = OpenAI(base_url="http://localhost:10242/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Anthropic client: the SDK appends /v1/messages itself, so base_url omits /v1
import anthropic

client = anthropic.Anthropic(base_url="http://localhost:10242", api_key="not-needed")
message = client.messages.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Embedded Inference Server

MLX Manager includes a fully self-contained inference server mounted at /v1. No external inference backends needed — the server calls mlx-lm, mlx-vlm, mlx-embeddings, and mlx-audio directly, keeping everything on the Metal GPU without process boundaries or IPC overhead.

Why build our own instead of wrapping Ollama or llama.cpp? Those projects target every platform and GPU vendor, which means abstraction layers, GGUF format conversions, and lowest-common-denominator threading. MLX Manager's server is Apple Silicon-only by design: it uses a persistent Metal GPU thread with a job queue for thread affinity, list-buffer string assembly to avoid O(n^2) concatenation in streaming, and a two-phase probe-then-serve lifecycle that eliminates capability checks from the hot path. The result is an inference server that behaves like a native macOS application — because it is one.
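The persistent-thread-plus-job-queue idea described above can be pictured with a minimal sketch. This is not MLX Manager's actual code: `SingleThreadExecutor` and its methods are hypothetical names, and `threading.get_ident` stands in for real Metal work. The only point is that every job, whichever thread submits it, runs on the same long-lived worker thread.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadExecutor:
    """Runs every submitted job on one long-lived thread (hypothetical helper)."""

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The worker loop: pull jobs in FIFO order until the sentinel arrives.
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: stop the worker
                break
            func, args, future = job
            try:
                future.set_result(func(*args))
            except Exception as exc:  # hand the error back to the caller
                future.set_exception(exc)

    def submit(self, func, *args) -> Future:
        """Queue a job and return a Future the caller can wait on."""
        future: Future = Future()
        self._jobs.put((func, args, future))
        return future

    def shutdown(self) -> None:
        """Let queued jobs finish, then stop the worker thread."""
        self._jobs.put(None)
        self._thread.join()


executor = SingleThreadExecutor()
# Each job reports the id of the thread it ran on.
idents = [executor.submit(threading.get_ident).result() for _ in range(3)]
executor.shutdown()
```

All three jobs report the same thread id, and it differs from the submitting thread's id. In a real server the submitted callable would be the generation step, and request handlers would wait on the returned future instead of calling into the GPU-bound code themselves.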

Multi-Protocol API

Protocol	Endpoint	Description
OpenAI	`POST /v1/chat/completions`	Chat completions (streaming + non-streaming)
OpenAI	`POST /v1/completions`	Legacy text completions
OpenAI	`POST /v1/embeddings`	Text embeddings
OpenAI	`POST /v1/audio/speech`	Text-to-speech
OpenAI	`POST /v1/audio/transcriptions`	Speech-to-text
Anthropic	`POST /v1/messages`	Anthropic Messages API
Both	`GET /v1/models`	List available models
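To make the table concrete, here is a small sketch that builds (but does not send) raw HTTP requests for the two chat-style endpoints using only the standard library. The port comes from the Quick Start section. The helper names are hypothetical, and the sketch assumes a local server that needs no extra authentication or version headers, which may not hold for every setup.

```python
import json
import urllib.request

BASE_URL = "http://localhost:10242"  # port taken from the Quick Start section


def _post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given path (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_openai_chat(model: str, prompt: str) -> urllib.request.Request:
    """OpenAI-style request for POST /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return _post("/v1/chat/completions", payload)


def build_anthropic_message(model: str, prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Anthropic-style request for POST /v1/messages (max_tokens is required)."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return _post("/v1/messages", payload)


MODEL = "mlx-community/Llama-3.2-3B-Instruct-4bit"
openai_request = build_openai_chat(MODEL, "Hello!")
anthropic_request = build_anthropic_message(MODEL, "Hello!")

# To send one against a running server:
#     with urllib.request.urlopen(openai_request) as resp:
#         print(json.load(resp)["choices"][0]["message"]["content"])
```

Both requests target the same host and port; only the path and the payload shape differ.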

Key Capabilities

- Unified adapter architecture: single ModelAdapter handles all model types with data-driven family configs (no subclass explosion)
- 8 model families: Qwen, GLM-4, Llama, Gemma, Mistral/Devstral/Magistral, Liquid, Whisper, Kokoro
- Smart model detection: auto-detects model type, family, and capabilities from config.json
- Two-phase probe-then-serve: discovers tool-calling format, thinking delimiters, and streaming support at load time; hot path runs with zero capability checks
- Persistent Metal GPU thread: dedicated thread with job queue ensures Metal thread affinity across all requests
- Multi-model pool with LRU eviction: host text, vision, embeddings, and audio models simultaneously; auto-evict when memory pressure rises
- Multi-protocol from one server: both OpenAI and Anthropic APIs from the same endpoint with bidirectional protocol translation
- Continuous batching (experimental) with prefix caching and priority scheduling
- Performance-optimized streaming: list-buffer string assembly (no O(n^2) concatenation), clean shutdown with drain timeout
- Structured output: JSON mode with schema validation
- Cloud routing: route to OpenAI/Anthropic cloud APIs when local models can't handle the request
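As an illustration of the multi-model pool with LRU eviction listed above, the sketch below keeps models in least-recently-used order and evicts from the cold end when a count or memory budget would be exceeded. It is a simplified model under stated assumptions, not the project's implementation: `ModelPool` and its methods are hypothetical, model sizes are passed in by hand, and a fixed budget stands in for real memory-pressure detection.

```python
from collections import OrderedDict


class ModelPool:
    """Tracks loaded models in least-recently-used order (illustrative only)."""

    def __init__(self, max_models: int = 4, max_memory_gb: float = 12.0) -> None:
        self.max_models = max_models
        self.max_memory_gb = max_memory_gb
        self._models: OrderedDict[str, float] = OrderedDict()  # name -> size in GB

    def used_gb(self) -> float:
        return sum(self._models.values())

    def loaded(self) -> list[str]:
        """Model names from least to most recently used."""
        return list(self._models)

    def acquire(self, name: str, size_gb: float) -> list[str]:
        """Mark `name` as most recently used, loading it if needed.

        Returns the names evicted to make room.
        """
        evicted: list[str] = []
        if name in self._models:
            self._models.move_to_end(name)  # cache hit: refresh recency only
            return evicted
        # Evict the coldest models until both limits are satisfied.
        # If the pool empties, the new model is loaded regardless of its size.
        while self._models and (
            len(self._models) >= self.max_models
            or self.used_gb() + size_gb > self.max_memory_gb
        ):
            oldest, _ = self._models.popitem(last=False)
            evicted.append(oldest)
        self._models[name] = size_gb
        return evicted


pool = ModelPool(max_models=2, max_memory_gb=10.0)
pool.acquire("text", 4.0)
pool.acquire("vision", 5.0)
pool.acquire("text", 4.0)              # "text" becomes most recently used
evicted = pool.acquire("audio", 2.0)   # pool is full, so "vision" is evicted
```

After the last call, `evicted` is `["vision"]` and the pool holds `["text", "audio"]`, because the repeated request for "text" refreshed its position before "audio" arrived.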

Observability

- Audit logging: privacy-first request metadata logging with WebSocket live streaming
- Prometheus metrics: request latency, throughput, model memory, pool cache hits/misses
- LogFire integration: distributed tracing with Pydantic LogFire
- RFC 7807 errors: structured error responses with request ID correlation
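For readers unfamiliar with RFC 7807, the sketch below shows the general shape of a problem-details response. The standard members (`type`, `title`, `status`, `detail`) and the `application/problem+json` media type come from the RFC; the `request_id` extension member and the function name are assumptions made for illustration and may not match what the server actually emits.

```python
import json
import uuid
from http import HTTPStatus
from typing import Optional


def problem_response(
    status: int, detail: str, request_id: Optional[str] = None
) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for an RFC 7807 problem document."""
    body = {
        "type": "about:blank",                 # RFC default when no specific type URI exists
        "title": HTTPStatus(status).phrase,    # with about:blank, title mirrors the status phrase
        "status": status,
        "detail": detail,
        # Extension member; the name "request_id" is an assumption.
        "request_id": request_id or uuid.uuid4().hex,
    }
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


status, headers, body = problem_response(404, "Model 'example' is not loaded", "req-123")
parsed = json.loads(body)
```

A client can log `parsed["request_id"]` alongside its own trace so that a failed call can be matched to the corresponding server-side log entry.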

See docs/MLX_SERVER.md for the full configuration reference, security guide, metrics list, and API documentation.

Features

Model Discovery

Browse and filter models with rich metadata:

- Architecture badges: Llama, Qwen, Mistral, Gemma, Phi, and more
- Quantization info: 4-bit, 8-bit quantization levels
- Capability detection: Multimodal (vision), tool use support
- Toggle view: Switch between your downloaded models and HuggingFace search

User Management

Secure multi-user support:

- JWT authentication: Secure token-based auth
- Admin controls: Approve/disable users, manage permissions
- First-user admin: Initial user automatically becomes administrator
- Rate limiting: Per-IP request throttling with token bucket algorithm
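The token bucket algorithm mentioned for rate limiting can be sketched in a few lines. This is a generic version of the technique, not the project's code: the class name, the burst capacity, and the injectable clock are assumptions chosen to keep the example small and testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_minute` tokens."""

    def __init__(
        self,
        rate_per_minute: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = capacity
        self.tokens = capacity          # start full so the first burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per client IP (hypothetical bookkeeping).
buckets: dict[str, TokenBucket] = {}


def allow_request(ip: str, rate_per_minute: float = 60, capacity: float = 5) -> bool:
    if ip not in buckets:
        buckets[ip] = TokenBucket(rate_per_minute, capacity)
    return buckets[ip].allow()


# Demonstration with a fake clock so the outcome does not depend on real time.
fake_time = [0.0]
bucket = TokenBucket(rate_per_minute=60, capacity=2, clock=lambda: fake_time[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]  # third call exceeds the burst
fake_time[0] = 1.0                                        # one second later: one token refilled
after_wait = bucket.allow()
```

Here `burst` is `[True, True, False]` and `after_wait` is `True`: the bucket permits a short burst, rejects the excess, and recovers at the configured rate.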

Server Monitoring

Real-time server metrics:

- Memory usage and CPU/GPU utilization
- Server uptime tracking
- One-click start/stop/restart controls

Chat Interface

Rich conversation experience:

- Multimodal support: Attach images, videos, and text files via drag-drop or button
- Thinking models: Collapsible thinking panel for reasoning models (Qwen3, GLM-4, DeepSeek)
- MCP tools: Built-in calculator and weather tools for testing tool-use models
- System prompts: Configure default context per server profile

System Requirements

- macOS 13+ with Apple Silicon (M1/M2/M3/M4)
- Python 3.11 or 3.12
- 8GB+ RAM (16GB+ recommended for larger models)

Commands

mlx-manager serve            # Start the web server
mlx-manager menubar          # Launch menubar app
mlx-manager install-service  # Auto-start on login
mlx-manager status           # Show running servers

Configuration

Environment variables (all optional):

Variable	Default	Description
`MLX_MANAGER_DATABASE_PATH`	`~/.mlx-manager/mlx-manager.db`	Database location
`MLX_MANAGER_DEFAULT_PORT_START`	`10240`	Starting port for servers
`MLX_MANAGER_JWT_SECRET`	Auto-generated	JWT signing secret

MLX Server Configuration

The embedded MLX inference server accepts MLX_SERVER_* environment variables. All settings are opt-in with safe defaults, so no configuration is needed for local use.

Variable	Default	Description
`MLX_SERVER_ADMIN_TOKEN`	none	Bearer token for `/v1/admin/*` endpoints
`MLX_SERVER_RATE_LIMIT_RPM`	`0` (off)	Requests per minute per IP
`MLX_SERVER_METRICS_ENABLED`	`false`	Enable Prometheus metrics at `/v1/admin/metrics`
`MLX_SERVER_MAX_MEMORY_GB`	`0` (auto)	Model pool memory limit (0 = 75% of device RAM)
`MLX_SERVER_MAX_MODELS`	`4`	Max models loaded simultaneously
`MLX_SERVER_TIMEOUT_CHAT_SECONDS`	`900`	Chat completions timeout
`MLX_SERVER_DRAIN_TIMEOUT_SECONDS`	`30`	Graceful shutdown drain timeout
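
To show how the defaults in the table fit together, here is a hedged sketch that reads the same variables from an environment mapping. The variable names and default values come from the table; everything else is an assumption for illustration, including the `ServerSettings` class, the accepted spellings of boolean values, and the `device_ram_gb` parameter that stands in for detecting installed RAM.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    admin_token: Optional[str]
    rate_limit_rpm: int          # 0 means rate limiting is off
    metrics_enabled: bool
    max_memory_gb: float
    max_models: int
    timeout_chat_seconds: int
    drain_timeout_seconds: int


def load_settings(
    env: Mapping[str, str] = os.environ, device_ram_gb: float = 16.0
) -> ServerSettings:
    """Read MLX_SERVER_* variables, falling back to the documented defaults."""
    max_memory = float(env.get("MLX_SERVER_MAX_MEMORY_GB", "0"))
    if max_memory == 0:
        # Documented behaviour: 0 means "auto", i.e. 75% of device RAM.
        max_memory = 0.75 * device_ram_gb
    # Which strings count as "true" is an assumption of this sketch.
    metrics = env.get("MLX_SERVER_METRICS_ENABLED", "false").strip().lower()
    return ServerSettings(
        admin_token=env.get("MLX_SERVER_ADMIN_TOKEN") or None,
        rate_limit_rpm=int(env.get("MLX_SERVER_RATE_LIMIT_RPM", "0")),
        metrics_enabled=metrics in {"1", "true", "yes"},
        max_memory_gb=max_memory,
        max_models=int(env.get("MLX_SERVER_MAX_MODELS", "4")),
        timeout_chat_seconds=int(env.get("MLX_SERVER_TIMEOUT_CHAT_SECONDS", "900")),
        drain_timeout_seconds=int(env.get("MLX_SERVER_DRAIN_TIMEOUT_SECONDS", "30")),
    )


defaults = load_settings({}, device_ram_gb=32.0)
custom = load_settings({"MLX_SERVER_MAX_MEMORY_GB": "20", "MLX_SERVER_METRICS_ENABLED": "true"})
```

With an empty environment on a hypothetical 32 GB machine, `defaults.max_memory_gb` resolves to 24.0 and every other field takes the value shown in the table.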

See docs/MLX_SERVER.md for the full configuration reference, security guide, metrics list, and API documentation.

Development

git clone https://github.com/tumma72/mlx-manager.git
cd mlx-manager
make install-dev  # Install dependencies
make dev          # Start dev servers
make test         # Run tests (4500+ tests)

License

MIT

Acknowledgments

Built on MLX, mlx-lm, mlx-vlm, mlx-embeddings, and mlx-audio.

Project details

Maintainers

tumma72

Release history

Version	Release date
1.2.12	Mar 11, 2026
1.2.11	Mar 10, 2026
1.2.10	Mar 10, 2026
1.2.9	Mar 9, 2026
1.2.8	Mar 9, 2026
1.2.7 (this version)	Mar 9, 2026
1.2.6	Mar 9, 2026
1.2.5	Mar 9, 2026
1.2.4	Mar 8, 2026
1.2.3	Mar 7, 2026
1.2.2	Mar 6, 2026
1.2.1	Mar 5, 2026
1.2.0	Mar 5, 2026
1.1.0	Jan 26, 2026
1.0.4	Jan 20, 2026
1.0.3	Jan 19, 2026
1.0.2	Jan 15, 2026
1.0.1	Jan 14, 2026
1.0.0	Jan 13, 2026

Download files

Source Distribution

mlx_manager-1.2.7.tar.gz (611.5 kB)

Uploaded Mar 9, 2026 Source

Built Distribution

mlx_manager-1.2.7-py3-none-any.whl (565.3 kB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file mlx_manager-1.2.7.tar.gz.

File metadata

Download URL: mlx_manager-1.2.7.tar.gz
Upload date: Mar 9, 2026
Size: 611.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_manager-1.2.7.tar.gz
Algorithm	Hash digest
SHA256	`f7b5110ad1eae9f6ace4869cf2ee756bc1fc82dabe8aeab9b1ed4fc0f8055d2c`
MD5	`521e10d48fee290606e2cf3527d675ab`
BLAKE2b-256	`beaa3be02cc6b167d1e0fc64b04ff5cc08dc1ebc2c106a979649c120c09fb0cd`
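
A downloaded archive can be checked against the SHA256 digest in the table above with the standard library. The sketch below is one way to do it: the function names are illustrative, the expected digest is copied from the table, and the local file path in the final comment is an assumption about where the download was saved.

```python
import hashlib
import hmac
import io
from typing import BinaryIO

# SHA256 digest for mlx_manager-1.2.7.tar.gz, copied from the table above.
EXPECTED_SHA256 = "f7b5110ad1eae9f6ace4869cf2ee756bc1fc82dabe8aeab9b1ed4fc0f8055d2c"


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed digest with the published one."""
    return hmac.compare_digest(sha256_of(stream), expected_hex.lower())


# Self-check on in-memory data, using the well-known SHA-256 test vector for b"abc".
demo_digest = sha256_of(io.BytesIO(b"abc"))

# Checking a real download (path is an assumption):
#     with open("mlx_manager-1.2.7.tar.gz", "rb") as handle:
#         print(matches(handle, EXPECTED_SHA256))
```

If the comparison returns False, the file differs from the one that was published and should not be installed.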

Provenance

The following attestation bundles were made for mlx_manager-1.2.7.tar.gz:

Publisher: deploy_to_pypi.yml on tumma72/mlx-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_manager-1.2.7.tar.gz
- Subject digest: f7b5110ad1eae9f6ace4869cf2ee756bc1fc82dabe8aeab9b1ed4fc0f8055d2c
- Sigstore transparency entry: 1066595491
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: tumma72/mlx-manager@f908cc185c3cb0a3f544e8ac47eb113af08bfc7f
- Branch / Tag: refs/tags/v1.2.7
- Owner: https://github.com/tumma72
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: deploy_to_pypi.yml@f908cc185c3cb0a3f544e8ac47eb113af08bfc7f
- Trigger Event: push

File details

Details for the file mlx_manager-1.2.7-py3-none-any.whl.

File metadata

Download URL: mlx_manager-1.2.7-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 565.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_manager-1.2.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0580ec4bf562e2b34f7ba7f9dd70cf4e0fe6d50ff387aea1189ae9849a6d8cfc`
MD5	`eb8334be6a972a72a85ce2be0dde44f7`
BLAKE2b-256	`26d755a9477138629b45869084f836738711fbd2296457ef0e018762989a90e2`

Provenance

The following attestation bundles were made for mlx_manager-1.2.7-py3-none-any.whl:

Publisher: deploy_to_pypi.yml on tumma72/mlx-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_manager-1.2.7-py3-none-any.whl
- Subject digest: 0580ec4bf562e2b34f7ba7f9dd70cf4e0fe6d50ff387aea1189ae9849a6d8cfc
- Sigstore transparency entry: 1066595496
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: tumma72/mlx-manager@f908cc185c3cb0a3f544e8ac47eb113af08bfc7f
- Branch / Tag: refs/tags/v1.2.7
- Owner: https://github.com/tumma72
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: deploy_to_pypi.yml@f908cc185c3cb0a3f544e8ac47eb113af08bfc7f
- Trigger Event: push
