
MLX Omni Server is a server that provides OpenAI-compatible APIs using Apple's MLX framework.


MLX Omni Server

Local AI inference server optimized for Apple Silicon



MLX Omni Server provides dual API compatibility with both OpenAI and Anthropic APIs, enabling seamless local inference on Apple Silicon using the MLX framework.


✨ Features

  • 🚀 Apple Silicon Optimized - Built on MLX framework for M1/M2/M3/M4 chips
  • 🔌 Dual API Support - Compatible with both OpenAI and Anthropic APIs
  • 🎯 Complete AI Suite - Chat, text-to-speech, speech-to-text, image generation, and embeddings
  • High Performance - Local inference with hardware acceleration
  • 🔐 Privacy-First - All processing happens locally on your machine
  • 🛠 Drop-in Replacement - Works with existing OpenAI and Anthropic SDKs

🚀 Installation

pip install mlx-omni-server

⚡ Quick Start

  1. Start the server:

    mlx-omni-server
    
  2. Choose your preferred API:

    OpenAI API:
    from openai import OpenAI
    
    client = OpenAI(
        base_url="http://localhost:10240/v1",
        api_key="not-needed"
    )
    
    response = client.chat.completions.create(
        model="mlx-community/gemma-3-1b-it-4bit-DWQ",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
    
    Anthropic API:
    import anthropic
    
    client = anthropic.Anthropic(
        base_url="http://localhost:10240/anthropic",
        api_key="not-needed"
    )
    
    message = client.messages.create(
        model="mlx-community/gemma-3-1b-it-4bit-DWQ",
        max_tokens=1000,
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(message.content[0].text)
    

🎉 That's it! You're now running AI locally on your Mac.

📋 API Support

OpenAI Compatible Endpoints (/v1/*)

Endpoint                  Feature
/v1/chat/completions      Chat with tools, streaming, structured output
/v1/audio/speech          Text-to-Speech
/v1/audio/transcriptions  Speech-to-Text
/v1/images/generations    Image Generation
/v1/embeddings            Text Embeddings
/v1/models                Model Management
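Streaming on /v1/chat/completions follows the OpenAI convention of server-sent events. The sketch below consumes that stream with only the Python standard library. It assumes the server emits standard `data: {...}` lines terminated by `data: [DONE]`, which is what the OpenAI SDKs expect; the helper name `iter_sse_deltas` is hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events (assumed wire format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            # Role-only and finish chunks carry no "content" key (or None).
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    body = {
        "model": "mlx-community/gemma-3-1b-it-4bit-DWQ",
        "messages": [{"role": "user", "content": "Count to five."}],
        "stream": True,
    }
    req = urllib.request.Request(
        "http://localhost:10240/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    # Iterating the HTTP response yields the body line by line as bytes.
    with urllib.request.urlopen(req) as resp:
        for text in iter_sse_deltas(resp):
            print(text, end="", flush=True)
    print()
```

With the OpenAI SDK you would instead pass `stream=True` to `client.chat.completions.create()` and read `chunk.choices[0].delta.content`; the raw version just makes the protocol visible.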

Anthropic Compatible Endpoints (/anthropic/v1/*)

Endpoint                 Feature
/anthropic/v1/messages   Messages with tools, streaming, thinking mode
/anthropic/v1/models     Model listing with pagination
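A minimal sketch of thinking mode on /anthropic/v1/messages follows. It assumes the server accepts the standard Anthropic `thinking` parameter (`{"type": "enabled", "budget_tokens": ...}`) and returns `thinking` and `text` content blocks in the usual Messages format. The helper names are illustrative, and the model ID is only a placeholder: swap in a model that actually supports extended reasoning.

```python
MODEL = "mlx-community/gemma-3-1b-it-4bit-DWQ"  # placeholder: use a model with thinking support


def thinking_request(prompt: str, model: str = MODEL, budget_tokens: int = 1024) -> dict:
    """Keyword arguments for client.messages.create() with extended thinking enabled."""
    return {
        "model": model,
        # The hosted Anthropic API requires max_tokens > budget_tokens; kept here for safety.
        "max_tokens": budget_tokens * 2,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


def split_content(blocks: list[dict]) -> tuple[str, str]:
    """Separate 'thinking' blocks from 'text' blocks in a Messages response."""
    thinking = "".join(b.get("thinking", "") for b in blocks if b.get("type") == "thinking")
    text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    return thinking, text


if __name__ == "__main__":
    import anthropic  # pip install anthropic

    client = anthropic.Anthropic(base_url="http://localhost:10240/anthropic", api_key="not-needed")
    message = client.messages.create(**thinking_request("Is 221 a prime number?"))
    # SDK content blocks are pydantic models; model_dump() turns them into plain dicts.
    reasoning, answer = split_content([block.model_dump() for block in message.content])
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```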

⚙️ Configuration

# Default (port 10240)
mlx-omni-server

# Custom options
mlx-omni-server --port 8000
MLX_OMNI_LOG_LEVEL=debug mlx-omni-server

# View all options
mlx-omni-server --help
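When scripting against the server (for example after starting it with `--port 8000`), it helps to wait until it answers before sending requests. This sketch polls the OpenAI-style `/v1/models` endpoint using only the standard library. It assumes the response uses the OpenAI list shape (`{"data": [{"id": ...}]}`); `wait_for_server` and `parse_model_ids` are hypothetical helpers.

```python
import json
import time
import urllib.error
import urllib.request


def parse_model_ids(body: str) -> list[str]:
    """Return the model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]


def wait_for_server(port: int = 10240, timeout: float = 30.0) -> list[str]:
    """Poll /v1/models until the server answers or the timeout expires."""
    url = f"http://localhost:{port}/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return parse_model_ids(resp.read().decode("utf-8"))
        except (urllib.error.URLError, ConnectionError):
            # Connection refused or dropped: the server is probably still starting.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no response from {url} within {timeout}s")
            time.sleep(0.5)


if __name__ == "__main__":
    # Matches a server started with `mlx-omni-server --port 8000`.
    print(wait_for_server(port=8000))
```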

🛠 Development

Development Setup
git clone https://github.com/madroidmaq/mlx-omni-server.git
cd mlx-omni-server
uv sync

# Start with hot-reload
uv run uvicorn mlx_omni_server.main:app --reload --host 0.0.0.0 --port 10240

Testing:

uv run pytest                    # All tests
uv run pytest tests/chat/openai/ # OpenAI tests
uv run pytest tests/chat/anthropic/ # Anthropic tests

Code Quality:

uv run black . && uv run isort . # Format code
uv run pre-commit run --all-files # Run hooks

🎯 Key Features

Model Management

  • Auto-discovery of MLX models in the Hugging Face cache
  • On-demand loading and intelligent caching
  • Automatic model downloading when needed
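To see which already-downloaded models are likely candidates, you can inspect the Hugging Face cache with `huggingface_hub.scan_cache_dir()`. The filter below is only a rough heuristic (model repos under the `mlx-community` organization) and the helper `mlx_candidates` is hypothetical; the server's actual discovery rules are not described here and may differ.

```python
def mlx_candidates(repos: list[tuple[str, str]], org: str = "mlx-community") -> list[str]:
    """Pick cached model repos published under an MLX-focused organization (heuristic)."""
    return sorted(
        repo_id
        for repo_id, repo_type in repos
        if repo_type == "model" and repo_id.startswith(f"{org}/")
    )


if __name__ == "__main__":
    from huggingface_hub import scan_cache_dir  # pip install huggingface_hub

    cache = scan_cache_dir()  # reads the local cache; no network access
    pairs = [(repo.repo_id, repo.repo_type) for repo in cache.repos]
    for repo_id in mlx_candidates(pairs):
        print(repo_id)
```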

Advanced Capabilities

  • Function calling with model-specific parsers
  • Real-time streaming for both APIs
  • JSON schema validation and structured output
  • Extended reasoning (thinking mode) for supported models
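Function calling goes through the standard OpenAI `tools` parameter. In this sketch `get_weather` is a hypothetical tool, and the example assumes the server returns `tool_calls` in the usual OpenAI message shape with JSON-encoded `arguments`. Whether a model actually emits tool calls depends on the model and its parser.

```python
import json

# Hypothetical tool definition, used only to illustrate the request shape.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def parse_tool_calls(message: dict) -> list[tuple[str, dict]]:
    """Return (function name, decoded arguments) for each tool call in a chat message."""
    calls = message.get("tool_calls") or []  # None when the model answered in plain text
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]


if __name__ == "__main__":
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model="mlx-community/gemma-3-1b-it-4bit-DWQ",  # placeholder: use a tool-capable model
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        tools=[WEATHER_TOOL],
    )
    # The SDK message is a pydantic model; model_dump() gives a plain dict.
    for name, args in parse_tool_calls(response.choices[0].message.model_dump()):
        print(name, args)
```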

📚 Documentation

Resource              Description
OpenAI API Guide      Complete OpenAI API reference
Anthropic API Guide   Complete Anthropic API reference
Examples              Practical usage examples

🔍 Troubleshooting

Common Issues

Requirements:

  • Python 3.11+
  • Apple Silicon Mac (M1/M2/M3/M4)
  • MLX framework installed

Quick fixes:

# Check requirements
python --version  # Should be 3.11+
python -c "import mlx; print(mlx.__version__)"

# Pre-download models (if needed)
huggingface-cli download mlx-community/gemma-3-1b-it-4bit-DWQ

# Enable debug logging
MLX_OMNI_LOG_LEVEL=debug mlx-omni-server
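The requirements listed above can also be checked from Python in one go. This is a convenience sketch using only the standard library, and `requirement_problems` is a hypothetical helper. It treats "Apple Silicon Mac" as `platform.system() == "Darwin"` and `platform.machine() == "arm64"`, which also flags an x86_64 Python running under Rosetta.

```python
import importlib.util
import platform
import sys


def requirement_problems(version: tuple[int, int], system: str, machine: str, has_mlx: bool) -> list[str]:
    """List which requirements are not met (an empty list means all checks passed)."""
    problems = []
    if version < (3, 11):
        problems.append(f"Python {version[0]}.{version[1]} found; 3.11+ required")
    if system != "Darwin" or machine != "arm64":
        problems.append(f"{system}/{machine} detected; an Apple Silicon Mac (Darwin/arm64) is required")
    if not has_mlx:
        problems.append("the mlx package is not importable")
    return problems


if __name__ == "__main__":
    issues = requirement_problems(
        sys.version_info[:2],
        platform.system(),
        platform.machine(),
        importlib.util.find_spec("mlx") is not None,  # checks importability without importing
    )
    print("\n".join(issues) or "All requirements look satisfied.")
```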

🤝 Contributing

Quick contributor setup:

git clone https://github.com/madroidmaq/mlx-omni-server.git
cd mlx-omni-server
uv sync && uv run pytest

🙏 Acknowledgments

Built with MLX by Apple • FastAPI • MLX-LM

📄 License

MIT License • Not affiliated with OpenAI, Anthropic, or Apple

Download files


Source Distribution

mlx_omni_server-0.5.2.tar.gz (55.3 kB)

Built Distribution


mlx_omni_server-0.5.2-py3-none-any.whl (78.2 kB, Python 3)

File details

Details for the file mlx_omni_server-0.5.2.tar.gz.

File metadata

  • Download URL: mlx_omni_server-0.5.2.tar.gz
  • Size: 55.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for mlx_omni_server-0.5.2.tar.gz
Algorithm Hash digest
SHA256 7e621b469bfa8fb5c0fc7756bd29305d5bdb9173bcf747fef81a963909ebf7ba
MD5 56e128217cf3c2e7500232b19678b520
BLAKE2b-256 437aa172434c047a9383ae2ba655be4802d42d4796cc8435d4e768ae2add0890


File details

Details for the file mlx_omni_server-0.5.2-py3-none-any.whl.

File hashes

Hashes for mlx_omni_server-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f32fa660edc77bca90fdb2c42313fdbaa17618c78baef968cad868e8e23eb992
MD5 a59c22111c5b72e97c02591a0a572978
BLAKE2b-256 b07dd7d1eedd1417b3f6860d821fa9c6c1a1208dd209ea8da65d945a357109ad

