
MLX Omni Server

MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.

Features

  • 🚀 Apple Silicon Optimized: Built on MLX framework, optimized for M1/M2/M3/M4 series chips
  • 🔌 OpenAI API Compatible: Drop-in replacement for OpenAI API endpoints
  • 🎯 Multiple AI Capabilities:
    • Audio Processing (TTS & STT)
    • Chat Completion
    • Image Generation
  • ⚡ High Performance: Local inference with hardware acceleration
  • 🔐 Privacy-First: All processing happens locally on your machine
  • 🛠 SDK Support: Works with official OpenAI SDK and other compatible clients

Supported API Endpoints

The server implements OpenAI-compatible endpoints:

  • Chat completions: /v1/chat/completions
    • ✅ Chat
    • ✅ Tools, Function Calling
    • ✅ Structured Output
    • ✅ LogProbs
    • 🚧 Vision
  • Audio
    • /v1/audio/speech - Text-to-Speech
    • /v1/audio/transcriptions - Speech-to-Text
  • Models
    • /v1/models - List models
    • /v1/models/{model} - Retrieve or Delete model
  • Images
    • /v1/images/generations - Image generation
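
For a quick smoke test of the model endpoints, the standard OpenAI SDK can be pointed at the running server (see Quick Start below). A minimal sketch, assuming the server is running on the default port 10240:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10240/v1",  # local server
    api_key="not-needed"  # no API key is required locally
)

# GET /v1/models - list the models available to the server
for model in client.models.list():
    print(model.id)

# GET /v1/models/{model} - retrieve metadata for a single model
print(client.models.retrieve("mlx-community/Llama-3.2-1B-Instruct-4bit"))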

Installation

# Install using pip
pip install mlx-omni-server

Quick Start

There are two ways to use MLX Omni Server:

Method 1: Using the HTTP Server

  1. Start the server:
# If installed via pip as a package
mlx-omni-server

The server listens on port 10240 by default; use --port <port> to choose a different one.

Run mlx-omni-server --help to see all available startup options.

  2. Configure the OpenAI client to use your local server:
from openai import OpenAI

# Configure client to use local server
client = OpenAI(
    base_url="http://localhost:10240/v1",  # Point to local server
    api_key="not-needed"  # API key is not required for local server
)

Method 2: Using TestClient (No Server Required)

For development or testing, you can use TestClient to interact directly with the application without starting a server:

from openai import OpenAI
from fastapi.testclient import TestClient
from mlx_omni_server.main import app

# Use TestClient to interact directly with the application
client = OpenAI(
    api_key="not-needed",  # placeholder; no real key is required
    http_client=TestClient(app)  # requests are routed in-process, no network service needed
)
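
Because FastAPI's TestClient is an httpx-compatible client that dispatches requests to the app in-process, no port is opened and no server process runs, which makes this approach convenient for unit tests and notebook experiments.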

Example Usage

Regardless of which method you choose, you can use the client in the same way:

# Chat Completion Example
chat_completion = client.chat.completions.create(
    model="mlx-community/Llama-3.2-1B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "What can you do?"}
    ]
)
print(chat_completion.choices[0].message.content)

# Text-to-Speech Example
response = client.audio.speech.create(
    model="lucasnewman/f5-tts-mlx",
    input="Hello, welcome to MLX Omni Server!"
)
# response.content holds the raw audio bytes; write them to disk
# (the audio format depends on the model/server configuration)
with open("speech_output.wav", "wb") as f:
    f.write(response.content)

# Speech-to-Text Example
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="mlx-community/whisper-large-v3-turbo",
        file=audio_file
    )
print(transcript.text)

# Image Generation Example
image_response = client.images.generate(
    model="argmaxinc/mlx-FLUX.1-schnell",
    prompt="A serene landscape with mountains and a lake",
    n=1,
    size="512x512"
)
# image_response.data[0] holds the generated image (URL or base64,
# depending on the response_format the server returns)
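
The chat endpoint also advertises tools/function calling and structured output. Below is a minimal sketch of a tool-calling request; the tool schema follows the standard OpenAI format, and get_weather is a hypothetical tool defined only for illustration:

# Function Calling Example (get_weather is a hypothetical tool)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="mlx-community/Llama-3.2-1B-Instruct-4bit",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, the call details are in tool_calls
tool_calls = completion.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)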

You can find more examples in the examples directory of the project repository.

Contributing

We welcome contributions! If you're interested in contributing to MLX Omni Server, please check out our Development Guide for detailed information about:

  • Setting up the development environment
  • Running the server in development mode
  • Contributing guidelines
  • Testing and documentation

For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This project is not affiliated with or endorsed by OpenAI or Apple. It's an independent implementation that provides OpenAI-compatible APIs using Apple's MLX framework.
