Local MLX Engine

Project description

MLX Engine


Note: MLX Engine is a fork of MLX Omni Server by @madroidmaq, refactored to use TurboAPI and to improve maintainability.

MLX Engine is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.

Features

  • 🚀 Apple Silicon Optimized: Built on MLX framework, optimized for M1/M2/M3/M4 series chips
  • 🔌 OpenAI API Compatible: Drop-in replacement for OpenAI API endpoints
  • 🎯 Multiple AI Capabilities:
    • Audio Processing (TTS & STT)
    • Chat Completion
    • Image Generation
  • ⚡ High Performance: Local inference with hardware acceleration
  • 🔐 Privacy-First: All processing happens locally on your machine
  • 🛠 SDK Support: Works with official OpenAI SDK and other compatible clients

Supported API Endpoints

The server implements the following OpenAI-compatible endpoints (a quick connectivity check in Python follows the list):

  • Chat completions: /v1/chat/completions
    • ✅ Chat
    • ✅ Tools, Function Calling
    • ✅ Structured Output
    • ✅ LogProbs
    • 🚧 Vision
  • Audio
    • /v1/audio/speech - Text-to-Speech
    • /v1/audio/transcriptions - Speech-to-Text
  • Models
    • /v1/models - List models
    • /v1/models/{model} - Retrieve or Delete model
  • Images
    • /v1/images/generations - Image generation
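
To verify a running server quickly, you can query the models endpoint directly. This is a minimal sketch, assuming the server is running locally on the default port 10240, that it returns the standard OpenAI list shape, and that httpx (already a dependency of the OpenAI SDK) is available:

# Quick connectivity check against a locally running server.
# Assumes the default port 10240 and the standard OpenAI
# {"object": "list", "data": [...]} response shape.
import httpx

resp = httpx.get("http://localhost:10240/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])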

Installation

# Install using pip
pip install mlxengine

Quick Start

There are two ways to use MLX Engine:

Method 1: Using the HTTP Server

  1. Start the server:
# If installed via pip as a package
mlxengine

The server listens on port 10240 by default; use --port to choose a different one, for example mlxengine --port 8080.

You can list all startup parameters with mlxengine --help.

  2. Configure the OpenAI client to use your local server:
from openai import OpenAI

# Configure client to use local server
client = OpenAI(
    base_url="http://localhost:10240/v1",  # Point to local server
    api_key="not-needed"  # API key is not required for local server
)

Method 2: Using TestClient (No Server Required)

For development or testing, you can use TestClient to interact directly with the application without starting a server:

from openai import OpenAI
from fastapi.testclient import TestClient # TODO: Update this import once TurboAPI has TestClient
from mlxengine.main import app

# Use TestClient to interact directly with the application
client = OpenAI(
    api_key="not-needed",  # API key is not required for the local app
    http_client=TestClient(app)  # Use TestClient directly, no network service needed
)
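
For route-level tests you can also drive the app with the TestClient directly, without the OpenAI SDK. A hypothetical pytest sketch, assuming /v1/models responds with the standard OpenAI list object:

# Hypothetical pytest sketch: exercise the app in-process, no server needed.
from fastapi.testclient import TestClient  # same TODO as above applies
from mlxengine.main import app

def test_list_models():
    client = TestClient(app)
    response = client.get("/v1/models")
    assert response.status_code == 200
    assert response.json()["object"] == "list"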

Example Usage

Regardless of which method you choose, you can use the client in the same way:

# Chat Completion Example
chat_completion = client.chat.completions.create(
    model="mlx-community/Llama-3.2-1B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "What can you do?"}
    ]
)
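
# Print the assistant's reply
print(chat_completion.choices[0].message.content)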

# Text-to-Speech Example
response = client.audio.speech.create(
    model="lucasnewman/f5-tts-mlx",
    input="Hello, welcome to MLX Engine!"
)
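
# The speech response is binary audio; save it with the SDK helper.
# (A sketch: the exact audio format depends on the model/server.)
response.write_to_file("output.wav")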

# Speech-to-Text Example
# Speech-to-Text Example
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="mlx-community/whisper-large-v3-turbo",
        file=audio_file
    )
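
# The transcribed text is available on the .text attribute
print(transcript.text)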

# Image Generation Example
image_response = client.images.generate(
    model="argmaxinc/mlx-FLUX.1-schnell",
    prompt="A serene landscape with mountains and a lake",
    n=1,
    size="512x512"
)
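
The endpoint list above marks Structured Output as supported, but the examples do not show it. Here is a hedged sketch using the standard OpenAI response_format convention; the JSON schema and model choice are illustrative assumptions, not taken from the project docs:

# Structured Output Example (sketch; schema and model are illustrative)
completion = client.chat.completions.create(
    model="mlx-community/Llama-3.2-1B-Instruct-4bit",
    messages=[{"role": "user", "content": "Name a city and its country."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
                "additionalProperties": False,
            },
        },
    },
)
print(completion.choices[0].message.content)  # JSON string matching the schema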

# Tool Calling Example
import json
from datetime import datetime
from openai import OpenAI

model = "mlx-community/QwQ-32B-4bit" # Make sure this model supports tool calling
client = OpenAI(
    base_url="http://localhost:10240/v1",
    api_key="not-needed"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    }
]

messages = [
    {
        "role": "system",
        "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."
    },
    {
        "role": "user",
        "content": "Hi, can you tell me the delivery date for my order?"
    },
    {
        "role": "assistant",
        "content": "Hi there! I can help with that. Can you please provide your order ID?"
    },
    {
        "role": "user",
        "content": "i think it is order_12345"
    }
]

# First API call: The model decides to use the tool
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
)

response_message = completion.choices[0].message
print("Assistant Response (Tool Call):")
print(response_message)

# Check if the model wants to call a tool
if response_message.tool_calls:
    print("\nTool Calls Detected:")
    print(response_message.tool_calls)

    # Append the assistant's message (with tool calls) to the conversation
    messages.append(response_message)

    # --- Simulate executing the function and getting the result ---
    # In a real application, you would execute the function based on the name and arguments
    tool_call = response_message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)
    
    if function_name == "get_delivery_date":
        order_id = function_args.get("order_id")
        # Simulate fetching data
        delivery_date = datetime.now()
        function_response = {
            "order_id": order_id,
            "delivery_date": delivery_date.strftime('%Y-%m-%d %H:%M:%S')
        }
    else:
        # Handle other potential function calls if needed
        function_response = {"error": "Unknown function"}

    # Append the tool response message to the conversation
    function_call_result_message = {
        "role": "tool",
        "content": json.dumps(function_response),
        "tool_call_id": tool_call.id
    }
    messages.append(function_call_result_message)
    
    print("\nTool Response Message (Appended):")
    print(function_call_result_message)

    # Second API call: Send the tool response back to the model
    print("\nSending tool response back to model...")
    completion_with_tool_response = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
    )
    
    final_assistant_message = completion_with_tool_response.choices[0].message
    print("\nFinal Assistant Response:")
    print(final_assistant_message)
else:
    print("\nAssistant Response (No Tool Call):")
    print(response_message.content)

You can find more examples in the examples directory.

Contributing

We welcome contributions! If you're interested in contributing to MLX Engine, please check out our Development Guide for detailed information about:

  • Setting up the development environment
  • Running the server in development mode
  • Contributing guidelines
  • Testing and documentation

For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Disclaimer

This project is not affiliated with or endorsed by OpenAI or Apple. It's an independent implementation that provides OpenAI-compatible APIs using Apple's MLX framework.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlxengine-0.0.3.tar.gz (32.3 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mlxengine-0.0.3-py3-none-any.whl (45.5 kB)


File details

Details for the file mlxengine-0.0.3.tar.gz.

File metadata

  • Download URL: mlxengine-0.0.3.tar.gz
  • Upload date:
  • Size: 32.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for mlxengine-0.0.3.tar.gz:

  • SHA256: 1b2e942bae3ed6525fc3015723a5e7178299c0e417ee6b5ad74b741e7d14bef2
  • MD5: 0c06c96bb4d981fe65d5f0fd327465a4
  • BLAKE2b-256: 124b9892ee3b646b4b7ce722553fc3129b1e96465b4d4bd9456aa2a254f20048

See more details on using hashes here.
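
As a concrete illustration, here is a minimal sketch that checks a downloaded file against the SHA256 digest above:

# Verify the downloaded sdist against the published SHA256 digest.
import hashlib

with open("mlxengine-0.0.3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = "1b2e942bae3ed6525fc3015723a5e7178299c0e417ee6b5ad74b741e7d14bef2"
print("OK" if digest == expected else "MISMATCH")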

File details

Details for the file mlxengine-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: mlxengine-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 45.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for mlxengine-0.0.3-py3-none-any.whl:

  • SHA256: b67da178ef5cba14ba26f773f031db4fca2199aa73b98fe55b90b3643fbc7451
  • MD5: 6d582cb932dee7092af8427df02382e9
  • BLAKE2b-256: 76c994e6fde3ecf64d7aa8c0b590f175a90166648be30342e8444fa90935e444

See more details on using hashes here.
