
mlx-openai-server


A high-performance OpenAI-compatible API server for MLX models. Run text, vision, audio, and image generation models locally on Apple Silicon with a drop-in OpenAI replacement.

Note: Requires macOS with M-series chips (MLX is optimized for Apple Silicon).



5-Second Quick Start

mlx-openai-server launch --model-path mlx-community/Qwen3-Coder-Next-4bit --model-type lm

Then point your OpenAI client to http://localhost:8000/v1. For full setup, see Installation and Quick Start.
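
For example, a quick smoke test with the OpenAI Python SDK (the model value is a placeholder; the examples below use "local-model", and GET /v1/models reports the exact ID the server exposes):

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
reply = client.chat.completions.create(
    model="local-model",  # placeholder model ID; see GET /v1/models
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(reply.choices[0].message.content)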


Key Features

  • 🚀 OpenAI-compatible API - Drop-in replacement for OpenAI services
  • 🖼️ Multimodal support - Text, vision, audio, and image generation/editing
  • 🎨 Flux-series models - Image generation (schnell, dev, krea-dev, flux-2-klein) and editing (kontext, qwen-image-edit)
  • 🔌 Easy integration - Works with existing OpenAI client libraries
  • 📦 Multi-model mode - Run multiple models in one server via a YAML config; route requests by model ID
  • ⚡ Performance - Configurable quantization (4/8/16-bit), context length, and speculative decoding (lm)
  • 🎛️ LoRA adapters - Fine-tuned image generation and editing
  • 📈 Queue management - Built-in request queuing and monitoring

Installation

Prerequisites

  • macOS with Apple Silicon (M-series)
  • Python 3.11+

Quick Install

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install from PyPI
pip install mlx-openai-server

# Or install from GitHub
pip install git+https://github.com/cubist38/mlx-openai-server.git

Optional: Whisper Support

For audio transcription models, install ffmpeg:

brew install ffmpeg

Quick Start

Start the Server

# Text-only or multimodal models
mlx-openai-server launch \
  --model-path <path-to-mlx-model> \
  --model-type <lm|multimodal>

# Text-only with speculative decoding (faster generation using a smaller draft model)
mlx-openai-server launch \
  --model-path <path-to-main-model> \
  --model-type lm \
  --draft-model-path <path-to-draft-model> \
  --num-draft-tokens 4

# Image generation (Flux-series)
mlx-openai-server launch \
  --model-type image-generation \
  --model-path <path-to-flux-model> \
  --config-name flux-dev \
  --quantize 8

# Image editing
mlx-openai-server launch \
  --model-type image-edit \
  --model-path <path-to-flux-model> \
  --config-name flux-kontext-dev \
  --quantize 8

# Embeddings
mlx-openai-server launch \
  --model-type embeddings \
  --model-path <embeddings-model-path>

# Whisper (audio transcription)
mlx-openai-server launch \
  --model-type whisper \
  --model-path mlx-community/whisper-large-v3-mlx

Server Parameters

| Parameter | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Required parameters | | | | |
| --model-path | Yes | path | – | Path to MLX model (local or HuggingFace repo) |
| --model-type | Yes | string | – | lm, multimodal, image-generation, image-edit, embeddings, or whisper |
| Model configuration | | | | |
| --config-name | No* | string | – | Image models: flux-schnell, flux-dev, flux-krea-dev, flux-kontext-dev, flux2-klein-4b, flux2-klein-9b, qwen-image, qwen-image-edit, z-image-turbo, fibo |
| --quantize | No | int | – | Quantization level: 4, 8, or 16 (image models) |
| --context-length | No | int | – | Max sequence length for memory optimization |
| Sampling parameters (used when API request omits them) | | | | |
| --max-tokens | No | int | 100000 | Default maximum tokens to generate |
| --temperature | No | float | 1.0 | Default sampling temperature |
| --top-p | No | float | 1.0 | Default nucleus sampling (top-p) probability |
| --top-k | No | int | 20 | Default top-k sampling parameter |
| Speculative decoding (lm only) | | | | |
| --draft-model-path | No | path | – | Path to draft model for speculative decoding |
| --num-draft-tokens | No | int | 2 | Draft tokens per step |
| Advanced options | | | | |
| --lora-paths | No | string | – | Comma-separated LoRA adapter paths (image models) |
| --lora-scales | No | string | – | Comma-separated LoRA scales (must match paths) |
| --log-level | No | string | INFO | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| --no-log-file | No | flag | false | Disable file logging (console only) |

*Required for image-generation and image-edit model types.

Launching Multiple Models

You can run several models in one server using a YAML config file. Each model gets its own handler; requests are routed by the model ID you use in the API (the model field in the request).

Video: Serving Multiple Models at Once? mlx-openai-server + OpenWebUI Test

Start with a config file

mlx-openai-server launch --config config.yaml

You must provide either --config (multi-handler) or --model-path (single model). You cannot mix them.

YAML config format

Create a YAML file with a server section (host, port, logging) and a models list. Each entry in models defines one model and supports the same options as the CLI (model path, type, context length, queue settings, etc.).

| Key | Required | Description |
| --- | --- | --- |
| model_path | Yes | Path or HuggingFace repo of the model |
| model_type | No | lm, multimodal, image-generation, image-edit, embeddings, whisper (default: lm) |
| model_id | No | ID used in API requests; defaults to model_path if omitted |
| context_length | No | Max context length (lm / multimodal) |
| max_concurrency, queue_timeout, queue_size | No | Per-model queue settings |

Example config.yaml:

server:
  host: "0.0.0.0"
  port: 8000
  log_level: INFO
  # log_file: logs/app.log     # uncomment to log to file
  # no_log_file: true           # uncomment to disable file logging

models:
  # Language model
  - model_path: mlx-community/GLM-4.7-Flash-8bit
    model_type: lm
    model_id: glm-4.7-flash    # optional alias (defaults to model_path)
    enable_auto_tool_choice: true
    tool_call_parser: glm4_moe
    reasoning_parser: glm47_flash
    message_converter: glm4_moe

  # Another language model
  - model_path: mlx-community/Qwen3-Coder-Next-4bit
    model_type: lm
    max_concurrency: 1
    tool_call_parser: qwen3_coder
    message_converter: qwen3_coder


  - model_path: mlx-community/Qwen3-VL-2B-Instruct-4bit
    model_type: multimodal
    tool_call_parser: qwen3_vl

  - model_path: black-forest-labs/FLUX.2-klein-4B
    model_type: image-generation
    config_name: flux2-klein-4b
    quantize: 4
    model_id: flux2-klein-4b

A full example is in examples/config.yaml.

Multi-handler process isolation (HandlerProcessProxy)

In multi-handler mode, each model runs in a dedicated subprocess spawned via multiprocessing.get_context("spawn"). The main FastAPI process uses a HandlerProcessProxy to forward requests to the child process over multiprocessing queues.

This design prevents MLX Metal/GPU semaphore leaks on macOS. When MLX arrays or Metal runtime state are shared across forked processes, the resource tracker can report leaked semaphore objects at shutdown (ml-explore/mlx#2457). Using spawn instead of the default fork gives each model a clean Metal context, avoiding those warnings.

┌─────────────────────────────────────┐     ┌──────────────────────────────────────┐
│  Main Process (FastAPI)             │     │  Child Process (Handler)             │
│  ┌───────────────────────────────┐  │     │  ┌────────────────────────────────┐  │
│  │  HandlerProcessProxy          │  │     │  │  Concrete handler (e.g.        │  │
│  │  • request_queue ─────────────┼──┼─────┼─>│    MLXLMHandler)               │  │
│  │  • response_queue <───────────┼──┼<────┼──│  • Model (MLX_LM)              │  │
│  │  • generate_*() forwards RPC  │  │     │  │  • InferenceWorker (thread)    │  │
│  └───────────────────────────────┘  │     │  └────────────────────────────────┘  │
└─────────────────────────────────────┘     └──────────────────────────────────────┘

The proxy exposes the same interface as the concrete handlers (generate_text_stream, generate_embeddings_response, etc.), so API endpoints work without changes. Requests and responses are serialized across the process boundary via queues; non-picklable objects (e.g. uploaded files) are pre-processed in the main process before being sent as file paths.
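
To make the flow concrete, here is a minimal, simplified sketch of the pattern described above. It is not the actual mlx-openai-server implementation, just an illustration of a spawn-context worker process driven through request/response queues:

import multiprocessing as mp

def handler_worker(request_queue, response_queue):
    # Child process: a real handler would load the MLX model here so it owns
    # a clean Metal context, then serve requests until it receives None.
    while True:
        request = request_queue.get()
        if request is None:
            break
        request_id, prompt = request
        response_queue.put((request_id, f"echo: {prompt}"))  # stand-in for inference

if __name__ == "__main__":
    ctx = mp.get_context("spawn")   # spawn, not fork, to avoid Metal/semaphore issues
    request_queue, response_queue = ctx.Queue(), ctx.Queue()
    child = ctx.Process(target=handler_worker, args=(request_queue, response_queue))
    child.start()

    # Proxy side: serialize a request across the process boundary, wait for the reply.
    request_queue.put(("req-1", "Hello"))
    print(response_queue.get())

    request_queue.put(None)         # shut the worker down
    child.join()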

Using the API with multiple models

Set the model field in your request to the model ID (the model_id from the config, or model_path if you did not set model_id). The server looks up the handler for that ID and runs the request on the correct model.

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Use the first model (glm-4.7-flash)
r1 = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(r1.choices[0].message.content)

# Use the second model (full path as model_id)
r2 = client.chat.completions.create(
    model="mlx-community/Qwen3-Coder-Next-4bit",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(r2.choices[0].message.content)

  • GET /v1/models returns all loaded models (their IDs).
  • If you send a model that is not in the config, the server returns 404 with an error listing available models.
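
For example, a short sketch using the OpenAI SDK to list the loaded model IDs and catch the 404 for an unknown model (openai.NotFoundError is the SDK's exception for HTTP 404 responses):

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# List the model IDs the server has loaded.
for model in client.models.list():
    print(model.id)

# Requesting a model that is not in the config raises a 404.
try:
    client.chat.completions.create(
        model="not-a-configured-model",
        messages=[{"role": "user", "content": "hi"}],
    )
except openai.NotFoundError as error:
    print("Unknown model:", error)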

Supported Model Types

  1. Text-only (lm) - Language models via mlx-lm
  2. Multimodal (multimodal) - Text, images, audio via mlx-vlm
  3. Image generation (image-generation) - Flux-series, Qwen Image, Z-Image Turbo, Fibo
  4. Image editing (image-edit) - Flux kontext, Qwen Image Edit
  5. Embeddings (embeddings) - Text embeddings via mlx-embeddings
  6. Whisper (whisper) - Audio transcription (requires ffmpeg)

Image Model Configurations

Generation:

  • flux-schnell - Fast (4 steps, no guidance)
  • flux-dev - Balanced (25 steps, 3.5 guidance)
  • flux-krea-dev - High quality (28 steps, 4.5 guidance)
  • flux2-klein-4b / flux2-klein-9b - Flux 2 Klein models
  • qwen-image - Qwen image generation (50 steps, 4.0 guidance)
  • z-image-turbo - Z-Image Turbo
  • fibo - Fibo model

Editing:

  • flux-kontext-dev - Context-aware editing (28 steps, 2.5 guidance)
  • flux2-klein-edit-4b / flux2-klein-edit-9b - Flux 2 Klein editing
  • qwen-image-edit - Qwen image editing (50 steps, 4.0 guidance)

Common Use Cases

| Use Case | One-liner Launch |
| --- | --- |
| Text generation | mlx-openai-server launch --model-type lm --model-path <path> |
| Vision Q&A | mlx-openai-server launch --model-type multimodal --model-path <path> |
| Image generation | mlx-openai-server launch --model-type image-generation --model-path <path> --config-name flux-dev |
| Image editing | mlx-openai-server launch --model-type image-edit --model-path <path> --config-name flux-kontext-dev |
| Audio transcription | mlx-openai-server launch --model-type whisper --model-path mlx-community/whisper-large-v3-mlx |
| Embeddings | mlx-openai-server launch --model-type embeddings --model-path <path> |

Using the API

The server provides OpenAI-compatible endpoints. Use standard OpenAI client libraries:

Text Completion

import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response.choices[0].message.content)
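
Streaming should work through the standard stream=True option as well; a short sketch continuing from the client above (streaming is also shown later for the Responses API):

stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Write a haiku about Apple Silicon."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()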

Vision (Multimodal)

import openai
import base64

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("image.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode('utf-8')

response = client.chat.completions.create(
    model="local-multimodal",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
        ]
    }]
)
print(response.choices[0].message.content)

Image Generation

import openai
import base64
from io import BytesIO
from PIL import Image

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.images.generate(
    prompt="A serene landscape with mountains and a lake at sunset",
    model="local-image-generation-model",
    size="1024x1024"
)

image_data = base64.b64decode(response.data[0].b64_json)
image = Image.open(BytesIO(image_data))
image.show()

Image Editing

import openai
import base64
from io import BytesIO
from PIL import Image

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("image.png", "rb") as f:
    result = client.images.edit(
        image=f,
        prompt="make it like a photo in 1800s",
        model="flux-kontext-dev"
    )

image_data = base64.b64decode(result.data[0].b64_json)
image = Image.open(BytesIO(image_data))
image.show()

Function Calling

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"}
            }
        }
    }
}]

completion = client.chat.completions.create(
    model="local-model",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

if completion.choices[0].message.tool_calls:
    tool_call = completion.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Embeddings

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.embeddings.create(
    model="local-model",
    input=["The quick brown fox jumps over the lazy dog"]
)

print(f"Embedding dimension: {len(response.data[0].embedding)}")

Responses API

The server exposes the OpenAI Responses API at POST /v1/responses. Use client.responses.create() with the OpenAI SDK for text and multimodal (lm/multimodal) models.

Text input (non-streaming):

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.responses.create(
    model="local-model",
    input="Tell me a three sentence bedtime story about a unicorn."
)
# response.output contains reasoning and message items
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if getattr(part, "text", None):
                print(part.text)

Text input (streaming):

response = client.responses.create(
    model="local-model",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)
for chunk in response:
    print(chunk)

Image input (vision / multimodal):

response = client.responses.create(
    model="local-multimodal",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is in this image?"},
                {
                    "type": "input_image",
                    "image_url": "path/to/image.jpg",
                    "detail": "low"
                }
            ]
        }
    ]
)

Function calling:

tools = [{
    "type": "function",
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location", "unit"]
    }
}]

response = client.responses.create(
    model="local-model",
    tools=tools,
    input="What is the weather like in Boston today?",
    tool_choice="auto"
)

Structured outputs (Pydantic):

from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip: str

response = client.responses.parse(
    model="local-model",
    input=[{"role": "user", "content": "Format: 1 Hacker Wy Menlo Park CA 94025"}],
    text_format=Address
)
address = response.output_parsed  # Pydantic model instance
print(address)

See examples/responses_api.ipynb for full examples including streaming, image input, tool calls, and structured outputs.

Structured Outputs (JSON Schema)

import openai
import json

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "Address",
        "schema": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
                "zip": {"type": "string"}
            },
            "required": ["street", "city", "state", "zip"]
        }
    }
}

completion = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Format: 1 Hacker Wy Menlo Park CA 94025"}],
    response_format=response_format
)

address = json.loads(completion.choices[0].message.content)
print(json.dumps(address, indent=2))

Advanced Configuration

Parser Configuration

For models requiring custom parsing (tool calls, reasoning):

mlx-openai-server launch \
  --model-path <path-to-model> \
  --model-type lm \
  --tool-call-parser qwen3 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice

Available parsers: qwen3, glm4_moe, qwen3_coder, qwen3_moe, qwen3_next, qwen3_vl, harmony, minimax_m2

Message Converters

For models requiring message format conversion:

mlx-openai-server launch \
  --model-path <path-to-model> \
  --model-type lm \
  --message-converter glm4_moe

Available converters: glm4_moe, minimax_m2, nemotron3_nano, qwen3_coder

Custom Chat Templates

mlx-openai-server launch \
  --model-path <path-to-model> \
  --model-type lm \
  --chat-template-file /path/to/template.jinja

Speculative Decoding (lm)

Use a smaller draft model to propose tokens and verify them with the main model for faster text generation. Supported only for --model-type lm.

mlx-openai-server launch \
  --model-path mlx-community/MyModel-8B-4bit \
  --model-type lm \
  --draft-model-path mlx-community/MyModel-1B-4bit \
  --num-draft-tokens 4

  • --draft-model-path: Path or HuggingFace repo of the draft model (typically a much smaller model from the same family, sharing the main model's tokenizer).
  • --num-draft-tokens: Number of tokens the draft model generates per verification step (default: 2). Higher values can increase throughput at the cost of more draft compute.

Request Queue System

The server includes a request queue system with monitoring:

# Check queue status
curl http://localhost:8000/v1/queue/stats

Response:

{
  "status": "ok",
  "queue_stats": {
    "running": true,
    "queue_size": 3,
    "max_queue_size": 100,
    "active_requests": 1,
    "max_concurrency": 1
  }
}
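
A small sketch of how a client might poll this endpoint, e.g. to wait for free capacity before submitting a batch of requests (field names are taken from the response above; the requests library is assumed to be installed):

import time
import requests

def wait_for_queue_room(base_url="http://localhost:8000", poll_seconds=1.0):
    # Poll /v1/queue/stats until the queue has free capacity.
    while True:
        stats = requests.get(f"{base_url}/v1/queue/stats").json()["queue_stats"]
        if stats["queue_size"] < stats["max_queue_size"]:
            return stats
        time.sleep(poll_seconds)

print(wait_for_queue_room())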

Example Notebooks

Check the examples/ directory for comprehensive guides:

| Category | Notebooks | Description |
| --- | --- | --- |
| Text & Chat | responses_api.ipynb, simple_rag_demo.ipynb | Responses API (text, image, tools, streaming, structured outputs); RAG pipeline demo |
| Vision | vision_examples.ipynb | Vision capabilities |
| Audio | audio_examples.ipynb, transcription_examples.ipynb | Audio processing and transcription |
| Embeddings | embedding_examples.ipynb, lm_embeddings_examples.ipynb, vlm_embeddings_examples.ipynb | Text, LM, and VLM embeddings |
| Images | image_generations.ipynb, image_edit.ipynb | Image generation and editing |
| Advanced | structured_outputs_examples.ipynb | JSON schema / structured outputs |

Large Models

For large models that approach or exceed available RAM, you can improve performance on macOS 15.0+ by running:

bash configure_mlx.sh

This raises the system's wired memory limit for better performance.


Troubleshooting

| Issue | Solution |
| --- | --- |
| Memory problems | Use --quantize 4 or 8 for image models; reduce --context-length for lm/multimodal. Run configure_mlx.sh on macOS 15+ to raise wired memory limits. |
| Model download issues | Ensure transformers and huggingface_hub are installed. Check network access; some models require a Hugging Face login. |
| Port already in use | Use --port to specify a different port (e.g. --port 8001). |
| Quantization questions | For lm/multimodal, use pre-quantized models from mlx-community. For image models, use --quantize 4 or 8. |
| Metal/semaphore warnings | Use multi-handler mode (--config); each model runs in a spawned subprocess to avoid Metal context issues. |

Quick Reference Card

# Text (language model)
mlx-openai-server launch --model-type lm --model-path <path>

# Vision (multimodal)
mlx-openai-server launch --model-type multimodal --model-path <path>

# Image generation
mlx-openai-server launch --model-type image-generation --model-path <path> --config-name flux-dev

# Image editing
mlx-openai-server launch --model-type image-edit --model-path <path> --config-name flux-kontext-dev

# Embeddings
mlx-openai-server launch --model-type embeddings --model-path <path>

# Whisper (audio transcription)
mlx-openai-server launch --model-type whisper --model-path mlx-community/whisper-large-v3-mlx

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

Follow Conventional Commits for commit messages.


License

MIT License - see LICENSE file for details.

Acknowledgments

Built on top of the MLX ecosystem, including mlx-lm, mlx-vlm, and mlx-embeddings.

