A server for serving MLX models as an OpenAI-compatible API
MLX-LLM
This guide will help you set up the MLX-LLM server to serve a model as an OpenAI-compatible API.
Quick Start
- Start the server with the following command:
python -m server --model-path <path-to-your-model>
The MLX-LLM server can serve both Hugging Face-format models and quantized MLX models. You can find such models in the MLX Community organization on Hugging Face.
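For scripted use, the Quick Start command above can be wrapped in a small launcher. The sketch below takes the argv and port 8080 from this guide's own examples; the readiness timeout is an assumption, not a documented value.

```python
# Sketch of a programmatic launcher for the Quick Start command above.
# The argv comes from this guide; the 120 s load timeout is an assumption.
import socket
import subprocess
import time

def server_command(model_path: str) -> list:
    """argv for the Quick Start command: python -m server --model-path <path>."""
    return ["python", "-m", "server", "--model-path", model_path]

def start_server(model_path: str, host: str = "127.0.0.1", port: int = 8080):
    """Launch the server and block until it accepts TCP connections."""
    proc = subprocess.Popen(server_command(model_path))
    deadline = time.time() + 120  # loading large models can take a while
    while time.time() < deadline:
        if proc.poll() is not None:
            raise RuntimeError("server process exited before becoming ready")
        try:
            with socket.create_connection((host, port), timeout=1):
                return proc  # port is open: server is ready
        except OSError:
            time.sleep(0.5)
    proc.terminate()
    raise TimeoutError("server did not start listening in time")
```

The returned `Popen` handle lets the caller shut the server down later with `proc.terminate()`.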
Setup Guide
Miniconda Installation
For Apple Silicon users, install Miniconda natively via Miniforge with these commands:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
Conda Environment Setup
After Miniconda installation, create a dedicated conda environment for MLX-LLM:
conda create -n mlx-llm python=3.10
conda activate mlx-llm
Installing Dependencies
With the mlx-llm environment activated, install the necessary dependencies using the following command:
pip install -r requirements.txt
Testing the API with curl
You can test the API using the curl command. Here's an example:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "stop": ["<|im_end|>"],
    "messages": [
      {
        "role": "user",
        "content": "Write a limerick about python exceptions"
      }
    ]
  }'
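The same request can be made from Python with only the standard library. This is a sketch mirroring the curl example above: the host, port, "no-key" token, stop sequence, and placeholder model name are all taken from that example, and the response shape is assumed to follow the OpenAI chat-completions format.

```python
# Minimal stdlib client mirroring the curl example above.
# URL, headers, and payload fields come from that example.
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(user_content: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": "gpt-3.5-turbo",
        "stop": ["<|im_end|>"],
        "messages": [{"role": "user", "content": user_content}],
    }

def chat(user_content: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_content)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # assumes the OpenAI-compatible response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Write a limerick about python exceptions")` returns the generated text.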