A server for serving MLX models as an OpenAI-compatible API
MLX-LLM
This guide will help you set up the MLX-LLM server to serve a model as an OpenAI-compatible API.
Quick Start
- Start the server with the following command:
python -m server --model-path <path-to-your-model>
The MLX-LLM server can serve both Hugging Face-format models and quantized MLX models. You can find such models in the MLX Community organization on Hugging Face.
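For scripted use, the Quick Start command above can be wrapped in a small launcher. The sketch below takes the argv and port 8080 from this guide's own examples; the readiness timeout is an assumption, not a documented value.

```python
# Sketch of a programmatic launcher for the Quick Start command above.
# The argv comes from this guide; the 120 s load timeout is an assumption.
import socket
import subprocess
import time

def server_command(model_path: str) -> list:
    """argv for the Quick Start command: python -m server --model-path <path>."""
    return ["python", "-m", "server", "--model-path", model_path]

def start_server(model_path: str, host: str = "127.0.0.1", port: int = 8080):
    """Launch the server and block until it accepts TCP connections."""
    proc = subprocess.Popen(server_command(model_path))
    deadline = time.time() + 120  # loading large models can take a while
    while time.time() < deadline:
        if proc.poll() is not None:
            raise RuntimeError("server process exited before becoming ready")
        try:
            with socket.create_connection((host, port), timeout=1):
                return proc  # port is open: server is ready
        except OSError:
            time.sleep(0.5)
    proc.terminate()
    raise TimeoutError("server did not start listening in time")
```

The returned `Popen` handle lets the caller shut the server down later with `proc.terminate()`.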
Setup Guide
Miniconda Installation
For Apple Silicon users, install Miniconda natively via Miniforge with these commands:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
Conda Environment Setup
After Miniconda installation, create a dedicated conda environment for MLX-LLM:
conda create -n mlx-llm python=3.10
conda activate mlx-llm
Installing Dependencies
With the mlx-llm environment activated, install the necessary dependencies using the following command:
pip install -r requirements.txt
Testing the API with curl
You can test the API using the curl command. Here's an example:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "stop": ["<|im_end|>"],
    "messages": [
      {
        "role": "user",
        "content": "Write a limerick about python exceptions"
      }
    ]
  }'
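The same request can be made from Python with only the standard library. This is a sketch mirroring the curl example above: the host, port, "no-key" token, stop sequence, and placeholder model name are all taken from that example, and the response shape is assumed to follow the OpenAI chat-completions format.

```python
# Minimal stdlib client mirroring the curl example above.
# URL, headers, and payload fields come from that example.
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(user_content: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": "gpt-3.5-turbo",
        "stop": ["<|im_end|>"],
        "messages": [{"role": "user", "content": user_content}],
    }

def chat(user_content: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_content)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # assumes the OpenAI-compatible response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Write a limerick about python exceptions")` returns the generated text.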