
LLMs on Apple silicon with MLX and the Hugging Face Hub


Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

pip install mlx-lm

Python API

You can use mlx-lm as a module:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)

To see a description of all the arguments, run:

>>> help(generate)

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

from mlx_lm import convert

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.

To see a description of all the arguments, run:

>>> help(convert)
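To get a feel for why 4-bit quantization matters, here is a rough back-of-the-envelope estimate of the memory savings for a model with roughly 7 billion parameters. The 4.5 bits/weight figure is an assumption meant to account for the per-group scales and biases that quantization stores alongside the 4-bit weights; the exact on-disk size depends on the group size and on which layers get quantized.

```python
# Rough memory-footprint estimate for a ~7B-parameter model.
# Assumption: 4-bit quantization costs ~4.5 bits/weight once
# per-group scales and biases are included.
params = 7_000_000_000

fp16_gb = params * 16 / 8 / 1024**3  # 16 bits per weight
q4_gb = params * 4.5 / 8 / 1024**3   # ~4.5 bits per weight with scales

print(f"fp16:  {fp16_gb:.1f} GB")  # ~13.0 GB
print(f"4-bit: {q4_gb:.1f} GB")    # ~3.7 GB
```

In other words, quantizing to 4 bits shrinks the weights by roughly 3.5x, which is often the difference between a model fitting in unified memory on a Mac or not.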

Command Line

You can also use mlx-lm from the command line with:

python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.
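Downloads go through the huggingface_hub library, which (assuming default settings, i.e. no HF_HOME or HF_HUB_CACHE override) caches model files under `~/.cache/huggingface/hub` in a directory named after the repo, so repeated runs reuse the cached weights. A sketch of where the Mistral-7B files would land:

```python
import os

# Default huggingface_hub cache location; can be overridden with the
# HF_HOME or HF_HUB_CACHE environment variables.
cache_dir = os.path.expanduser(os.path.join("~", ".cache", "huggingface", "hub"))

# Repos are cached in directories named "models--<org>--<name>".
repo_dir = os.path.join(cache_dir, "models--mistralai--Mistral-7B-v0.1")
print(repo_dir)
```

Deleting that directory frees the disk space and forces a fresh download on the next run.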

For a full list of options run:

python -m mlx_lm.generate --help

To quantize a model from the command line run:

python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q 

For more options run:

python -m mlx_lm.convert --help

You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:

python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral

Supported Models

The package supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

Most Mistral, Llama, and Phi-2 style models on the Hugging Face Hub should work out of the box.


