
LLMs on Apple silicon with MLX and the Hugging Face Hub

Project description

Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

pip install mlx-lm

Python API

You can use mlx-lm as a module:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)

To see a description of all the arguments, run:

>>> help(generate)
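
Generation length and sampling temperature can also be set through keyword arguments. A minimal sketch, assuming the max_tokens and temp arguments reported by help(generate) (names may change across versions):

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

# max_tokens and temp are assumed keyword argument names; confirm with help(generate)
response = generate(
    model,
    tokenizer,
    prompt="hello",
    max_tokens=100,  # cap on the number of generated tokens
    temp=0.7,        # sampling temperature; 0 means greedy decoding
    verbose=True,
)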

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

from mlx_lm import convert 

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.
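
Because load accepts a local path as well as a Hub repo id, the converted model can be loaded straight from the output folder. A minimal sketch, assuming the default mlx_model output path mentioned above:

from mlx_lm import load, generate

# load the 4-bit quantized model from convert's default output path
model, tokenizer = load("mlx_model")
response = generate(model, tokenizer, prompt="hello", verbose=True)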

To see a description of all the arguments, run:

>>> help(convert)

Command Line

You can also use mlx-lm from the command line with:

python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.
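
Generation options can be passed as flags as well. A sketch assuming the --max-tokens and --temp flags; run with --help to confirm the exact flag names in your installed version:

python -m mlx_lm.generate \
    --model mistralai/Mistral-7B-v0.1 \
    --prompt "hello" \
    --max-tokens 100 \
    --temp 0.7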

For a full list of options run:

python -m mlx_lm.generate --help

To quantize a model from the command line run:

python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q 

For more options run:

python -m mlx_lm.convert --help

You can upload new models to the Hugging Face Hub by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community, run:

python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
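
Once the upload finishes, the quantized model can be used like any other Hub model with the generate command shown above:

python -m mlx_lm.generate --model mlx-community/my-4bit-mistral --prompt "hello"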

Supported Models

mlx-lm supports Hugging Face format Mistral, Llama, and Phi-2 style models; most models in these families should work out of the box. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlx-lm-0.0.2.tar.gz (10.5 kB)


Built Distribution

mlx_lm-0.0.2-py3-none-any.whl (11.5 kB)


File details

Details for the file mlx-lm-0.0.2.tar.gz.

File metadata

  • Download URL: mlx-lm-0.0.2.tar.gz
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for mlx-lm-0.0.2.tar.gz:

  • SHA256: 3aedc79f313cfa47cdc904305475728c0128f30ad411e5f0108e4baf46259640
  • MD5: 1a5d4648a997996c51905bb4334d84b1
  • BLAKE2b-256: 58462558e5de26c9bb19e0048348ea7be9f3403f9c9d59bc4c2c9a0b05abf489

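To verify a downloaded archive against the published digests above, one option is Python's standard hashlib. A minimal sketch; the path is assumed to be wherever you saved the file:

import hashlib

# assumed local path to the downloaded source distribution
path = "mlx-lm-0.0.2.tar.gz"
expected = "3aedc79f313cfa47cdc904305475728c0128f30ad411e5f0108e4baf46259640"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "hash mismatch")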

File details

Details for the file mlx_lm-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: mlx_lm-0.0.2-py3-none-any.whl
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for mlx_lm-0.0.2-py3-none-any.whl:

  • SHA256: 665ab1f13b73e00131d8c42c02d7fd0bd71a8787f3d858ca299ccca602226dbd
  • MD5: 65dda399aabbd821f35ddf0892e56945
  • BLAKE2b-256: 56d4690669e5b150f200b02d8055f7a2962d500e4f7ab2ac86ca6ecee0b4c4ad

