Skip to main content

LLMs on Apple silicon with MLX and the Hugging Face Hub

Project description

Generate Text with LLMs and MLX

The easiest way to get started is to install the mlx-lm package:

With pip:

pip install mlx-lm

With conda:

conda install -c conda-forge mlx-lm

The mlx-lm package also supports LoRA and QLoRA fine-tuning. For more details on this see the LoRA documentation.

Python API

You can use mlx-lm as a module:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")

response = generate(model, tokenizer, prompt="hello", verbose=True)

To see a description of all the arguments you can do:

>>> help(generate)

The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.

You can convert models in the Python API with:

from mlx_lm import convert

upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"

convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)

This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.

To see a description of all the arguments you can do:

>>> help(convert)

Command Line

You can also use mlx-lm from the command line with:

python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"

This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.

For a full list of options run:

python -m mlx_lm.generate --help

To quantize a model from the command line run:

python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q

For more options run:

python -m mlx_lm.convert --help

You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:

python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral

Supported Models

The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or better yet, submit a pull request.

Here are a few examples of Hugging Face models that work with this example:

Most Mistral, Llama, Phi-2, and Mixtral style models should work out of the box.

For some models (such as Qwen and plamo) the tokenizer requires you to enable the trust_remote_code option. You can do this by passing --trust-remote-code in the command line. If you don't specify the flag explicitly, you will be prompted to trust remote code in the terminal when running the model.

For Qwen models you must also specify the eos_token. You can do this by passing --eos-token "<|endoftext|>" in the command line.

These options can also be set in the Python API. For example:

model, tokenizer = load(
    "qwen/Qwen-7B",
    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
)

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlx-lm-0.0.10.tar.gz (25.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mlx_lm-0.0.10-py3-none-any.whl (36.0 kB view details)

Uploaded Python 3

File details

Details for the file mlx-lm-0.0.10.tar.gz.

File metadata

  • Download URL: mlx-lm-0.0.10.tar.gz
  • Upload date:
  • Size: 25.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for mlx-lm-0.0.10.tar.gz
Algorithm Hash digest
SHA256 1e8b856106fc377177e04bc6344d204f2d5f19b2cde55acf84b4d4da1f4f4789
MD5 9f24646ff5f9c4a230c59c937237ec77
BLAKE2b-256 8edeb8d1d3a07837b87c2bc40b0217c41292e8307a7eb3134877340d952ca4ad

See more details on using hashes here.

File details

Details for the file mlx_lm-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: mlx_lm-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 36.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for mlx_lm-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 08aa1c4337684248e2d273f262458aaa5f2104bb8bf48d8e3e39a5f59ddfe6b2
MD5 fd8f4810359e74174e6e5d33b6feb3fb
BLAKE2b-256 ea4907e63d3d57d48526db65a54ddf2e6dbe17705e3e644dcc1b2c4382dd0c48

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page