LLMs on Apple silicon with MLX and the Hugging Face Hub
Project description
Generate Text with LLMs and MLX
The easiest way to get started is to install the mlx-lm package:

pip install mlx-lm
Python API
You can use mlx-lm as a module:
from mlx_lm import load, generate
model, tokenizer = load("mistralai/Mistral-7B-v0.1")
response = generate(model, tokenizer, prompt="hello", verbose=True)
To see a description of all the arguments you can do:
>>> help(generate)
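generate also accepts optional keyword arguments to control decoding. A minimal sketch, assuming max_tokens and temp are among the parameters in your installed version (confirm with help(generate)):

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-v0.1")
# max_tokens caps the number of generated tokens; temp controls sampling
# randomness (0.0 always picks the most likely next token). Both names
# are assumptions here, so check help(generate) for your version.
response = generate(model, tokenizer, prompt="hello", max_tokens=100, temp=0.0)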
The mlx-lm package also comes with functionality to quantize and optionally upload models to the Hugging Face Hub.
You can convert models in the Python API with:
from mlx_lm import convert
upload_repo = "mlx-community/My-Mistral-7B-v0.1-4bit"
convert("mistralai/Mistral-7B-v0.1", quantize=True, upload_repo=upload_repo)
This will generate a 4-bit quantized Mistral-7B and upload it to the repo mlx-community/My-Mistral-7B-v0.1-4bit. It will also save the converted model in the path mlx_model by default.
To see a description of all the arguments you can do:
>>> help(convert)
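convert also exposes the quantization settings. A minimal sketch, assuming q_bits and q_group_size are the relevant parameter names (confirm with help(convert)):

from mlx_lm import convert

# Quantize to 4 bits with a group size of 64; both parameter names are
# assumptions here, so verify them against help(convert).
convert("mistralai/Mistral-7B-v0.1", quantize=True, q_bits=4, q_group_size=64)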
Command Line
You can also use mlx-lm from the command line with:
python -m mlx_lm.generate --model mistralai/Mistral-7B-v0.1 --prompt "hello"
This will download a Mistral 7B model from the Hugging Face Hub and generate text using the given prompt.
For a full list of options run:
python -m mlx_lm.generate --help
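For instance, to cap the output length and adjust the sampling temperature (the flag names here are assumptions; --help lists the options your version actually supports):

python -m mlx_lm.generate \
    --model mistralai/Mistral-7B-v0.1 \
    --prompt "hello" \
    --max-tokens 100 \
    --temp 0.0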
To quantize a model from the command line run:
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-v0.1 -q
For more options run:
python -m mlx_lm.convert --help
You can upload new models to Hugging Face by specifying --upload-repo to convert. For example, to upload a quantized Mistral-7B model to the MLX Hugging Face community you can do:
python -m mlx_lm.convert \
--hf-path mistralai/Mistral-7B-v0.1 \
-q \
--upload-repo mlx-community/my-4bit-mistral
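Once uploaded, the quantized model can be loaded back from the Hub with the same Python API shown above:

from mlx_lm import load, generate

# Load the 4-bit model from the repo created by convert above.
model, tokenizer = load("mlx-community/my-4bit-mistral")
response = generate(model, tokenizer, prompt="hello", verbose=True)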
Supported Models
The example supports Hugging Face format Mistral, Llama, and Phi-2 style models. If the model you want to run is not supported, file an issue or, better yet, submit a pull request.
Here are a few examples of Hugging Face models that work with this example:
- mistralai/Mistral-7B-v0.1
- meta-llama/Llama-2-7b-hf
- deepseek-ai/deepseek-coder-6.7b-instruct
- 01-ai/Yi-6B-Chat
- microsoft/phi-2
Most Mistral, Llama, and Phi-2 style models should work out of the box.
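For chat-tuned models such as 01-ai/Yi-6B-Chat, the prompt usually needs the model's chat template. A minimal sketch, assuming the tokenizer returned by load exposes the standard Hugging Face apply_chat_template method:

from mlx_lm import load, generate

model, tokenizer = load("01-ai/Yi-6B-Chat")
# Build a templated prompt from a chat message; apply_chat_template is a
# standard Hugging Face tokenizer method, assumed to be available here.
messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True)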
Project details
Release history
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution: mlx-lm-0.0.2.tar.gz
Built Distribution: mlx_lm-0.0.2-py3-none-any.whl
File details
Details for the file mlx-lm-0.0.2.tar.gz.
File metadata
- Download URL: mlx-lm-0.0.2.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3aedc79f313cfa47cdc904305475728c0128f30ad411e5f0108e4baf46259640
MD5 | 1a5d4648a997996c51905bb4334d84b1
BLAKE2b-256 | 58462558e5de26c9bb19e0048348ea7be9f3403f9c9d59bc4c2c9a0b05abf489
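The digests above can be verified locally after downloading. A minimal sketch using Python's hashlib to check the SHA256 of the source distribution:

import hashlib

# Published SHA256 digest for mlx-lm-0.0.2.tar.gz (from the table above).
expected = "3aedc79f313cfa47cdc904305475728c0128f30ad411e5f0108e4baf46259640"
with open("mlx-lm-0.0.2.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, "hash mismatch: the download may be corrupted"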
File details
Details for the file mlx_lm-0.0.2-py3-none-any.whl.
File metadata
- Download URL: mlx_lm-0.0.2-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | 665ab1f13b73e00131d8c42c02d7fd0bd71a8787f3d858ca299ccca602226dbd
MD5 | 65dda399aabbd821f35ddf0892e56945
BLAKE2b-256 | 56d4690669e5b150f200b02d8055f7a2962d500e4f7ab2ac86ca6ecee0b4c4ad