llm-mlx
Support for MLX models in LLM.
Read my blog for background on this project.
Installation
Install this plugin in the same environment as LLM. MLX targets Apple Silicon, so this plugin will likely only work on macOS.
llm install llm-mlx
This plugin depends on sentencepiece, which does not yet publish a binary wheel for Python 3.13, so the plugin is easier to run on Python 3.12 or lower. One way to install a version of LLM that uses Python 3.12 is with uv:
uv tool install llm --python 3.12
See issue #7 for more on this.
Usage
To install an MLX model from Hugging Face, use the llm mlx download-model command. This example downloads 1.8GB of model weights from mlx-community/Llama-3.2-3B-Instruct-4bit:
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit
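Downloaded models are registered with LLM automatically, so they should then appear in the output of LLM's model listing command:
llm models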
Then run prompts like this:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Capital of France?' -s 'you are a pelican'
The mlx-community organization is a useful source for compatible models.
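Local models also work with LLM's interactive chat mode, which keeps the model loaded in memory between prompts so follow-ups avoid the reload cost:
llm chat -m mlx-community/Llama-3.2-3B-Instruct-4bit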
Models to try
The following models all work well with this plugin:
mlx-community/gemma-3-27b-it-qat-4bit - 16GB
mlx-community/Qwen2.5-0.5B-Instruct-4bit - 278MB
mlx-community/Mistral-7B-Instruct-v0.3-4bit - 4.08GB
mlx-community/Mistral-Small-24B-Instruct-2501-4bit - 13.26GB
mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit - 18.5GB
mlx-community/Llama-3.3-70B-Instruct-4bit - 40GB
Model options
MLX models can use the following model options:
-o max_tokens INTEGER: Maximum number of tokens to generate in the completion (defaults to 1024)
-o unlimited 1: Generate an unlimited number of tokens in the completion
-o temperature FLOAT: Sampling temperature (defaults to 0.8)
-o top_p FLOAT: Sampling top-p (defaults to 0.9)
-o min_p FLOAT: Sampling min-p (defaults to 0.1)
-o min_tokens_to_keep INT: Minimum tokens to keep for min-p sampling (defaults to 1)
-o seed INT: Random number seed to use
For example:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Joke about pelicans' -o max_tokens 60 -o temperature 1.0
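Options can be combined. As a sketch using the seed and unlimited options listed above, this asks for a reproducible completion with no length cap:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Describe a pelican' -o unlimited 1 -o seed 42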
Managing existing models from your Hugging Face cache
If you have used MLX models in the past you may already have some installed in your ~/.cache/huggingface/hub directory.
The llm mlx manage-models command can detect these and provide you with the option to add them to the list of models registered with LLM.
llm mlx manage-models
This will open an interface like this one:
Available model files (↑/↓ to navigate, SPACE to select, ENTER to confirm, Ctrl+C to quit):
○ Unregister mlx-community/gemma-3-27b-it-qat-4bit (gemma3)
○ Register mlx-community/DeepSeek-R1-Distill-Llama-8B (llama)
> ○ Register mlx-community/Llama-3.2-3B-Instruct-4bit (llama)
○ Unregister mlx-community/SmolLM-135M-Instruct-4bit (llama)
○ Register mlx-community/nanoLLaVA-1.5-8bit (llava-qwen2)
○ Register mlx-community/Mistral-Small-3.1-Text-24B-Instruct-2503-8bit (mistral)
○ Unregister mlx-community/OLMo-2-0325-32B-Instruct-4bit (olmo2)
○ Unregister mlx-community/OpenELM-270M-Instruct (openelm)
○ Unregister mlx-community/DeepCoder-14B-Preview-4bit (qwen2)
○ Unregister mlx-community/Qwen2.5-0.5B-Instruct-4bit (qwen2)
Navigate with the up and down arrow keys, hit <space> to select the actions you want to take, then hit <enter> to confirm. You can use this interface both to register new models and to unregister existing ones.
This tool only changes the list of available models recorded in your ~/Library/Application Support/io.datasette.llm/llm-mlx.json file. It does not delete any model files from your Hugging Face cache.
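The exact schema of that file is an implementation detail that may change; as a rough illustration (this structure is an assumption, not documented), it records the model IDs you have registered, along the lines of:
{
  "models": [
    "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "mlx-community/Qwen2.5-0.5B-Instruct-4bit"
  ]
}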
Using models from Python
If you have registered models with the llm mlx download-model command you can use them from Python like this:
import llm
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
You can avoid that registration step entirely by accessing the models like this instead:
from llm_mlx import MlxModel
model = MlxModel("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
# Outputs: How can I assist you today?
The LLM Python API documentation has more details on how to use LLM models.
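Two other documented parts of that API are worth knowing about here: passing a system prompt, and iterating over a response to stream tokens as they are generated. A brief sketch combining both:
import llm

model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")

# Iterating over the response streams text chunks as they arrive
for chunk in model.prompt("Capital of France?", system="you are a pelican"):
    print(chunk, end="", flush=True)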
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-mlx
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
llm install -e '.[test]'
To run the tests:
python -m pytest