Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX

These details have not been verified by PyPI

Project description

mlx-llm

Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX.

Alt Text

How to install 🔨

pip install mlx-llm

Models 🧠

Currently, out-of-the-box supported models are:

Family	Models
LLaMA 2	llama_2_7b_chat_hf, llama_2_7b_hf
LLaMA 3	deepseek_r1_distill_llama_8b, llama_3_8b, llama_3_8b_instruct, hermes_2_pro_llama_3_8b, llama_3_2_1b_instruct, llama_3_2_3b_instruct
Phi3	phi_3_mini_4k_instruct, phi_3_mini_128k_instruct, phi_3.5_mini_instruct
Mistral	mistral_7b_instruct_v0.2, openhermes_2.5_mistral_7b, starling_lm_7b_beta
TinyLLaMA	tiny_llama_1.1B_chat_v1.0
Gemma	gemma_1.1_2b_it, gemma_1.1_7b_it, gemma_2_2b_it, gemma_2_9b_it
OpenELM	openelm_270M_instruct, openelm_450M_instruct, openelm_1.1B_instruct, openelm_3B_instruct
SmolLM2	smollm2_1.7B_instruct, smollm2_360M_instruct, smollm2_135M_instruct

To create a model with pre-trained weights from HuggingFace:

from mlx_llm.model import create_model

# loading weights from HuggingFace
model = create_model("deepseek_r1_distill_llama_8b")

You can also load a new version of pre-trained weights for a specific model directly from HuggingFace:

set weights by adding hf:// before the HuggingFace repository
if necessary, specify custom model configs (rope_theta, rope_traditional, vocab_size, norm_eps)

Here's an example of how to to it:

from mlx_llm.model import create_model

# an example of loading new weights from HuggingFace
model = create_model(
    model_name="openelm_1.1B_instruct", # it's the base model
    weights="hf://apple/OpenELM-1.1B", # new weights from HuggingFace
)

# an example of loading new weights from HuggingFace with custom model configs
model = create_model(
    model_name="llama_3_8b_instruct", # it's the base model
    weights="hf://gradientai/Llama-3-8B-Instruct-262k", # new weights from HuggingFace
    model_config={
        "rope_theta": 207112184.0
    }
)

Quantization 📉

To quantize a model and save its weights just use:

from mlx_llm.model import create_model, quantize, get_weights
from mlx_llm.utils.weights import save_weights

# create the model from original weights
model = create_model("llama_3_8b_instruct")
# quantize the model
model = quantize(model, group_size=64, bits=4)
# getting weights dict (similar to state_dict in PyTorch)
weights = get_weights(model)
# save the model
save_weights(weights, "llama_3_8b_instruct-4bit.safetensors")

Model Embeddings ✴️

Models in mlx-llm are able to extract embeddings from a given text.

import mlx.core as mx
from mlx_llm.model import create_model, create_tokenizer

model = create_model("llama_3_8b_instruct")
tokenizer = create_tokenizer('llama_3_8b_instruct')
text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text)
x = mx.array(tokens["input_ids"])
embeds, _ = model.embed(x, norm=True)

Chat with LLM 📱

mlx-llm comes with tools to easily run your LLM chat on Apple Silicon.

To chat with an LLM provide:

a system prompt --> to set the overall tone of the LLM
optional previous interactions to set the mood of the conversation

from mlx_llm.chat import ChatSetup, LLMChat

model_name = "tiny_llama_1.1B_chat_v1.0"

chat = LLMChat(
    model_name=model_name,
    chat_setup=ChatSetup(
        system="You are Michael Scott from The Office. Your goal is to answer like him, so be funny and inappropriate, but be brief.",
        history=[
            {"question": "What is your name?", "answer": "Michael Scott"},
            {"question": "What is your favorite episode of The Office?", "answer": "The Dinner Party"},
        ],
    ),
    quantized=False, # if you want it faster use the quantization params (e.g., group_size=64, bits=4)
)

chat.start()

[!WARNING] OpenELM chat-mode is broken. I am working on fixing it.

📧 Contact

If you have any questions, please email riccardomusmeci92@gmail.com

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.0.9

Jan 29, 2025

1.0.8

Jan 10, 2025

1.0.7

Aug 24, 2024

1.0.6

May 4, 2024

1.0.5

Apr 30, 2024

1.0.4

Apr 30, 2024

1.0.3

Apr 26, 2024

1.0.2

Apr 26, 2024

1.0.1

Apr 26, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlx_llm-1.0.9.tar.gz (33.0 kB view details)

Uploaded Jan 29, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

mlx_llm-1.0.9-py3-none-any.whl (43.7 kB view details)

Uploaded Jan 29, 2025 Python 3

File details

Details for the file mlx_llm-1.0.9.tar.gz.

File metadata

Download URL: mlx_llm-1.0.9.tar.gz
Upload date: Jan 29, 2025
Size: 33.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.13

File hashes

Hashes for mlx_llm-1.0.9.tar.gz
Algorithm	Hash digest
SHA256	`722828d7a3f4a0e1a4f5d1eb5d4d2446794d36ea3ad5a53754dff46376b0b322`
MD5	`d850d04de7f8f6507440c3018a0afdcd`
BLAKE2b-256	`d80e86381035604bc686c191cd5114077877effd25ea597265855653c003ccdc`

See more details on using hashes here.

File details

Details for the file mlx_llm-1.0.9-py3-none-any.whl.

File metadata

Download URL: mlx_llm-1.0.9-py3-none-any.whl
Upload date: Jan 29, 2025
Size: 43.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.13

File hashes

Hashes for mlx_llm-1.0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4a8d1db9d4b643864d0e8dcaf0fb432d4cad5a1917d83ceaf1a3d0f0e52481e5`
MD5	`8c070b605f14f10b2244d0de2671933e`
BLAKE2b-256	`54ad4ab9990dc433eacf84765a83735fd27aaec07a56330bd3913a6af85ddf06`

See more details on using hashes here.

mlx-llm 1.0.9

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

mlx-llm

How to install 🔨

Models 🧠

Quantization 📉

Model Embeddings ✴️

Chat with LLM 📱

📧 Contact

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes