Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX
Project description
mlx-llm
Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX.
Go to the entire Youtube Video.
How to install 🔨
pip install mlx-llm
Models 🧠
Currently, out-of-the-box supported models are:
| Family | Models |
|---|---|
| LLaMA 2 | llama_2_7b_chat_hf, llama_2_7b_hf |
| LLaMA 3 | deepseek_r1_distill_llama_8b, llama_3_8b, llama_3_8b_instruct, hermes_2_pro_llama_3_8b, llama_3_2_1b_instruct, llama_3_2_3b_instruct |
| Phi3 | phi_3_mini_4k_instruct, phi_3_mini_128k_instruct, phi_3.5_mini_instruct |
| Mistral | mistral_7b_instruct_v0.2, openhermes_2.5_mistral_7b, starling_lm_7b_beta |
| TinyLLaMA | tiny_llama_1.1B_chat_v1.0 |
| Gemma | gemma_1.1_2b_it, gemma_1.1_7b_it, gemma_2_2b_it, gemma_2_9b_it |
| OpenELM | openelm_270M_instruct, openelm_450M_instruct, openelm_1.1B_instruct, openelm_3B_instruct |
| SmolLM2 | smollm2_1.7B_instruct, smollm2_360M_instruct, smollm2_135M_instruct |
To create a model with pre-trained weights from HuggingFace:
from mlx_llm.model import create_model
# loading weights from HuggingFace
model = create_model("deepseek_r1_distill_llama_8b")
You can also load a new version of pre-trained weights for a specific model directly from HuggingFace:
- set
weightsby addinghf://before the HuggingFace repository - if necessary, specify custom model configs (rope_theta, rope_traditional, vocab_size, norm_eps)
Here's an example of how to to it:
from mlx_llm.model import create_model
# an example of loading new weights from HuggingFace
model = create_model(
model_name="openelm_1.1B_instruct", # it's the base model
weights="hf://apple/OpenELM-1.1B", # new weights from HuggingFace
)
# an example of loading new weights from HuggingFace with custom model configs
model = create_model(
model_name="llama_3_8b_instruct", # it's the base model
weights="hf://gradientai/Llama-3-8B-Instruct-262k", # new weights from HuggingFace
model_config={
"rope_theta": 207112184.0
}
)
Quantization 📉
To quantize a model and save its weights just use:
from mlx_llm.model import create_model, quantize, get_weights
from mlx_llm.utils.weights import save_weights
# create the model from original weights
model = create_model("llama_3_8b_instruct")
# quantize the model
model = quantize(model, group_size=64, bits=4)
# getting weights dict (similar to state_dict in PyTorch)
weights = get_weights(model)
# save the model
save_weights(weights, "llama_3_8b_instruct-4bit.safetensors")
Model Embeddings ✴️
Models in mlx-llm are able to extract embeddings from a given text.
import mlx.core as mx
from mlx_llm.model import create_model, create_tokenizer
model = create_model("llama_3_8b_instruct")
tokenizer = create_tokenizer('llama_3_8b_instruct')
text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text)
x = mx.array(tokens["input_ids"])
embeds, _ = model.embed(x, norm=True)
Chat with LLM 📱
mlx-llm comes with tools to easily run your LLM chat on Apple Silicon.
To chat with an LLM provide:
- a system prompt --> to set the overall tone of the LLM
- optional previous interactions to set the mood of the conversation
from mlx_llm.chat import ChatSetup, LLMChat
model_name = "tiny_llama_1.1B_chat_v1.0"
chat = LLMChat(
model_name=model_name,
chat_setup=ChatSetup(
system="You are Michael Scott from The Office. Your goal is to answer like him, so be funny and inappropriate, but be brief.",
history=[
{"question": "What is your name?", "answer": "Michael Scott"},
{"question": "What is your favorite episode of The Office?", "answer": "The Dinner Party"},
],
),
quantized=False, # if you want it faster use the quantization params (e.g., group_size=64, bits=4)
)
chat.start()
[!WARNING] OpenELM chat-mode is broken. I am working on fixing it.
📧 Contact
If you have any questions, please email riccardomusmeci92@gmail.com
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mlx_llm-1.0.9.tar.gz.
File metadata
- Download URL: mlx_llm-1.0.9.tar.gz
- Upload date:
- Size: 33.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
722828d7a3f4a0e1a4f5d1eb5d4d2446794d36ea3ad5a53754dff46376b0b322
|
|
| MD5 |
d850d04de7f8f6507440c3018a0afdcd
|
|
| BLAKE2b-256 |
d80e86381035604bc686c191cd5114077877effd25ea597265855653c003ccdc
|
File details
Details for the file mlx_llm-1.0.9-py3-none-any.whl.
File metadata
- Download URL: mlx_llm-1.0.9-py3-none-any.whl
- Upload date:
- Size: 43.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4a8d1db9d4b643864d0e8dcaf0fb432d4cad5a1917d83ceaf1a3d0f0e52481e5
|
|
| MD5 |
8c070b605f14f10b2244d0de2671933e
|
|
| BLAKE2b-256 |
54ad4ab9990dc433eacf84765a83735fd27aaec07a56330bd3913a6af85ddf06
|