SiLLM - Silicon LLM Training & Inference Toolkit

Running and training LLMs on Apple Silicon via MLX

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. Building upon the foundation provided by MLX Examples, this project introduces additional features specifically designed to enhance LLM operations with MLX in a streamlined package.

  • LLM Loading: load LLMs for chat and training in different formats (Hugging Face, Torch, GGUF, MLX), as sketched after this list
  • LoRA Training: train LLMs using Low-rank Adaptation
  • DPO Training: train LLMs with Direct Preference Optimization
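
Loading is a single call regardless of the source format. A minimal sketch, assuming sillm.load detects the format from the files at the given path (the paths below are placeholders):

import sillm

# Hypothetical paths; one call loads models stored in any of the
# supported formats (Hugging Face, Torch, GGUF, MLX)
model = sillm.load("/path/to/hf-model-directory")   # Hugging Face
model = sillm.load("/path/to/model.gguf")           # GGUF
model = sillm.load("/path/to/mlx-model-directory")  # MLX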

Features

  • Web app for a seamless chat experience running on local hardware
  • API server with OpenAI compatible chat endpoints
  • Model architectures: Llama, Mistral, Mixtral, Phi-2, Phi-3, Gemma, Qwen2, Starcoder2, DBRX, Cohere Command-R
  • Conversation templates: llama-2, chatml, alpaca, vicuna, gemma, phi, openchat
  • Loss functions for DPO: sigmoid, hinge, IPO, DPOP (see the sketch after this list)
  • Training loss plots using matplotlib
  • Perplexity calculation
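
As a point of reference for the DPO loss variants listed above, here is a minimal sketch of the standard sigmoid DPO loss from the DPO paper; it illustrates the math only and is not SiLLM's internal implementation:

import math

# Sigmoid DPO loss for one preference pair (Rafailov et al., 2023).
# Arguments are summed log-probabilities of the chosen and rejected
# responses under the trained policy and the frozen reference model.
def dpo_sigmoid_loss(policy_chosen, policy_rejected,
                     ref_chosen, ref_rejected, beta=0.1):
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

The hinge, IPO, and DPOP variants replace this negative log-sigmoid term with their respective objectives.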

Experimental

One of the main goals of SiLLM is to enable experimentation with the inner workings of large language models and to make new techniques accessible to a wider audience on Apple Silicon hardware.

Control vectors and feature ablation

The control module incorporates techniques based on the paper Representation Engineering and the blog Refusal Ablation. Representation engineering calculates control vectors from a model's hidden states during training; at inference time, these vectors can be used to influence the model's behavior and generated output. Refusal ablation works similarly, but removes the direction represented by the vector from the model weights themselves.
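
The following numpy sketch illustrates the underlying math; SiLLM's actual control-module API may differ, and the function names here are illustrative only:

import numpy as np

def control_vector(pos_states, neg_states):
    # Control vector: normalized mean difference of hidden states
    # collected on two contrastive prompt sets (arrays of shape (n, d))
    direction = pos_states.mean(axis=0) - neg_states.mean(axis=0)
    return direction / np.linalg.norm(direction)

def apply_control(hidden, vector, strength):
    # Representation engineering: shift activations along the direction
    return hidden + strength * vector

def ablate_direction(weight, vector):
    # Refusal ablation: project the direction out of a weight matrix
    # that writes into the hidden dimension (rows indexed by that dimension)
    return weight - np.outer(vector, vector @ weight)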

Installation

Using pip:

pip install sillm-mlx

Usage

Chat web application

The web app uses Chainlit to provide a frontend for conversational AI running locally on Apple Silicon hardware.

Video demo: https://github.com/armbues/SiLLM/assets/4117144/ab537795-5020-4241-aa89-3b19b9de263b

To use the web app, clone the repository and start the app using chainlit:

git clone https://github.com/armbues/SiLLM.git
cd SiLLM/app
pip install -r requirements.txt
python -m chainlit run app.py -w

Set the environment variables SILLM_MODEL_DIR and SILLM_ADAPTER_DIR to load local models/adapters.
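
For example:

export SILLM_MODEL_DIR=/path/to/models
export SILLM_ADAPTER_DIR=/path/to/adapters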

Command-line interface (CLI) scripts

Run the CLI scripts with the -h argument to print all available arguments.

Chat:

Simple CLI interface for chatting with an LLM in the terminal.

python -m sillm.chat /path/to/model

Running sillm.chat in the terminal with Gemma-2B-it on a MacBook Air M2 with 16 GB of memory:

Video demo: https://github.com/armbues/SiLLM/assets/4117144/42e2d0f8-3bd8-44ca-9f78-8c4a885b8939

Server:

Run an API server exposing basic, OpenAI-compatible chat endpoints.

python -m sillm.server /path/to/model --port 8000
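
Since the endpoints are OpenAI-compatible, a quick smoke test with the requests library might look like this (the exact request fields the server expects are an assumption based on the OpenAI chat completions format):

import requests

# POST a chat request to the locally running server started above
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Hello!"}]},
)
print(response.json()["choices"][0]["message"]["content"])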

LoRA Fine-tuning:

Fine-tune a model with low-rank adaptation (LoRA).

python -m sillm.lora /path/to/model -d /path/to/dataset -o /output/adapters

DPO Fine-tuning:

Fine-tune a model with LoRA and direct preference optimization (DPO).

python -m sillm.dpo /path/to/model -d /path/to/dataset -o /output/adapters
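
SiLLM's exact dataset schema is not documented here; DPO datasets conventionally store one JSON object per line, pairing a prompt with a preferred and a rejected response, along the lines of the hypothetical sample below. Check the SiLLM-examples repository for the format the scripts actually expect.

{"prompt": "What is MLX?", "chosen": "MLX is Apple's array framework...", "rejected": "No idea."}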

Conversion

Convert a model while merging adapters or quantizing the weights.

Example of merging an adapter into a model:

python -m sillm.convert /path/to/input/model /path/to/output/model -a /path/to/adapters

Quantization

Quantize a model serially (without loading it entirely into memory):

python -m sillm.quantize /path/to/input/model /path/to/output/model --bits 4

Python

Minimal example of loading a model with SiLLM and generating a text completion:

import sillm

# Load the model weights, configuration, and tokenizer from the given path
model = sillm.load("/path/to/model")

# generate() streams the completion; each iteration yields the next chunk
# of text along with generation metadata (ignored here)
for s, _ in model.generate("On a beautiful Sunday morning,"):
    print(s, flush=True, end="")

Examples

The repository SiLLM-examples contains Python code examples for using the SiLLM framework for training and running LLMs.

LoRA Fine-tuning

LoRA training Mistral-7B-Instruct-v0.2 with the Nvidia HelpSteer dataset.

DPO Fine-tuning

DPO training Qwen1.5-7B-Chat with the DPO Mix 7K dataset. The training consists of supervised fine-tuning (SFT) followed by direct preference optimization (DPO).

MMLU Benchmark

Implementation of the "Massive Multitask Language Understanding" benchmark using the MMLU dataset.

Perplexity

Calculating perplexity scores for a sample dataset of entry paragraphs from Wikipedia articles.
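
For reference, perplexity is the exponential of the average negative log-likelihood per token. A generic sketch (not SiLLM's implementation), with hypothetical per-token log-probabilities:

import math

# Per-token log-likelihoods as produced by a language model (placeholder values)
log_probs = [-2.1, -0.4, -1.3, -0.8]
perplexity = math.exp(-sum(log_probs) / len(log_probs))
print(f"perplexity = {perplexity:.2f}")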

Model Support

SiLLM generally supports loading LLMs of the following model architectures/families: Llama 2, Mistral, Mixtral, Gemma, Phi, Qwen 2, StarCoder2.

Here is a list of models that were successfully tested with SiLLM:

| Model Family | Models/Sizes (HF) | Models/Sizes (GGUF) | Models/Sizes (MLX) |
| --- | --- | --- | --- |
| Llama-3 | 8B-Instruct, 70B-Instruct | | |
| Llama-2 | 7b-chat | 7b-chat.Q8_0, 13b-chat.Q8_0 | 7b, 7b-chat |
| Mistral | 7b-instruct-v0.2, 7b-instruct-v0.3 | 7b-instruct-v0.2.Q8_0 | |
| Mixtral | 8x7B-Instruct-v0.1, 8x22B-Instruct-v0.1 | | |
| Gemma | 2b, 2b-it, 7b, 7b-it | | |
| Phi-2 | 2.7b | | |
| Phi-3 | mini-4k | | |
| Qwen 1.5 | 7b-chat, 14b-chat | | |
| Qwen 2 | 7b-instruct, 72b-instruct | | |
| StarCoder2 | 3b, 7b, 15b | | |
| CodeLlama | | 70b-instruct.Q4_0, Phind-34b-v2.Q4_0 | |
| Codestral | 22b-v0.1 | | |
| DBRX (currently not supported) | | | dbrx-instruct-4bit |
| Cohere | Command-R, Command-R+ | | |

Roadmap

  • Learning rate schedulers for training
  • Merging models
  • Saving models to GGUF
  • Fine tuning with ORPO

License

This project uses the MIT License.

Acknowledgments

Big thanks to the Apple MLX team for implementing and maintaining the MLX framework, which unlocks the power of Apple Silicon for running and training LLMs on MacBooks and other Apple devices. Thank you to all the contributors of the MLX Examples project and to the developers sharing model implementations online. Last but not least, thank you to the wider community for sharing open-weights models, fine-tunes, and datasets - without you, all this generative-AI progress would happen behind locked doors!
