MLX Transformers is a machine learning library with an interface similar to Hugging Face Transformers, using MLX core as the backend.
MLX Transformers
MLX Transformers is a library that provides model implementations in MLX. It uses the same model interface as Hugging Face Transformers and provides a way to load and run models on Apple Silicon devices. Implemented models mirror the module structure of their Hugging Face counterparts.
MLX Transformers is currently available for inference only on Apple Silicon devices. Training support will be added in the future.
Installation
This library is available on PyPI and can be installed using pip:
pip install mlx-transformers
Quick Tour
A list of the available models can be found in the mlx_transformers.models module and is also given in the section below. The following example demonstrates how to load a model and use it for inference:
You can load a model with MLX Transformers in a few lines of code:
from transformers import BertConfig, BertTokenizer

from mlx_transformers.models import BertForMaskedLM as MLXBertForMaskedLM

# Load the tokenizer and config from Hugging Face, then the weights into MLX.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig.from_pretrained("bert-base-uncased")

model = MLXBertForMaskedLM(config)
model.from_pretrained("bert-base-uncased")

sample_input = "Hello, world!"
inputs = tokenizer(sample_input, return_tensors="np")
outputs = model(**inputs)
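To turn the raw outputs into an actual prediction, you can decode the logits at the [MASK] position. The sketch below is a minimal illustration, assuming the MLX model's output mirrors Hugging Face's MaskedLMOutput and exposes a logits field of shape (batch, sequence, vocab); check the model's return type in your installed version.

import numpy as np

masked_input = "The capital of France is [MASK]."
inputs = tokenizer(masked_input, return_tensors="np")
outputs = model(**inputs)

# Find the [MASK] position and take the highest-scoring vocabulary id there.
# `outputs.logits` is an assumption about the output structure (see note above).
mask_index = int(np.argmax(inputs.input_ids[0] == tokenizer.mask_token_id))
logits = np.array(outputs.logits)  # MLX arrays convert to NumPy via np.array
predicted_id = int(logits[0, mask_index].argmax())
print(tokenizer.decode([predicted_id]))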
Sentence Transformer Example
import mlx.core as mx
import numpy as np
from transformers import AutoConfig, AutoTokenizer

from mlx_transformers.models import BertModel as MLXBertModel


def _mean_pooling(last_hidden_state: mx.array, attention_mask: mx.array):
    # Average the token embeddings, masking out padding positions.
    token_embeddings = last_hidden_state
    input_mask_expanded = mx.expand_dims(attention_mask, -1)
    input_mask_expanded = mx.broadcast_to(
        input_mask_expanded, token_embeddings.shape
    ).astype(mx.float32)
    sum_embeddings = mx.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = mx.clip(input_mask_expanded.sum(axis=1), 1e-9, None)
    return sum_embeddings / sum_mask


sentences = ["This is an example sentence", "Each sentence is converted"]

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
config = AutoConfig.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

model = MLXBertModel(config)
model.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

inputs = tokenizer(sentences, return_tensors="np", padding=True, truncation=True)
outputs = model(**inputs)
sentence_embeddings = _mean_pooling(outputs.last_hidden_state, inputs.attention_mask)
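As a quick follow-up, you can L2-normalize the pooled embeddings and compare the two sentences with a dot product, which equals cosine similarity after normalization. This is a minimal sketch using only mlx.core operations:

# Normalize each embedding to unit length, then compare the two sentences.
norms = mx.sqrt(
    mx.sum(sentence_embeddings * sentence_embeddings, axis=1, keepdims=True)
)
normalized = sentence_embeddings / norms
similarity = mx.sum(normalized[0] * normalized[1])
print(similarity.item())  # cosine similarity in [-1, 1]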
Available Models
The following models have been ported to MLX Transformers from Hugging Face for inference; you can also enumerate them programmatically, as shown after the list:
- Bert
- Roberta
- XLMRoberta
- M2M100
- Sentence Transformers
- Llama
- CLIP -> Coming soon...
- T5 -> Coming soon...
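To see exactly which model classes your installed version exports, you can inspect the mlx_transformers.models module at runtime; this relies only on standard Python introspection:

import mlx_transformers.models as mlx_models

# List the public names exported by the models module.
print([name for name in dir(mlx_models) if not name.startswith("_")])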
Examples
The examples directory contains a few examples that demonstrate how to use the models in MLX Transformers.
- Llama Example
python3 examples/llama_generation.py --model-name "meta-llama/Llama-2-7b-hf" --model-path meta-llama-Llama-2-7b-hf.npz
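If you want to load Llama in your own script rather than through the example, the same from_pretrained pattern from the BERT examples above should apply. The class name LlamaForCausalLM below is a guess at the export; check mlx_transformers.models for the actual name:

from transformers import AutoConfig, AutoTokenizer

# Hypothetical import -- the exact class name may differ in mlx_transformers.models.
from mlx_transformers.models import LlamaForCausalLM as MLXLlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

model = MLXLlamaForCausalLM(config)
model.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("The sky is", return_tensors="np")
outputs = model(**inputs)  # forward pass; see examples/llama_generation.py for decoding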
Benchmarks
Coming soon...
Contributions
Contributions to MLX Transformers are welcome. See the contributing documentation for instructions on setting up a development environment.