Vision LLMs on Apple silicon with MLX and the Hugging Face Hub
# MLX-VLM

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
## Installation

The easiest way to get started is to install the mlx-vlm package using pip:

```shell
pip install mlx-vlm
```
## Usage

### Command Line Interface (CLI)

Generate output from a model using the CLI:

```shell
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg
```
### Chat UI with Gradio

Launch a chat interface using Gradio:

```shell
python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit
```
### Python Script

Here's an example of how to use MLX-VLM in a Python script:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```
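For intuition, the job of a chat template is to wrap the raw prompt in the model's chat markup and insert one image placeholder per attached image, so the model knows where image features will be injected. A toy sketch of the idea (the role markers and `toy_chat_template` function below are illustrative only, not mlx_vlm's actual template or tokens):

```python
# Toy illustration of what a chat template does (not mlx_vlm internals):
# wrap the user prompt in role markers and prepend one image placeholder
# per attached image.
def toy_chat_template(prompt: str, num_images: int) -> str:
    placeholders = "<image>" * num_images
    return f"<|user|>\n{placeholders}{prompt}\n<|assistant|>\n"

print(toy_chat_template("Describe this image.", num_images=1))
```

The real template is model-specific, which is why `apply_chat_template` takes the processor and config: different models use different role markers and image tokens.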
## Multi-Image Chat Support

MLX-VLM supports analyzing multiple images simultaneously with select models. This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.
### Supported Models

The following models support multi-image chat:
- Idefics 2
- LLaVA (Interleave)
- Qwen2-VL
- Phi3-Vision
- Pixtral
### Usage Examples

#### Python Script
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare multiple images and a comparison prompt
images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

# Apply chat template with the image count
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

# Generate output
output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)
```
#### Command Line

```shell
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg
```
## Video Understanding

MLX-VLM also supports video analysis, including captioning and summarization, with select models.
### Supported Models

The following models support video chat:
- Qwen2-VL
- Qwen2.5-VL
- Idefics3
- LLaVA
With more coming soon.
### Usage Examples

#### Command Line

```shell
python -m mlx_vlm.video_generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Describe this video" --video path/to/video.mp4 --max-pixels 224 224 --fps 1.0
```
These examples demonstrate how to use multiple images and videos with MLX-VLM for more complex visual reasoning tasks.
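For intuition on the `--fps` flag: sampling at a fixed frame rate means taking one frame every 1/fps seconds of video. A toy sketch of that arithmetic (illustrative only, not mlx_vlm's actual sampling code; `sample_timestamps` is a hypothetical helper):

```python
# Toy sketch of fps-based frame sampling (not mlx_vlm's implementation):
# at a given fps, one frame is taken every 1/fps seconds of the video.
def sample_timestamps(duration_s: float, fps: float) -> list:
    step = 1.0 / fps
    timestamps, t = [], 0.0
    while t < duration_s:
        timestamps.append(round(t, 3))
        t += step
    return timestamps

print(sample_timestamps(3.0, 1.0))  # one frame per second: [0.0, 1.0, 2.0]
```

A lower fps keeps the number of sampled frames (and thus the visual token count) manageable for long videos, at the cost of temporal detail.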
## Fine-tuning

MLX-VLM supports fine-tuning models with LoRA and QLoRA.

### LoRA & QLoRA

To learn more about LoRA, please refer to the LoRA.md file.
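The core LoRA idea can be sketched in a few lines of numpy (an illustration of the math only, not mlx_vlm's implementation): the frozen weight W stays untouched while two small trainable matrices A and B parameterize a low-rank update, scaled by alpha / r.

```python
import numpy as np

# Toy LoRA forward pass (illustration, not mlx_vlm's code).
# A frozen weight W (d_out x d_in) is augmented with a low-rank update
# B @ A, where A is (r x d_in) and B is (d_out x r), with r much smaller
# than d_in, so far fewer parameters are trained.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

def lora_forward(x):
    # Base output plus the scaled low-rank update.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d_in))
# With B zero-initialized, the adapter starts as an exact no-op:
print(np.allclose(lora_forward(x), x @ W.T))  # True
```

QLoRA applies the same low-rank update on top of a quantized base model, which is why it pairs naturally with the 4-bit models used throughout this README.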