Vision LLMs on Apple silicon with MLX and the Hugging Face Hub

Project description

MLX-VLM

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

Table of Contents

  • Installation
  • Usage
  • Multi-Image Chat Support
  • Video Understanding
  • Fine-tuning

Installation

The easiest way to get started is to install the mlx-vlm package using pip:

pip install mlx-vlm
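
A quick way to confirm the install worked is to import the package's main entry points (a minimal check, assuming the python on your PATH is the interpreter pip installed into):

python -c "from mlx_vlm import load, generate"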

Usage

Command Line Interface (CLI)

Generate output from a model using the CLI:

python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg

Chat UI with Gradio

Launch a chat interface using Gradio:

python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit

Python Script

Here's an example of how to use MLX-VLM in a Python script:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
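
The call above uses generate's default sampling settings. Continuing the same script, the sketch below passes explicit settings; it assumes the Python generate function accepts max_tokens and temp keyword arguments mirroring the CLI's --max-tokens and --temp flags, so verify the names against your installed version:

# Assumption: max_tokens and temp mirror the CLI flags above
output = generate(
    model,
    processor,
    formatted_prompt,
    image,
    max_tokens=100,  # upper bound on generated tokens
    temp=0.0,        # 0.0 means greedy decoding, as in the CLI example
    verbose=False,
)
print(output)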

Multi-Image Chat Support

MLX-VLM supports analyzing multiple images simultaneously with select models. This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.

Supported Models

The following models support multi-image chat:

  1. Idefics 2
  2. LLaVA (Interleave)
  3. Qwen2-VL
  4. Phi3-Vision
  5. Pixtral

Usage Examples

Python Script

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Pass all images as a list; num_images tells the chat template how many to expect
images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)

Command Line

python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg

Video Understanding

MLX-VLM also supports video analysis, such as captioning and summarization, with select models.

Supported Models

The following models support video chat:

  1. Qwen2-VL
  2. Qwen2.5-VL
  3. Idefics3
  4. LLaVA

With more coming soon.

Usage Examples

Command Line

python -m mlx_vlm.video_generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Describe this video" --video path/to/video.mp4 --max-pixels 224 224 --fps 1.0

In the command above, --fps controls how many frames per second are sampled from the video and --max-pixels bounds the resolution of each sampled frame. Together, these examples demonstrate how to use multiple images and video with MLX-VLM for more complex visual reasoning tasks.

Fine-tuning

MLX-VLM supports fine-tuning models with LoRA and QLoRA.

LoRA & QLoRA

To learn more about LoRA, please refer to the LoRA.md file.
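
For orientation only, a fine-tuning run is typically launched as a module script, along these lines. This invocation is hypothetical: the mlx_vlm.lora module name and every flag shown are assumptions, so use the entry point and arguments that LoRA.md actually documents.

# Hypothetical example; see LoRA.md for the real entry point and flags
python -m mlx_vlm.lora --model-path mlx-community/Qwen2-VL-2B-Instruct-4bit --dataset path/to/dataset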

Download files

Download the file for your platform.

Source Distribution

mlx_vlm-0.1.14.tar.gz (135.9 kB)

Uploaded Source

Built Distribution

mlx_vlm-0.1.14-py3-none-any.whl (178.7 kB)

Uploaded Python 3

File details

Details for the file mlx_vlm-0.1.14.tar.gz.

File metadata

  • Download URL: mlx_vlm-0.1.14.tar.gz
  • Size: 135.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for mlx_vlm-0.1.14.tar.gz

  Algorithm    Hash digest
  SHA256       f0bd2d298d74a1c2dbc0c48e80dafb8684f42ca9c41a90c46191bebc2aa97723
  MD5          5efddb1221f22b87fe401ccdb71cd138
  BLAKE2b-256  02a8ad935824f8fa4fc4bb3ec3c4e8eb6cfc63e48f43a81b38ef2e6c0c8a7781

File details

Details for the file mlx_vlm-0.1.14-py3-none-any.whl.

File metadata

  • Download URL: mlx_vlm-0.1.14-py3-none-any.whl
  • Size: 178.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for mlx_vlm-0.1.14-py3-none-any.whl

  Algorithm    Hash digest
  SHA256       9324711b3f178c38501f60cd88cc10e751db66e9a87785e17fcd307225e1e502
  MD5          82be5c68b8f6e78fc833cb51f76c9d23
  BLAKE2b-256  b12e7c41547bdf6727ff23f510cd173c61a25622d0ac89a21a73b97d75d2c0d4
