
Vision LLMs on Apple silicon with MLX and the Hugging Face Hub

Project description

MLX-VLM

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

Installation

The easiest way to get started is to install the mlx-vlm package using pip:

pip install mlx-vlm

Usage

Command Line Interface (CLI)

Generate output from a model using the CLI:

python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg

Chat UI with Gradio

Launch a chat interface using Gradio:

python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit

Python Script

Here's an example of how to use MLX-VLM in a Python script:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, image, formatted_prompt, verbose=False)
print(output)
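Conceptually, apply_chat_template wraps the prompt in the Hugging Face chat-message format, with one image placeholder per input image preceding the text. The following is a simplified, hypothetical sketch of that structure, not the library's actual implementation:

```python
def build_messages(prompt, num_images):
    # One user turn: N image placeholders followed by the text prompt.
    content = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

messages = build_messages("Describe this image.", num_images=1)
```

The processor then renders this message list into the model-specific prompt string, which is why the same Python code works across the supported architectures.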

Multi-Image Chat Support

MLX-VLM supports analyzing multiple images simultaneously with select models. This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.

Supported Models

The following models support multi-image chat:

  1. Idefics 2
  2. LLaVA (Interleave)
  3. Qwen2-VL
  4. Phi3-Vision
  5. Pixtral

Usage Examples

Python Script

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, images, formatted_prompt, verbose=False)
print(output)

Command Line

python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg

These examples demonstrate how to use multiple images with MLX-VLM for more complex visual reasoning tasks.
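If you are driving the CLI from another script, the multi-image invocation above can be assembled programmatically. This is a hypothetical helper; the flag names simply mirror the CLI examples shown in this README:

```python
def build_generate_cmd(model, prompt, images, max_tokens=100):
    # Build the argv list for `python -m mlx_vlm.generate`;
    # multiple paths after --image become multiple input images.
    return ["python", "-m", "mlx_vlm.generate",
            "--model", model,
            "--max-tokens", str(max_tokens),
            "--prompt", prompt,
            "--image", *images]

cmd = build_generate_cmd(
    "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "Compare these images",
    ["path/to/image1.jpg", "path/to/image2.jpg"],
)
```

The resulting list can be passed to subprocess.run to invoke the CLI without shell quoting issues.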

Fine-tuning

MLX-VLM supports fine-tuning models with LoRA and QLoRA.

LoRA & QLoRA

To learn more about LoRA, please refer to the LoRA.md file.

Download files

Download the file for your platform.

Source Distribution

mlx_vlm-0.1.0.tar.gz (81.8 kB)

Built Distribution

mlx_vlm-0.1.0-py3-none-any.whl (111.9 kB)

File details

Details for the file mlx_vlm-0.1.0.tar.gz.

File metadata

  • Download URL: mlx_vlm-0.1.0.tar.gz
  • Upload date:
  • Size: 81.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for mlx_vlm-0.1.0.tar.gz
  • SHA256: ce53cc24faf23ea4052f8c1bac0a782d5483028d23768516779e1be0024637c6
  • MD5: 1b8409b57c2248b1315023ca7be271a8
  • BLAKE2b-256: 759f5950a46a52b097ea92c6ab237512849ec85b4d839af7605737237f332cba


File details

Details for the file mlx_vlm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mlx_vlm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 111.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for mlx_vlm-0.1.0-py3-none-any.whl
  • SHA256: 19b863a12648a109d3ab78dab9cf0adb7ceb4ecfb047dcbb5dbd947a199bf892
  • MD5: 2a1a262dd23e11368ce056cf4e3e99a2
  • BLAKE2b-256: 313d480caabed732ff97be92566288216a6468f43c6dad773528449a1aa7ab10

