Vision LLMs on Apple silicon with MLX and the Hugging Face Hub

MLX-VLM

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

Installation

The easiest way to get started is to install the mlx-vlm package using pip:

pip install mlx-vlm
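
To confirm that the install succeeded, the installed version can be read back from Python. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this example and is not part of MLX-VLM.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "mlx-vlm"):
    """Return the installed version string, or None if the package is missing.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"mlx-vlm {found}" if found else "mlx-vlm is not installed")
```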

Usage

Command Line Interface (CLI)

Generate output from a model using the CLI:

python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg

Chat UI with Gradio

Launch a chat interface using Gradio:

python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit

Python Script

Here's an example of how to use MLX-VLM in a Python script:

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, image, formatted_prompt, verbose=False)
print(output)

Multi-Image Chat Support

With select models, MLX-VLM can analyze multiple images in a single conversation. This enables visual reasoning tasks that compare or combine information across images.

Supported Models

The following models support multi-image chat:

  1. Idefics 2
  2. LLaVA (Interleave)
  3. Qwen2-VL
  4. Phi3-Vision
  5. Pixtral

Usage Examples

Python Script

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, images, formatted_prompt, verbose=False)
print(output)
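
The paths in the script above are placeholders, and the examples in this README pass images both as URLs and as local paths. Before loading a model, it can help to check that every local path actually exists. The sketch below uses only the standard library; `is_url` and `missing_local_images` are hypothetical helper names, not MLX-VLM functions.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Treat a source as remote only if it uses the http or https scheme."""
    return urlparse(source).scheme in ("http", "https")


def missing_local_images(sources):
    """Return the local paths in `sources` that are not existing files.

    URLs are skipped because they cannot be checked without a network request.
    """
    return [s for s in sources if not is_url(s) and not Path(s).is_file()]


if __name__ == "__main__":
    images = ["path/to/image1.jpg", "path/to/image2.jpg"]
    missing = missing_local_images(images)
    if missing:
        print("Missing image files:", ", ".join(missing))
```

Running the check first avoids waiting for the model to load only to fail on a mistyped path.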

Command Line

python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg

These examples show how to pass several images to MLX-VLM in a single request.
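
The command-line invocations shown in this README can also be assembled from Python, for example to run the same prompt over different sets of images. This is a sketch under the assumption that only the flags already shown here (`--model`, `--max-tokens`, `--temp`, `--prompt`, `--image`) are needed; `build_generate_command` is a hypothetical helper, not part of MLX-VLM.

```python
import shlex
import sys


def build_generate_command(model, images, prompt=None, max_tokens=100, temp=0.0):
    """Build the argument list for `python -m mlx_vlm.generate`.

    Hypothetical helper: it only arranges flags that appear in this README.
    """
    if not images:
        raise ValueError("at least one image is required")
    cmd = [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--temp", str(temp),
    ]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    # Every image path or URL follows a single --image flag.
    cmd += ["--image", *images]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command(
        "mlx-community/Qwen2-VL-2B-Instruct-4bit",
        ["path/to/image1.jpg", "path/to/image2.jpg"],
        prompt="Compare these images",
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps prompts with spaces or quotes intact when the list is passed to `subprocess.run`.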

Fine-tuning

MLX-VLM supports fine-tuning models with LoRA and QLoRA.

LoRA & QLoRA

For details on LoRA fine-tuning, see the LoRA.md file.
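
As general background, and not a description of how MLX-VLM implements it, LoRA freezes a weight matrix of shape d_out x d_in and trains two small matrices of shapes d_out x r and r x d_in instead. QLoRA applies the same idea while keeping the frozen base weights quantized. The sketch below only counts parameters to show why this is cheap; the layer size and rank are illustrative assumptions, and `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-`rank` LoRA adapter on a d_out x d_in layer.

    The adapter consists of B (d_out x rank) and A (rank x d_in).
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative numbers only; real layer sizes depend on the model.
    d_out, d_in, rank = 4096, 4096, 8
    full = d_out * d_in
    lora = lora_trainable_params(d_out, d_in, rank)
    print(f"full layer:   {full:,} parameters")   # 16,777,216
    print(f"LoRA adapter: {lora:,} parameters")   # 65,536
    print(f"ratio:        {lora / full:.2%}")     # 0.39%
```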
