
Phi-3-Vision for Apple MLX

Phi-3-Vision for Apple MLX is a powerful and flexible AI agent framework that leverages the Phi-3-Vision model to perform a wide range of tasks, from visual question answering to code generation and execution. This project aims to provide an easy-to-use interface for interacting with the Phi-3-Vision model, while also offering advanced features like custom toolchains and model quantization.

Phi-3-Vision is a state-of-the-art vision-language model that excels in understanding and generating content based on both textual and visual inputs. By integrating this model with Apple's MLX framework, we provide a high-performance solution optimized for Apple silicon.

Quick Start

1. Install Phi-3-Vision-MLX:

To install Phi-3-Vision-MLX, run the following command:

pip install phi-3-vision-mlx

2. Launch Phi-3-Vision-MLX:

To launch Phi-3-Vision-MLX from the command line:

phi3v

Or in a Python script:

from phi_3_vision_mlx import Agent

agent = Agent()

Usage

Visual Question Answering (VQA)

agent('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')
agent.end()
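The agent maintains a running conversation across calls; agent.end() closes the current session and resets that history.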


Generative Feedback Loop

The agent can be used to generate code, execute it, and then modify it based on feedback:

agent('Plot a Lissajous Curve.')
agent('Modify the code to plot 3:4 frequency')
agent.end()


API Tool Use

You can use the agent to create images or generate speech using API calls:

agent('Draw "A perfectly red apple, 32k HDR, studio lighting"')
agent.end()
agent('Speak "People say nothing is impossible, but I do nothing every day."')
agent.end()
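Both requests are fulfilled through external API calls rather than by the local model, so they require network access.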


Custom Toolchain

Toolchains allow you to customize the agent's behavior for specific tasks. Here are three examples:

Example 1: In-Context Learning (ICL)

You can create a custom toolchain to add context to the prompt:

from phi_3_vision_mlx import load_text

# Create tool
def add_text(prompt):
    prompt, path = prompt.split('@')  # split the question from the source path at '@'
    return f'{load_text(path)}\n<|end|>\n<|user|>{prompt}'  # prepend the fetched text as context

# Chain tools
toolchain = """
    prompt = add_text(prompt)
    responses = generate(prompt, images)
    """

# Create agent
agent = Agent(toolchain, early_stop=100)

# Run agent
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')

This toolchain adds context to the prompt from an external source, enhancing the agent's knowledge for specific queries.
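As these examples illustrate, a toolchain is written as a short block of Python-like steps: each line calls a tool that reads and writes shared variables such as prompt, images, responses, and files.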

Example 2: Retrieval Augmented Generation (RAG)

You can create another custom toolchain that uses retrieval-augmented generation (RAG) to ground code generation in retrieved examples:

from phi_3_vision_mlx import VDB
import datasets

# User proxy
user_input = 'Comparison of Sortino Ratio for Bitcoin and Ethereum.'

# Create tool
def rag(prompt, repo_id="JosefAlbers/sharegpt_python_mlx", n_topk=1):
    ds = datasets.load_dataset(repo_id, split='train')  # load the example corpus
    vdb = VDB(ds)  # build a vector database over the corpus
    context = vdb(prompt, n_topk)[0][0]  # retrieve the most relevant entry
    return f'{context}\n<|end|>\n<|user|>Plot: {prompt}'  # prepend it to the prompt

# Chain tools
toolchain_plot = """
    prompt = rag(prompt)
    responses = generate(prompt, images)
    files = execute(responses, step)
    """

# Create agent
agent = Agent(toolchain_plot, False)

# Run agent
_, images = agent(user_input)
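The agent call returns the generated responses along with any files produced by the execute step; here the resulting plot images are captured for reuse in the next example.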

Example 3: Multi-Agent Interaction

You can also have multiple agents interacting to complete a task:

agent_writer = Agent(early_stop=100)
agent_writer(f'Write a stock analysis report on: {user_input}', images)
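A natural extension is to feed one agent's output to another. The sketch below is illustrative rather than part of the package's documented examples: it assumes the agent call returns a (responses, files) pair, as the RAG example above suggests, and agent_reviewer and its prompt are hypothetical.

# Hypothetical extension: capture the writer's response and have a second agent critique it
report, _ = agent_writer(f'Write a stock analysis report on: {user_input}', images)
agent_reviewer = Agent(early_stop=100)
agent_reviewer(f'Critique this report and suggest improvements:\n{report}')
agent_reviewer.end()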

Batch Generation

For efficient processing of multiple prompts:

from phi_3_vision_mlx import generate

generate([
    "Write an executive summary for a communications business plan",
    "Write a resume.",
    "Write a mystery horror.",
    "Write a Neurology ICU Admission Note.",
])

Model and Cache Quantization

Quantizing the model weights or the KV cache can significantly reduce memory use, and model quantization also speeds up inference (see Benchmarks below):

generate("Write a cosmic horror.", quantize_cache=True)
generate("Write a cosmic horror.", quantize_model=True)

LoRA Training and Inference

Fine-tune the model for specific tasks:

from phi_3_vision_mlx import train_lora

train_lora(
    lora_layers=5,   # number of layers to apply LoRA adapters to
    lora_rank=16,    # rank of the low-rank adapter matrices
    epochs=10,
    lr=1e-4,
    warmup=.5,
    mask_ratios=[.0],
    adapter_path='adapters',  # where the trained adapters are saved
    dataset_path="JosefAlbers/akemiH_MedQA_Reason",  # Hugging Face dataset to train on
)


Use the fine-tuned model:

generate("Write a cosmic horror.", adapter_path='adapters')

Benchmarks

Task                 Vanilla Model   Quantized Model   Quantized Cache   LoRA
Text Generation      8.72 tps        55.97 tps         7.04 tps          8.71 tps
Image Captioning     8.04 tps        32.48 tps         1.77 tps          8.00 tps
Batched Generation   30.74 tps       106.94 tps        20.47 tps        30.72 tps

(tps = tokens per second)

License

This project is licensed under the MIT License.

