
Phi-3-MLX: Language and Vision Models for Apple Silicon

Phi-3-MLX is a versatile AI framework that leverages both the Phi-3-Vision multimodal model and the recently updated (July 2, 2024) Phi-3-Mini-128K language model, optimized for Apple Silicon using the MLX framework. This project provides an easy-to-use interface for a wide range of AI tasks, from advanced text generation to visual question answering and code execution.

Features

  • Support for the newly updated Phi-3-Mini-128K (language-only) model
  • Integration with Phi-3-Vision (multimodal) model
  • Optimized performance on Apple Silicon using MLX
  • Batched generation for processing multiple prompts
  • Flexible agent system for various AI tasks
  • Custom toolchains for specialized workflows
  • Model quantization for improved efficiency
  • LoRA fine-tuning capabilities
  • API integration for extended functionality (e.g., image generation, text-to-speech)

Quick Start

Install and launch Phi-3-MLX from the command line:

pip install phi-3-vision-mlx
phi3v

To use the library in a Python script instead:

from phi_3_vision_mlx import generate

1. Core Functionalities

Visual Question Answering

generate('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')

Batch Text Generation

# A list of prompts for batch generation
prompts = [
    "Explain the key concepts of quantum computing and provide a Rust code example demonstrating quantum superposition.",
    "Write a poem about the first snowfall of the year.",
    "Summarize the major events of the French Revolution.",
    "Describe a bustling alien marketplace on a distant planet with unique goods and creatures."
    "Implement a basic encryption algorithm in Python.",
]

# Generate responses using Phi-3-Vision (multimodal model)
generate(prompts, max_tokens=100)

# Generate responses using Phi-3-Mini-128K (language-only model)
generate(prompts, max_tokens=100, blind_model=True)

Model and Cache Quantization

# Model quantization
generate("Describe the water cycle.", quantize_model=True)

# Cache quantization
generate("Explain quantum computing.", quantize_cache=True)

LoRA Fine-tuning

Training a LoRA Adapter

from phi_3_vision_mlx import train_lora

train_lora(
    lora_layers=5,  # Number of layers to apply LoRA
    lora_rank=16,   # Rank of the LoRA adaptation
    epochs=10,      # Number of training epochs
    lr=1e-4,        # Learning rate
    warmup=0.5,     # Fraction of steps for learning rate warmup
    dataset_path="JosefAlbers/akemiH_MedQA_Reason"
)

Generating Text with LoRA

generate("Describe the potential applications of CRISPR gene editing in medicine.",
    blind_model=True,
    quantize_model=True,
    use_adapter=True)

Comparing LoRA Adapters

from phi_3_vision_mlx import test_lora

# Test model without LoRA adapter
test_lora(adapter_path=None)
# Output score: 0.6 (6/10)

# Test model with the trained LoRA adapter (using default path)
test_lora(adapter_path=True)
# Output score: 0.8 (8/10)

# Test model with a specific LoRA adapter path
test_lora(adapter_path="/path/to/your/lora/adapter")

2. Agentic Interactions

Multi-turn Conversation

from phi_3_vision_mlx import Agent

# Create an instance of the Agent
agent = Agent()

# First interaction: Analyze an image
agent('Analyze this image and describe the architectural style:', 'https://images.metmuseum.org/CRDImages/rl/original/DP-19531-075.jpg')

# Second interaction: Follow-up question
agent('What historical period does this architecture likely belong to?')

# End the conversation
# This clears the agent's memory and prepares it for a new conversation
agent.end()
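
The Agent works just as well for text-only dialogue. A minimal sketch (the prompts below are our own):

agent = Agent()
agent('Explain the difference between supervised and unsupervised learning.')
# The follow-up question is answered using the conversation history
agent('Give one concrete example of each.')
agent.end()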

Generative Feedback Loop

# Ask the agent to generate and execute code to create a plot
agent('Plot a Lissajous Curve.')

# Ask the agent to modify the generated code and create a new plot
agent('Modify the code to plot a 3:4 frequency ratio.')
agent.end()

External API Tool Use

# Request the agent to generate an image
agent('Draw "A perfectly red apple, 32k HDR, studio lighting"')
agent.end()

# Request the agent to convert text to speech
agent('Speak "People say nothing is impossible, but I do nothing every day."')
agent.end()

3. Custom Toolchains

Example 1. In-Context Learning Agent

from phi_3_vision_mlx import _load_text

# Create a custom tool named 'add_text'
def add_text(prompt):
    prompt, path = prompt.split('@')
    return f'{_load_text(path)}\n<|end|>\n<|user|>{prompt}'

# Define the toolchain as a string
toolchain = """
    prompt = add_text(prompt)
    responses = generate(prompt, images)
    """

# Create an Agent instance with the custom toolchain
agent = Agent(toolchain, early_stop=100)

# Run the agent
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')
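
Toolchain strings appear to be executed line by line, with names such as prompt, images, and responses shared between steps. As a further illustration, here is a hypothetical toolchain built on the same pattern; the add_prefix helper is ours, not part of the library:

# Hypothetical custom tool: prepend a fixed instruction to every prompt
def add_prefix(prompt):
    return f'Answer concisely.\n<|end|>\n<|user|>{prompt}'

# Same two-step structure as the add_text toolchain above
toolchain_prefix = """
    prompt = add_prefix(prompt)
    responses = generate(prompt, images)
    """

agent = Agent(toolchain_prefix, early_stop=100)
agent('What causes the Doppler effect?')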

Example 2. Retrieval Augmented Coding Agent

from phi_3_vision_mlx import VDB
import datasets

# Simulate user input
user_input = 'Comparison of Sortino Ratio for Bitcoin and Ethereum.'

# Create a custom RAG tool
def rag(prompt, repo_id="JosefAlbers/sharegpt_python_mlx", n_topk=1):
    ds = datasets.load_dataset(repo_id, split='train')
    vdb = VDB(ds)
    context = vdb(prompt, n_topk)[0][0]
    return f'{context}\n<|end|>\n<|user|>Plot: {prompt}'

# Define the toolchain
toolchain_plot = """
    prompt = rag(prompt)
    responses = generate(prompt, images)
    files = execute(responses, step)
    """

# Create an Agent instance with the RAG toolchain
agent = Agent(toolchain_plot, False)

# Run the agent with the user input
_, images = agent(user_input)

Example 3. Multi-Agent Interaction

# Continued from Example 2 above
agent_writer = Agent(early_stop=100)
agent_writer(f'Write a stock analysis report on: {user_input}', images)

Benchmarks

from phi_3_vision_mlx import benchmark

benchmark()
Task                 Vanilla Model   Quantized Model   Quantized Cache   LoRA Adapter
Text Generation      8.46 tps        51.69 tps         6.94 tps          8.58 tps
Image Captioning     7.72 tps        33.10 tps         1.75 tps          7.11 tps
Batched Generation   103.47 tps      182.83 tps        38.72 tps         101.02 tps

(Measured on an M1 Max with 64 GB of RAM)

License

This project is licensed under the MIT License.
