Phi-3-Vision VLM Model for Apple MLX: An All-in-One Port

This project ports the Phi-3-Vision vision-language model (VLM) to Apple's MLX framework for text and image processing tasks. The implementation is deliberately minimal: generating quantized model weights, quantizing the KV cache during inference, LoRA/QLoRA training, and model benchmarking are all contained in a single file.

Key Features

  • Su-scaled RoPE: Implements Su-scaled Rotary Position Embeddings to manage sequences of up to 128K tokens.
  • Model Quantization: Reduce model size for faster loading and deployment (2.3GB quantized vs 8.5GB original).
  • KV Cache Quantization: Store the KV cache in quantized form when processing long contexts, at a small cost in speed (5.3s with the quantized cache vs 5.1s without for text generation).
  • LoRA Training: Easily customize the model for specific tasks or datasets using LoRA.
  • Benchmarking: Quickly assess model performance on any dataset (WIP).
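
To make the Su-scaled RoPE feature more concrete, here is a minimal sketch in plain Python. It assumes the formulation used in the public Phi-3 reference code: each rotary frequency is divided by a per-dimension factor, and the cos/sin values are scaled by a correction term when the context is extended. The function names, the example factors, and the original context length of 4096 are illustrative assumptions, not taken from this project's source.

```python
import math

def su_scaled_rope_angles(position, head_dim, ext_factors, base=10000.0):
    """Return one rotation angle per pair of dimensions at a given position.

    ext_factors holds one rescaling factor per dimension pair
    (head_dim // 2 entries); all ones reduces to standard RoPE.
    """
    assert len(ext_factors) == head_dim // 2
    return [
        position / (factor * base ** (2 * i / head_dim))
        for i, factor in enumerate(ext_factors)
    ]

def su_attention_scale(max_pos, original_max_pos):
    """Magnitude correction applied to cos/sin when the context is extended."""
    scale = max_pos / original_max_pos
    if scale <= 1.0:
        return 1.0
    return math.sqrt(1.0 + math.log(scale) / math.log(original_max_pos))

def rotate_pair(x0, x1, angle, magnitude=1.0):
    """Rotate one (x0, x1) pair by the angle, with scaled cos/sin."""
    c = math.cos(angle) * magnitude
    s = math.sin(angle) * magnitude
    return x0 * c - x1 * s, x0 * s + x1 * c

# Example values only: 128K extended context, assumed 4K original context.
magnitude = su_attention_scale(131072, 4096)
angles = su_scaled_rope_angles(position=1, head_dim=4, ext_factors=[2.0, 4.0])
print(rotate_pair(1.0, 0.0, angles[0], magnitude))
```

Larger factors slow the rotation of a dimension pair, which is how positions far beyond the original training length stay within a familiar range of angles.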

Usage

import requests
from PIL import Image

prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"
images = [Image.open(requests.get("https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png", stream=True).raw)]

Image Captioning

model, processor = load()
generate(model, processor, prompt, images)
The image displays a bar chart showing the percentage of
4.43s user 3.17s system 71% cpu 10.711 total

Cache Quantization

model, processor = load(use_quantized_cache=True)
print(generate(model, processor, "<|user|>Write an exciting sci-fi.<|end|>\n<|assistant|>\n"))
Title: The Last Frontier\n\nIn the
2.49s user 4.52s system 131% cpu 5.325 total
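
This README does not state which scheme use_quantized_cache applies, so the following is only a sketch of the general idea: keys and values are stored as small integers plus a scale, and converted back to floats when attention needs them. It uses symmetric 8-bit absmax quantization per vector for illustration; QuantizedKVCache, quantize_vector, and dequantize_vector are hypothetical names, not part of this package.

```python
def quantize_vector(values, bits=8):
    """Symmetric absmax quantization: floats -> (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    return [round(v / scale) for v in values], scale

def dequantize_vector(codes, scale):
    """Recover approximate floats from integer codes and their scale."""
    return [c * scale for c in codes]

class QuantizedKVCache:
    """Hypothetical cache that keeps keys and values in quantized form."""

    def __init__(self, bits=8):
        self.bits = bits
        self.keys = []
        self.values = []

    def update(self, key, value):
        # Quantize each new token's key and value as it is appended.
        self.keys.append(quantize_vector(key, self.bits))
        self.values.append(quantize_vector(value, self.bits))

    def fetch(self):
        # Dequantize everything stored so far for the attention step.
        keys = [dequantize_vector(c, s) for c, s in self.keys]
        values = [dequantize_vector(c, s) for c, s in self.values]
        return keys, values

cache = QuantizedKVCache()
cache.update([0.5, -1.0, 0.25], [2.0, 0.0, -2.0])
print(cache.fetch())
```

The quantize and dequantize steps add work on every token, which is one plausible reason a quantized cache can run slightly slower while using less memory.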

Model Quantization

quantize(from_path='phi3v', to_path='quantized_phi3v', q_group_size=64, q_bits=4)
4.30s user 3.31s system 119% cpu 6.368 total
model, processor = load(model_path='quantized_phi3v')
print(generate(model, processor, "<|user|>Write an exciting sci-fi.<|end|>\n<|assistant|>\n"))
Title: The Quantum Leap\n\nIn
3.78s user 0.87s system 205% cpu 2.264 total
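
The q_group_size and q_bits arguments suggest group-wise quantization, where every group of 64 weights shares its own scale and offset and each weight is stored as a 4-bit code. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not the MLX kernel or this project's code, and all function names are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group: floats -> (codes, scale, bias)."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, bias):
    """Recover approximate weights from codes, scale, and bias."""
    return [c * scale + bias for c in codes]

def quantize_row(row, group_size=64, bits=4):
    """Split a weight row into groups and quantize each independently."""
    assert len(row) % group_size == 0
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]

def bits_per_weight(group_size=64, bits=4, meta_bits=16):
    """Storage cost per weight, counting one scale and one bias per group."""
    return bits + 2 * meta_bits / group_size

row = [i / 127 - 0.5 for i in range(128)]
groups = quantize_row(row)
print(len(groups), bits_per_weight())
```

Under these assumptions (16-bit scale and bias per group, 16-bit original weights), the cost is 4.5 bits per weight, or roughly 28% of the original size, which is in the same range as the 2.3GB vs 8.5GB figures reported above.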

LoRA Training

train_lora()
22.50s user 27.58s system 22% cpu 3:41.58 total
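
For readers new to LoRA, the following sketch shows the core idea in plain Python: the original weight matrix stays frozen, and training only adjusts two small matrices whose product is added to the layer output. The LoRALinear class, its rank and alpha defaults, and the initialization are illustrative assumptions and do not reflect what train_lora() does internally.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen; never updated during training
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the layer output unchanged.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]])
print(layer([1.0, 1.0]))
```

Only A and B would receive gradients, so the number of trainable parameters is rank * (in_dim + out_dim) per adapted layer rather than in_dim * out_dim.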

Benchmarking (WIP)

recall()
10.65s user 10.98s system 37% cpu 57.669 total

Installation

git clone https://github.com/JosefAlbers/Phi-3-Vision-MLX.git

Benchmarks

Task              Vanilla Model  Quantized Model  Quantized KV Cache  LoRA Adapter
Image Captioning  10.71s         8.51s            12.79s              11.70s
Text Generation   5.07s          2.24s            5.27s               5.10s
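
To compare the configurations at a glance, the timings above can be converted into speed ratios relative to the vanilla model. The snippet below is a small sketch of that arithmetic; the numbers are copied from the table, and the speedups helper is a hypothetical name.

```python
# Wall-clock seconds copied from the benchmark table.
benchmarks = {
    "Image Captioning": {
        "Vanilla Model": 10.71,
        "Quantized Model": 8.51,
        "Quantized KV Cache": 12.79,
        "LoRA Adapter": 11.70,
    },
    "Text Generation": {
        "Vanilla Model": 5.07,
        "Quantized Model": 2.24,
        "Quantized KV Cache": 5.27,
        "LoRA Adapter": 5.10,
    },
}

def speedups(timings, baseline="Vanilla Model"):
    """Return baseline_time / time per configuration (above 1.0 is faster)."""
    base = timings[baseline]
    return {name: base / seconds for name, seconds in timings.items()}

for task, timings in benchmarks.items():
    for name, ratio in speedups(timings).items():
        print(f"{task:18s} {name:20s} {ratio:.2f}x")
```

A ratio above 1.0 means the configuration finished faster than the vanilla model, and a ratio below 1.0 means it was slower.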

License

This project is licensed under the MIT License.

