llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)
LlamaIndex Multi_Modal_Llms Integration: Huggingface
This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.
Features
- Seamless integration of Hugging Face multimodal models with LlamaIndex
- Support for multiple state-of-the-art vision-language models and their fine-tuned variants (see Supported Models below)
- Easy-to-use interface for multimodal tasks like image captioning and visual question answering
- Configurable model parameters for fine-tuned performance
Author of this integration: GitHub | LinkedIn | Email
Installation
```bash
pip install llama-index-multi-modal-llms-huggingface
```
Make sure to set your Hugging Face API token as an environment variable:
```bash
export HF_TOKEN=your_huggingface_token_here
```
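If you prefer to authenticate from within Python (for example in a notebook), here is a minimal sketch using the `huggingface_hub` client, which is installed alongside the underlying `transformers` dependency; whether your token has access to gated models is up to your account:

```python
import os
from huggingface_hub import login

# Make the token available to this process...
os.environ["HF_TOKEN"] = "your_huggingface_token_here"

# ...and log in so gated checkpoints (e.g. PaliGemma, Mllama) can be downloaded.
login(token=os.environ["HF_TOKEN"])
```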
Usage
Here's a basic example of how to use the Hugging Face multimodal integration:
```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])
print(response.text)
```
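`ImageDocument` can also wrap an image fetched at runtime; a small sketch (the URL and the use of `requests` are illustrative and not part of this package):

```python
import requests
from llama_index.core.schema import ImageDocument

# Download an image to disk (placeholder URL) and wrap it in an ImageDocument.
image_url = "https://example.com/sample.jpg"
with open("downloaded_image.jpg", "wb") as f:
    f.write(requests.get(image_url, timeout=30).content)

image_document = ImageDocument(image_path="downloaded_image.jpg")
```

The streaming example below reuses this `downloaded_image.jpg`.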
Streaming
```python
import asyncio

import nest_asyncio
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

nest_asyncio.apply()

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

async def stream_output():
    for chunk in model.stream_complete(
        prompt, image_documents=[image_document]
    ):
        print(chunk.delta, end="", flush=True)
        await asyncio.sleep(0)

asyncio.run(stream_output())
```
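Outside an existing event loop, the same stream can be consumed synchronously, since `stream_complete` yields chunks directly; a minimal sketch reusing `model`, `prompt`, and `image_document` from above:

```python
# Each chunk carries the newly generated text in .delta.
for chunk in model.stream_complete(prompt, image_documents=[image_document]):
    print(chunk.delta, end="", flush=True)
print()
```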
You can also refer to this Colab notebook.
Supported Models
- Qwen2 Vision
- Florence2
- Phi3.5 Vision
- PaliGemma
- Mllama
Each model has its own strengths and can be selected based on your specific use case.
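Models are selected by their Hugging Face repo ID via `from_model_name`. A rough sketch of switching between families follows; the repo IDs are illustrative and should be checked against the model card you actually want to use (some are gated and require `HF_TOKEN`):

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Example repo IDs only; any compatible checkpoint or finetune can be substituted.
qwen_vl = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
phi_vision = HuggingFaceMultiModal.from_model_name("microsoft/Phi-3.5-vision-instruct")
paligemma = HuggingFaceMultiModal.from_model_name("google/paligemma-3b-mix-224")  # gated model
```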
Configuration
You can configure various parameters when initializing a model:
```python
import torch

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)
```
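A common pattern is to choose the device and dtype based on what is available; a minimal sketch with the same parameters (the CPU fallback to `float32` is an assumption about your hardware, not a requirement of the package):

```python
import torch
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Fall back to CPU and full precision when no GPU is present.
use_cuda = torch.cuda.is_available()

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda" if use_cuda else "cpu",
    torch_dtype=torch.float16 if use_cuda else torch.float32,
    max_new_tokens=100,
    temperature=0.7,
)
```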
Limitations
- Async streaming is not supported for any of the models (see the workaround sketch after this list).
- Some models have specific requirements or limitations. Please refer to the individual model classes for details.
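If you still need streamed output inside an async application, one workaround is to drain the synchronous generator in a worker thread and hand chunks back to the event loop. The queue-based bridge below is purely an illustration and not part of this package; it reuses `model`, `prompt`, and `image_document` from the usage examples above:

```python
import asyncio

async def stream_via_thread(model, prompt, image_documents):
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def produce():
        # The blocking generator runs in a worker thread; deltas are pushed to the loop.
        for chunk in model.stream_complete(prompt, image_documents=image_documents):
            loop.call_soon_threadsafe(queue.put_nowait, chunk.delta)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: generation finished

    producer = asyncio.create_task(asyncio.to_thread(produce))
    while (delta := await queue.get()) is not None:
        print(delta, end="", flush=True)
    await producer

asyncio.run(stream_via_thread(model, prompt, [image_document]))
```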