llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)
LlamaIndex Multi_Modal_Llms Integration: Huggingface
This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.
Features
- Seamless integration of Hugging Face multimodal models with LlamaIndex
- Support for multiple state-of-the-art vision-language models and their fine-tunes (listed under Supported Models)
- Easy-to-use interface for multimodal tasks like image captioning and visual question answering
- Configurable model parameters such as device, dtype, maximum new tokens, and temperature
Author of this integration: GitHub | LinkedIn | Email
Installation
pip install llama-index-multi-modal-llms-huggingface
Make sure to set your Hugging Face API token as an environment variable:
export HF_TOKEN=your_huggingface_token_here
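A missing token tends to surface only later, as an authentication error during model download. The following is a minimal sketch of an early check, not part of the integration itself; the helper name `require_hf_token` is hypothetical, and the only thing taken from the instructions above is the `HF_TOKEN` variable name.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from the environment or fail early.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Run `export HF_TOKEN=your_huggingface_token_here` first."
        )
    return token
```

Calling `require_hf_token()` once at the top of a script, before any model is loaded, is usually enough.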
Usage
Here's a basic example of how to use the Hugging Face multimodal integration:
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument
# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."
# Generate a response
response = model.complete(prompt, image_documents=[image_document])
print(response.text)
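For visual question answering, the same `complete` call can be repeated with different prompts over the same images. The sketch below assumes only the call shape shown in the example above (`model.complete(prompt, image_documents=[...])` returning an object with `.text`); the helper `ask_questions` and the stand-in `EchoModel` are hypothetical and exist so the snippet runs without downloading weights.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence


def ask_questions(
    model: Any, image_documents: Sequence[Any], questions: Sequence[str]
) -> Dict[str, str]:
    """Ask several questions about the same images and collect the answers."""
    answers: Dict[str, str] = {}
    for question in questions:
        # Same call shape as the basic example: a prompt plus image_documents
        response = model.complete(question, image_documents=list(image_documents))
        answers[question] = response.text
    return answers


class EchoModel:
    """Hypothetical stand-in for a real model; it only echoes its inputs."""

    def complete(self, prompt: str, image_documents: List[Any]) -> SimpleNamespace:
        return SimpleNamespace(text=f"{len(image_documents)} image(s): {prompt}")


demo = ask_questions(EchoModel(), ["image-placeholder"], ["What color is the car?"])
print(demo)
```

With a real model, `EchoModel()` would be replaced by the `model` object from the example above and the placeholder list by real `ImageDocument` instances.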
Streaming
import asyncio
import nest_asyncio
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Allow asyncio.run() inside an already running event loop (e.g. Jupyter or Colab)
nest_asyncio.apply()

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

async def stream_output():
    # stream_complete is a synchronous generator, hence the plain for loop
    for chunk in model.stream_complete(
        prompt, image_documents=[image_document]
    ):
        print(chunk.delta, end="", flush=True)
        # Yield control to the event loop between chunks
        await asyncio.sleep(0)

asyncio.run(stream_output())
You can also refer to this Colab notebook
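When the full text is needed after streaming finishes, the deltas can be accumulated while they are printed. This is a small sketch under the assumption that each streamed chunk exposes a `.delta` attribute, as in the loop above; `collect_stream` is a hypothetical helper, and the fake chunks stand in for real model output.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Optional


def collect_stream(
    chunks: Iterable[Any], on_delta: Optional[Callable[[str], None]] = None
) -> str:
    """Consume a stream of chunks and return the concatenated text."""
    parts: List[str] = []
    for chunk in chunks:
        # Treat a missing or None delta as an empty string
        delta = getattr(chunk, "delta", None) or ""
        if on_delta is not None:
            on_delta(delta)
        parts.append(delta)
    return "".join(parts)


# Fake chunks used only to demonstrate the helper
fake_chunks = [SimpleNamespace(delta=d) for d in ["A ", "red ", None, "bicycle."]]
full_text = collect_stream(fake_chunks)
print(full_text)
```

With a real model, the first argument would be the generator returned by `model.stream_complete(...)`, and `on_delta` could be `lambda d: print(d, end="", flush=True)`.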
Supported Models
- Qwen2 Vision
- Florence2
- Phi3.5 Vision
- PaliGemma
- Mllama
Each model has its unique capabilities and can be selected based on your specific use case.
Configuration
You can configure various parameters when initializing a model:
import torch
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)
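The parameters above can be derived from the machine rather than hard-coded. The sketch below is one possible convention, not something the integration prescribes: it picks `float16` on GPU and falls back to `float32` on CPU (where half precision is often slow or unsupported for some operations) and rejects obviously invalid generation settings. `build_model_kwargs` and the `torch_dtype_name` key are hypothetical; the dtype is kept as a string so the snippet runs without PyTorch installed.

```python
from typing import Any, Dict


def build_model_kwargs(
    cuda_available: bool, max_new_tokens: int = 100, temperature: float = 0.7
) -> Dict[str, Any]:
    """Assemble constructor arguments with a CPU fallback and basic validation."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return {
        "device": "cuda" if cuda_available else "cpu",
        # Assumption: half precision on GPU, full precision on CPU
        "torch_dtype_name": "float16" if cuda_available else "float32",
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }


print(build_model_kwargs(cuda_available=False))
```

With PyTorch installed, `cuda_available` would come from `torch.cuda.is_available()`, and the dtype name can be resolved with `getattr(torch, kwargs.pop("torch_dtype_name"))` before the remaining values are passed to the constructor.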
Limitations
- Async streaming is not supported for any of the models.
- Some models have specific requirements or limitations. Please refer to the individual model classes for details.
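Since async streaming is not supported, an application built on asyncio has to consume the synchronous stream without blocking its event loop. One possible workaround, shown here as a sketch rather than a feature of the integration, is to pull each item from the blocking iterator in a worker thread. `iterate_in_thread` is a hypothetical helper, `fake_stream` stands in for `model.stream_complete(...)`, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time
from typing import AsyncIterator, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel returned by next() when the iterator is exhausted


async def iterate_in_thread(blocking_iterable: Iterable[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterable while keeping the event loop free."""
    iterator = iter(blocking_iterable)
    while True:
        # Each next() call runs in a worker thread; calls never overlap
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            return
        yield item


def fake_stream() -> Iterator[str]:
    """Stand-in for a slow synchronous stream of text pieces."""
    for piece in ["A ", "red ", "bicycle."]:
        time.sleep(0.01)
        yield piece


async def main() -> str:
    parts: List[str] = []
    async for piece in iterate_in_thread(fake_stream()):
        parts.append(piece)
    return "".join(parts)


result = asyncio.run(main())
print(result)
```

The model still generates in a single thread; the wrapper only keeps other coroutines responsive while waiting for the next chunk.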
Download files
File details
Details for the file llama_index_multi_modal_llms_huggingface-0.2.0.tar.gz.
File metadata
- Download URL: llama_index_multi_modal_llms_huggingface-0.2.0.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | bded1c9e44f51b37fcb685bfb30a87e93fbe8870814a98e91521f4a3b7ac4bf9
MD5 | 24c912ddb5d417ba5be010279cbc6b8d
BLAKE2b-256 | b886ad1def3f82b074ac75f1d9b15f4762561d76eb88e3e2ef648d3525c110c3
File details
Details for the file llama_index_multi_modal_llms_huggingface-0.2.0-py3-none-any.whl.
File metadata
- Download URL: llama_index_multi_modal_llms_huggingface-0.2.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e0c78536ec2b71d7fa56e5d2a6f96889d55a2c0e0749a20c6cd2e1afbe766d85
MD5 | d7c475190c568dbb9709bca4b4aa6acc
BLAKE2b-256 | 7602af9083675459ff2f21a7157bb8593ffe968071eefaf401efa754980b4bde