llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)
Project description
LlamaIndex Multi_Modal_Llms Integration: Huggingface
This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.
Features
- Seamless integration of Hugging Face multimodal models with LlamaIndex
- Support for multiple state-of-the-art vision-language models and their fine-tunes (see Supported Models below)
- Easy-to-use interface for multimodal tasks like image captioning and visual question answering
- Configurable model parameters (device, dtype, generation settings)
Author of this integration: GitHub | LinkedIn | Email
Installation
```bash
pip install llama-index-multi-modal-llms-huggingface
```
Make sure to set your Hugging Face API token as an environment variable:
```bash
export HF_TOKEN=your_huggingface_token_here
```
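If you would rather set the token from Python than from the shell, here is a minimal sketch; the token string is a placeholder, and logging in through the huggingface_hub client is optional:

```python
import os

# Placeholder value; substitute your own Hugging Face access token.
os.environ["HF_TOKEN"] = "hf_xxx"

# Optional: authenticate the huggingface_hub client with the same token.
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])
```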
Usage
Here's a basic example of how to use the Hugging Face multimodal integration:
```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])
print(response.text)
```
You can also refer to this Colab notebook.
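Token-by-token streaming should also work through the standard llama-index multi-modal LLM interface; the sketch below assumes that interface's `stream_complete` method and reuses `model`, `prompt`, and `image_document` from the example above:

```python
# Stream the response incrementally instead of waiting for the full completion.
for chunk in model.stream_complete(prompt, image_documents=[image_document]):
    print(chunk.delta, end="", flush=True)
```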
Supported Models
- Qwen2VisionMultiModal
- Florence2MultiModal
- Phi35VisionMultiModal
- PaliGemmaMultiModal
Each model has its own strengths and can be selected based on your specific use case.
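The concrete wrapper class is selected from the Hugging Face model name you pass to `from_model_name`. A rough sketch; the repository IDs below are commonly used checkpoints for each family and are assumptions rather than an exhaustive list:

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Assumed example checkpoints for each supported model family.
qwen = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
florence = HuggingFaceMultiModal.from_model_name("microsoft/Florence-2-base")
phi = HuggingFaceMultiModal.from_model_name("microsoft/Phi-3.5-vision-instruct")
paligemma = HuggingFaceMultiModal.from_model_name("google/paligemma-3b-mix-224")
```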
Configuration
You can configure various parameters when initializing a model:
```python
import torch

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)
```
Limitations
- Async streaming is not supported for any of the models (a workaround sketch follows this list).
- Some models have specific requirements or limitations. Please refer to the individual model classes for details.
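If you need streaming from async code despite this limitation, one workaround is to run the synchronous stream in a worker thread and forward its chunks to the event loop. This is only a sketch, and it assumes the synchronous `stream_complete` method is available for your model:

```python
import asyncio


async def astream_via_thread(model, prompt, image_documents):
    # Async streaming is not supported natively, so consume the blocking
    # stream_complete generator in a worker thread and forward each delta
    # back to the event loop through an asyncio.Queue.
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    end = object()  # sentinel marking the end of the stream

    def produce():
        try:
            for chunk in model.stream_complete(prompt, image_documents=image_documents):
                loop.call_soon_threadsafe(queue.put_nowait, chunk.delta)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, end)

    worker = asyncio.create_task(asyncio.to_thread(produce))
    while (delta := await queue.get()) is not end:
        yield delta
    await worker  # re-raise any exception from the worker thread


# Usage inside an async function, reusing `model`, `prompt`, and `image_document`:
#     async for delta in astream_via_thread(model, prompt, [image_document]):
#         print(delta, end="", flush=True)
```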
Download files
Source Distribution
Hashes for llama_index_multi_modal_llms_huggingface-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b23c14a7f2972b9823abfa23096fde819f7f235b6b0ea2b73695e279847646e8 |
| MD5 | bc1bb44af37271832eb31fddb1613cfa |
| BLAKE2b-256 | 25f418bc5befa7504d11b57be95e83f152199b499ab8435342425074e1bb7473 |
Built Distribution

Hashes for llama_index_multi_modal_llms_huggingface-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0adb6c8440f6f1533db9fce546c9339d4fc70c4081e26e53248428f504399bf8 |
| MD5 | 73678a8c15b07f3cbb7ac019e4fd2409 |
| BLAKE2b-256 | f5d1af15baa3dcb5e6b722f64bae76dce5982c14447a3a5f2521329d46b9513e |