
llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)

Project description

LlamaIndex Multi_Modal_Llms Integration: Huggingface

This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.

Features

  • Seamless integration of Hugging Face multimodal models with LlamaIndex
  • Support for multiple state-of-the-art vision-language models and their fine-tunes
  • Easy-to-use interface for multimodal tasks like image captioning and visual question answering
  • Configurable model parameters (device, dtype, generation settings)

Author of this integration: GitHub | LinkedIn | Email

Installation

pip install llama-index-multi-modal-llms-huggingface

Make sure to set your Hugging Face API token as an environment variable:

export HF_TOKEN=your_huggingface_token_here
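If the token is unset, downloads of gated or private models fail later with an opaque 401 error. A small fail-fast check can catch this up front; this is a sketch, and `require_hf_token` is a hypothetical helper name, not part of this package:

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from the environment, failing fast if unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN before loading gated or private models.")
    return token
```

Calling this once at startup gives a clear error message instead of a mid-download authentication failure.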

Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)
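Since `image_documents` takes a list, several images can be passed to a single `complete` call. The sketch below gathers image files from a folder so each can be wrapped in an `ImageDocument`; `collect_image_paths` is a hypothetical helper name introduced here for illustration:

```python
from pathlib import Path


def collect_image_paths(folder, exts=(".jpg", ".jpeg", ".png")):
    """Gather image files under `folder`, sorted, so each path can be
    wrapped in ImageDocument(image_path=...) and passed to model.complete()."""
    return sorted(
        str(p) for p in Path(folder).iterdir() if p.suffix.lower() in exts
    )


# docs = [ImageDocument(image_path=p) for p in collect_image_paths("images/")]
# response = model.complete("Compare these images.", image_documents=docs)
```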

Streaming

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

# Stream the response; stream_complete is a synchronous generator,
# so no asyncio machinery is needed (async streaming is not supported).
for chunk in model.stream_complete(prompt, image_documents=[image_document]):
    print(chunk.delta, end="", flush=True)

You can also refer to this Colab notebook.

Supported Models

  1. Qwen2 Vision
  2. Florence2
  3. Phi3.5 Vision
  4. PaliGemma
  5. Mllama

Each model has its unique capabilities and can be selected based on your specific use case.
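Switching models comes down to passing the right Hugging Face Hub repo id to `from_model_name`. The ids below are illustrative guesses (only the Qwen2-VL id appears in this document); check each model card on the Hub for the exact repo name, size variant, and license before use:

```python
# Illustrative Hub repo ids for the supported model families.
# Verify each on the Hugging Face Hub before relying on it.
MODEL_CANDIDATES = {
    "qwen2-vl": "Qwen/Qwen2-VL-2B-Instruct",  # used in the usage example above
    "florence2": "microsoft/Florence-2-base",
    "phi35-vision": "microsoft/Phi-3.5-vision-instruct",
    "paligemma": "google/paligemma-3b-mix-224",
    "mllama": "meta-llama/Llama-3.2-11B-Vision-Instruct",
}

# model = HuggingFaceMultiModal.from_model_name(MODEL_CANDIDATES["qwen2-vl"])
```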

Configuration

You can configure various parameters when initializing a model:

import torch

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)

Limitations

  • Async streaming is not supported for any of the models.
  • Some models have specific requirements or limitations. Please refer to the individual model classes for details.


