Skip to main content

llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)

Project description

LlamaIndex Multi_Modal_Llms Integration: Huggingface

This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.

Features

  • Seamless integration of Hugging Face multimodal models with LlamaIndex
  • Support for multiple state-of-the-art vision-language models and their finetunes:
  • Easy-to-use interface for multimodal tasks like image captioning and visual question answering
  • Configurable model parameters for fine-tuned performance

Author of that Integration GitHub | LinkedIn | Email

Installation

pip install llama-index-multi-modal-llms-huggingface

Make sure to set your Hugging Face API token as an environment variable:

export HF_TOKEN=your_huggingface_token_here

Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)

Streaming

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

import nest_asyncio
import asyncio

nest_asyncio.apply()


async def stream_output():
    for chunk in model.stream_complete(
        prompt, image_documents=[image_document]
    ):
        print(chunk.delta, end="", flush=True)
        await asyncio.sleep(0)


asyncio.run(stream_output())

You can also refer to this Colab notebook

Supported Models

  1. Qwen2 Vision
  2. Florence2
  3. Phi3.5 Vision
  4. PaliGemma
  5. Mllama

Each model has its unique capabilities and can be selected based on your specific use case.

Configuration

You can configure various parameters when initializing a model:

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)

Limitations

  • Async streaming is not supported for any of the models.
  • Some models have specific requirements or limitations. Please refer to the individual model classes for details.

Author of that Integration GitHub | LinkedIn | Email

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Built Distribution

File details

Details for the file llama_index_multi_modal_llms_huggingface-0.2.0.tar.gz.

File metadata

File hashes

Hashes for llama_index_multi_modal_llms_huggingface-0.2.0.tar.gz
Algorithm Hash digest
SHA256 bded1c9e44f51b37fcb685bfb30a87e93fbe8870814a98e91521f4a3b7ac4bf9
MD5 24c912ddb5d417ba5be010279cbc6b8d
BLAKE2b-256 b886ad1def3f82b074ac75f1d9b15f4762561d76eb88e3e2ef648d3525c110c3

See more details on using hashes here.

File details

Details for the file llama_index_multi_modal_llms_huggingface-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llama_index_multi_modal_llms_huggingface-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e0c78536ec2b71d7fa56e5d2a6f96889d55a2c0e0749a20c6cd2e1afbe766d85
MD5 d7c475190c568dbb9709bca4b4aa6acc
BLAKE2b-256 7602af9083675459ff2f21a7157bb8593ffe968071eefaf401efa754980b4bde

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page