
llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)

Project description

LlamaIndex Multi-Modal LLMs Integration: Hugging Face

This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.

Features

  • Seamless integration of Hugging Face multimodal models with LlamaIndex
  • Support for multiple state-of-the-art vision-language models and their fine-tunes
  • Easy-to-use interface for multimodal tasks such as image captioning and visual question answering
  • Configurable model parameters for fine-tuned performance

Author of this integration: GitHub | LinkedIn | Email

Installation

pip install llama-index-multi-modal-llms-huggingface

Make sure to set your Hugging Face API token as an environment variable:

export HF_TOKEN=your_huggingface_token_here
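If you prefer to set the token from within Python (for example in a notebook), a minimal sketch using only the standard library — `HF_TOKEN` is the environment variable the Hugging Face libraries read, and the placeholder value below is not a real token:

```python
import os

# Set the token for the current process only; replace the placeholder
# with your real token (or load it from a secrets manager).
os.environ.setdefault("HF_TOKEN", "hf_your_token_here")

print("HF_TOKEN is set:", "HF_TOKEN" in os.environ)
```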

Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)

Streaming

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

# stream_complete returns a synchronous generator of response chunks,
# so the output can be consumed with a plain loop
for chunk in model.stream_complete(prompt, image_documents=[image_document]):
    print(chunk.delta, end="", flush=True)

You can also refer to this Colab notebook

Supported Models

  1. Qwen2 Vision
  2. Florence2
  3. Phi3.5 Vision
  4. PaliGemma
  5. Mllama

Each model has its unique capabilities and can be selected based on your specific use case.
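As an illustration of selecting a checkpoint per use case, a small lookup helper — note this table is a hypothetical example, not part of the library, and the Hub ids other than the Qwen checkpoint used above should be verified against the model classes in this package:

```python
# Hypothetical task-to-checkpoint table; verify these Hub ids against
# the model classes supported by this integration.
MODEL_FOR_TASK = {
    "captioning": "Qwen/Qwen2-VL-2B-Instruct",
    "ocr": "microsoft/Florence-2-base",
    "vqa": "microsoft/Phi-3.5-vision-instruct",
}


def pick_model(task: str) -> str:
    """Return a model id for a task, defaulting to the Qwen checkpoint."""
    return MODEL_FOR_TASK.get(task, "Qwen/Qwen2-VL-2B-Instruct")


print(pick_model("ocr"))
```

The returned string can then be passed to `HuggingFaceMultiModal.from_model_name`.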

Configuration

You can configure various parameters when initializing a model:

import torch

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)
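One common pattern is to assemble these parameters based on the hardware available at runtime. A sketch with a hypothetical helper (the `have_cuda` flag stands in for a check such as `torch.cuda.is_available()` so the snippet runs without a GPU):

```python
def build_model_kwargs(have_cuda: bool) -> dict:
    """Assemble HuggingFaceMultiModal init kwargs for the available hardware."""
    return {
        "model_name": "Qwen/Qwen2-VL-2B-Instruct",
        "device": "cuda" if have_cuda else "cpu",
        "max_new_tokens": 100,
        "temperature": 0.7,
    }


print(build_model_kwargs(False)["device"])
```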

Limitations

  • Async streaming is not supported for any of the models.
  • Some models have specific requirements or limitations. Please refer to the individual model classes for details.
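Because async streaming is not supported, one possible workaround inside async code is to drain the synchronous generator in a worker thread. A sketch using a stub generator in place of `model.stream_complete(...)` so it runs standalone:

```python
import asyncio


def fake_stream():
    """Stand-in for model.stream_complete(...); yields text chunks."""
    for piece in ["A ", "red ", "apple."]:
        yield piece


async def consume(sync_gen):
    # Drain the synchronous generator in a thread so the event loop
    # is not blocked, then return the collected chunks.
    return await asyncio.to_thread(list, sync_gen)


chunks = asyncio.run(consume(fake_stream()))
print("".join(chunks))
```

In real code you would pass `model.stream_complete(prompt, image_documents=[...])` in place of `fake_stream()`; note the whole response is collected before it is returned, so this trades incremental output for a non-blocking event loop.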



