
llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)


LlamaIndex Multi_Modal_Llms Integration: Huggingface

This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, so that vision-language tasks such as image captioning and visual question answering can be run through a single LlamaIndex interface.

Features

  • Seamless integration of Hugging Face multimodal models with LlamaIndex
  • Support for multiple state-of-the-art vision-language models and their fine-tunes (listed under Supported Models)
  • Easy-to-use interface for multimodal tasks like image captioning and visual question answering
  • Configurable model parameters for fine-tuned performance

Author of this integration: GitHub | LinkedIn | Email

Installation

pip install llama-index-multi-modal-llms-huggingface

Make sure to set your Hugging Face API token as an environment variable:

export HF_TOKEN=your_huggingface_token_here
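A missing token often only shows up later as a download error, so it can help to check for it before loading a model. The following is a minimal sketch using only the standard library; the helper name `get_hf_token` is hypothetical and not part of this package.

```python
import os


def get_hf_token(env=os.environ):
    """Return the token stored in HF_TOKEN, or raise a clear error if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before loading gated models."
        )
    return token
```

Calling `get_hf_token()` once at start-up fails fast with a readable message instead of a failure in the middle of a model download.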

Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)
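When captioning more than one file, it may be convenient to collect the image paths first and build one `ImageDocument` per path. The sketch below only handles the path collection; the helper name `find_images` and the suffix list are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_images(folder):
    """Return the sorted paths of image files directly inside `folder`."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each returned path can then be passed as `ImageDocument(image_path=path)` and sent to `model.complete` one image at a time, as in the example above.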

Streaming

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

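import nest_asyncio
import asyncio

# Allow asyncio.run() inside an already running event loop (e.g. Jupyter or Colab)
nest_asyncio.apply()


async def stream_output():
    # stream_complete is iterated with a plain (synchronous) for loop
    for chunk in model.stream_complete(
        prompt, image_documents=[image_document]
    ):
        print(chunk.delta, end="", flush=True)
        # Hand control back to the event loop between chunks
        await asyncio.sleep(0)


asyncio.run(stream_output())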
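If the full text is needed after streaming, the deltas can be accumulated while they are printed. This is a small sketch that assumes only what the example above shows, namely that each chunk exposes a `delta` attribute; the helper name `collect_stream` is hypothetical.

```python
def collect_stream(chunks, on_delta=None):
    """Consume streamed chunks and return the concatenated text.

    `on_delta`, if given, is called with each piece of text as it arrives.
    """
    parts = []
    for chunk in chunks:
        # Treat a missing or None delta as an empty string.
        delta = getattr(chunk, "delta", None) or ""
        if on_delta is not None:
            on_delta(delta)
        parts.append(delta)
    return "".join(parts)
```

For example, `text = collect_stream(model.stream_complete(prompt, image_documents=[image_document]), on_delta=lambda d: print(d, end="", flush=True))` prints incrementally and keeps the final string.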

You can also refer to this Colab notebook

Supported Models

  1. Qwen2 Vision
  2. Florence2
  3. Phi3.5 Vision
  4. PaliGemma
  5. Mllama

Each model has its unique capabilities and can be selected based on your specific use case.
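One way to keep model selection in a single place is a small lookup from family name to checkpoint. In the sketch below, only the Qwen2 checkpoint appears in this README; the other entries are Hugging Face Hub IDs assumed for illustration and are not confirmed here as tested with this integration.

```python
# Only "Qwen/Qwen2-VL-2B-Instruct" comes from this README; the rest are
# illustrative Hub IDs (assumptions) and should be verified before use.
EXAMPLE_CHECKPOINTS = {
    "qwen2 vision": "Qwen/Qwen2-VL-2B-Instruct",
    "florence2": "microsoft/Florence-2-base",
    "phi3.5 vision": "microsoft/Phi-3.5-vision-instruct",
    "paligemma": "google/paligemma-3b-pt-224",
    "mllama": "meta-llama/Llama-3.2-11B-Vision-Instruct",
}


def checkpoint_for(family: str) -> str:
    """Map a family name from the list above to an example checkpoint ID."""
    key = " ".join(family.lower().split())  # normalise case and whitespace
    try:
        return EXAMPLE_CHECKPOINTS[key]
    except KeyError:
        raise ValueError(f"Unknown model family: {family!r}") from None
```

The result can be passed to `HuggingFaceMultiModal.from_model_name(...)` as in the usage example.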

Configuration

You can configure various parameters when initializing a model:

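import torch
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    max_new_tokens=100,  # upper bound on generated tokens
    temperature=0.7,  # sampling temperature
)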
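The `device` and `torch_dtype` values usually depend on the machine. The sketch below picks a pair from a single boolean so that it stays runnable without PyTorch installed; the helper name and the choice of float16 on GPU and float32 on CPU are assumptions, not defaults documented by this package.

```python
def pick_device_and_dtype(cuda_available: bool):
    """Return a (device, dtype name) pair suited to the available hardware."""
    if cuda_available:
        # Half precision roughly halves GPU memory use.
        return "cuda", "float16"
    # Half precision is often slow or unsupported for some ops on CPU.
    return "cpu", "float32"
```

With PyTorch installed, this could be used as `device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())`, followed by `torch_dtype=getattr(torch, dtype_name)` in the constructor call.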

Limitations

  • Async streaming is not supported for any of the models.
  • Some models have specific requirements or limitations. Please refer to the individual model classes for details.
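Since async streaming is not supported, one possible workaround is to advance the synchronous stream in a worker thread so the event loop stays responsive. The following is a generic sketch, not part of this package: the helper name `iterate_in_thread` is hypothetical, it requires Python 3.9+ for `asyncio.to_thread`, and whether a given model is safe to run from a worker thread is an assumption you should verify.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the synchronous iterator


async def iterate_in_thread(
    make_iterable: Callable[[], Iterable[T]]
) -> AsyncIterator[T]:
    """Yield items from a blocking iterable without blocking the event loop."""
    iterator = iter(make_iterable())
    while True:
        # Each next() call runs in a worker thread, one at a time.
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            break
        yield item
```

A call such as `async for chunk in iterate_in_thread(lambda: model.stream_complete(prompt, image_documents=[image_document]))` would then consume the synchronous stream from async code.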



