
llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)

Project description

LlamaIndex Multi-Modal LLMs Integration: Hugging Face

This project integrates Hugging Face's multimodal (vision-language) models into the LlamaIndex framework, so they can be used through LlamaIndex's multi-modal LLM interface for tasks that combine images and text.

Features

  • Seamless integration of Hugging Face multimodal models with LlamaIndex
  • Support for several vision-language models and their fine-tunes (see Supported Models)
  • Easy-to-use interface for multimodal tasks like image captioning and visual question answering
  • Configurable model and generation parameters, such as device, torch_dtype, max_new_tokens, and temperature

Author of this integration: Cihan Yalçın (GitHub | LinkedIn | Email)

Installation

pip install llama-index-multi-modal-llms-huggingface

Make sure to set your Hugging Face API token as an environment variable:

export HF_TOKEN=your_huggingface_token_here
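The huggingface_hub library, which transformers uses to download model weights, reads HF_TOKEN from the environment, so exporting it is usually all that is needed. If you want a script to fail early with a clear message when the variable is missing (rather than partway through a gated download), a small check like the one below can help. This is a minimal sketch; get_hf_token is a hypothetical helper, not part of the integration.

```python
import os


def get_hf_token() -> str:
    """Return the Hugging Face token from HF_TOKEN (hypothetical helper).

    Raises RuntimeError when the variable is unset or blank, so scripts
    stop before attempting to download a model.
    """
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run `export HF_TOKEN=...` first")
    return token
```

Call get_hf_token() once at startup, before initializing a model.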

Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)

Streaming

Use stream_complete to print the response as it is generated:

import asyncio

import nest_asyncio  # only needed in Jupyter/Colab, where an event loop is already running
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

nest_asyncio.apply()


async def stream_output():
    # stream_complete returns a regular (synchronous) generator of partial responses;
    # chunk.delta holds only the newly generated text
    for chunk in model.stream_complete(prompt, image_documents=[image_document]):
        print(chunk.delta, end="", flush=True)
        await asyncio.sleep(0)  # yield control to the event loop between chunks


asyncio.run(stream_output())
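Because stream_complete returns an ordinary (synchronous) generator, the asyncio wrapper is not required to consume it; in a plain script a for loop is enough. The sketch below shows that loop against a fake stream so it runs without downloading a model. Chunk and fake_stream are hypothetical stand-ins, and Chunk assumes the usual LlamaIndex convention that text holds the response so far while delta holds only the newly added piece.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed completion chunk."""
    text: str             # response generated so far
    delta: Optional[str]  # text added by this chunk


def fake_stream(words: Iterable[str]) -> Iterator[Chunk]:
    """Hypothetical stream yielding one word per chunk, like stream_complete would."""
    so_far = ""
    for word in words:
        so_far += word
        yield Chunk(text=so_far, delta=word)


def collect(stream: Iterator[Chunk]) -> str:
    """Print each delta as it arrives and return the full response text."""
    parts = []
    for chunk in stream:
        piece = chunk.delta or ""  # guard against empty deltas
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full = collect(fake_stream(["A ", "red ", "bicycle."]))
```

With a real model, pass model.stream_complete(prompt, image_documents=[image_document]) to collect instead of the fake stream.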

You can also refer to the accompanying Colab notebook.

Supported Models

  1. Qwen2-VL (Qwen2 Vision)
  2. Florence-2
  3. Phi-3.5-vision
  4. PaliGemma
  5. Mllama (Llama 3.2 Vision)

The models differ in size, supported tasks, and expected prompt format, so check each model's card on the Hugging Face Hub before choosing one for your use case.
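HuggingFaceMultiModal.from_model_name presumably selects one of the model-specific classes mentioned under Limitations. One plausible way such a factory can work is to read the architectures field of the model's config.json and map it to a wrapper; the sketch below assumes that approach and is not the integration's actual code. The architecture strings are the names Hugging Face model configs use for these families, while the returned labels are hypothetical stand-ins for the integration's per-model classes.

```python
from typing import Dict, Sequence

# Hypothetical mapping: architecture name (from config.json) -> model family label.
# The labels stand in for per-model wrapper classes, whose real names may differ.
ARCHITECTURE_TO_FAMILY: Dict[str, str] = {
    "Qwen2VLForConditionalGeneration": "Qwen2-VL",
    "Florence2ForConditionalGeneration": "Florence-2",
    "Phi3VForCausalLM": "Phi-3.5-vision",
    "PaliGemmaForConditionalGeneration": "PaliGemma",
    "MllamaForConditionalGeneration": "Mllama",
}


def pick_family(architectures: Sequence[str]) -> str:
    """Return the family for the first recognised architecture name."""
    for arch in architectures:
        if arch in ARCHITECTURE_TO_FAMILY:
            return ARCHITECTURE_TO_FAMILY[arch]
    raise ValueError(f"Unsupported architecture(s): {list(architectures)}")
```

For example, pick_family(["Qwen2VLForConditionalGeneration"]) returns "Qwen2-VL", while an unrelated architecture such as a text-only model raises ValueError.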

Configuration

You can configure various parameters when initializing a model:

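import torch
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,  # half precision on GPU; torch.float32 is the safer choice on CPU
    max_new_tokens=100,  # upper bound on the number of generated tokens
    temperature=0.7,  # sampling temperature
)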
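The temperature parameter rescales the model's next-token logits before they are turned into sampling probabilities: each logit is divided by the temperature, so values below 1 (such as 0.7) sharpen the distribution toward the most likely token and values above 1 flatten it. In Hugging Face generate, temperature only takes effect when sampling is enabled; whether this integration enables sampling by default is not stated here. The standalone sketch below illustrates the scaling with made-up logits.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative values, not from a real model
at_1_0 = softmax_with_temperature(logits, 1.0)
at_0_7 = softmax_with_temperature(logits, 0.7)
```

Comparing at_0_7[0] with at_1_0[0] shows the top token gaining probability mass at the lower temperature.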

Limitations

  • Async streaming is not supported for any of the models.
  • Some models have specific requirements or limitations. Please refer to the individual model classes for details.
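Since async streaming is not supported, one workaround when calling from async code is to pull items from the synchronous stream in a worker thread, so the event loop is not blocked while each chunk is generated. The sketch below is not part of the integration: iterate_in_thread and fake_stream are hypothetical helpers, and the fake stream stands in for the chunk.delta values you would read from model.stream_complete(...). It only avoids blocking the loop for a single stream; it does not make one model instance safe for concurrent requests.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel that next() returns when the iterator is exhausted


async def iterate_in_thread(sync_iter: Iterator[T]) -> AsyncIterator[T]:
    """Consume a blocking iterator from async code (hypothetical helper).

    Each next() call runs in a worker thread via asyncio.to_thread (Python 3.9+),
    so other tasks keep running while the next item is produced.
    """
    while True:
        item = await asyncio.to_thread(next, sync_iter, _DONE)
        if item is _DONE:
            break
        yield item


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for the deltas of a synchronous stream."""
    yield from ["A ", "dog ", "on ", "a ", "beach."]


async def main() -> str:
    parts = []
    async for delta in iterate_in_thread(fake_stream()):
        parts.append(delta)
    return "".join(parts)


result = asyncio.run(main())
print(result)
```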


Download files


Source Distribution

Built Distribution

File details

Details for the file llama_index_multi_modal_llms_huggingface-0.2.1.tar.gz.

File hashes

Hashes for llama_index_multi_modal_llms_huggingface-0.2.1.tar.gz
Algorithm Hash digest
SHA256 92917e32da2ca2505c96205f1d44969cac3504e9f9ae14428d566ace33287d70
MD5 94065e50c7178d594ccc4b6d46c414e6
BLAKE2b-256 a6435ac0c72493c7ebf051b4a4bc3605ebab8f46f27ea22ac14c39b91df889c1

File details

Details for the file llama_index_multi_modal_llms_huggingface-0.2.1-py3-none-any.whl.

File hashes

Hashes for llama_index_multi_modal_llms_huggingface-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c63595906e61991a5e44b9ed2a033cbc307cc92cce9b371e62627d18599cbfb2
MD5 258e067e5e72b7a9dcaac82fb74cb111
BLAKE2b-256 1563a864d78610fd80354c4733a5429f6515c89dc7a0a40030cd58e2831b8f9c
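To check that a downloaded file matches the SHA256 digests listed above, you can hash it locally and compare. A minimal sketch using only the standard library follows; sha256_of and matches_published_hash are illustrative helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example (requires the wheel in the current directory):
# matches_published_hash(
#     Path("llama_index_multi_modal_llms_huggingface-0.2.1-py3-none-any.whl"),
#     "c63595906e61991a5e44b9ed2a033cbc307cc92cce9b371e62627d18599cbfb2",
# )
```

pip can also enforce this at install time: list the package in a requirements file as llama-index-multi-modal-llms-huggingface==0.2.1 --hash=sha256:<digest> and install with pip install --require-hashes -r requirements.txt.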
