
llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://www.linkedin.com/in/chanyalcin/)

Project description

LlamaIndex Multi_Modal_Llms Integration: Huggingface

This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.

Features

  • Seamless integration of Hugging Face multimodal models with LlamaIndex
  • Support for multiple state-of-the-art vision-language models and their fine-tunes
  • Easy-to-use interface for multimodal tasks like image captioning and visual question answering
  • Configurable model parameters (device, dtype, generation settings)

Author of this integration: GitHub | LinkedIn | Email

Installation

pip install llama-index-multi-modal-llms-huggingface

Make sure to set your Hugging Face API token as an environment variable:

export HF_TOKEN=your_huggingface_token_here
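If the token is unset, downloads of gated or private models fail later with an opaque 401 error. A small fail-fast check can catch this up front; this is a sketch, and `require_hf_token` is a hypothetical helper name, not part of this package:

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from the environment, failing fast if unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN before loading gated or private models.")
    return token
```

Calling this once at startup gives a clear error message instead of a mid-download authentication failure.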

Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)
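Since `image_documents` takes a list, several images can be passed to a single `complete` call. The sketch below gathers image files from a folder so each can be wrapped in an `ImageDocument`; `collect_image_paths` is a hypothetical helper name introduced here for illustration:

```python
from pathlib import Path


def collect_image_paths(folder, exts=(".jpg", ".jpeg", ".png")):
    """Gather image files under `folder`, sorted, so each path can be
    wrapped in ImageDocument(image_path=...) and passed to model.complete()."""
    return sorted(
        str(p) for p in Path(folder).iterdir() if p.suffix.lower() in exts
    )


# docs = [ImageDocument(image_path=p) for p in collect_image_paths("images/")]
# response = model.complete("Compare these images.", image_documents=docs)
```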

Streaming

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

# Stream the response; stream_complete is a synchronous generator,
# so no asyncio machinery is needed (async streaming is not supported).
for chunk in model.stream_complete(prompt, image_documents=[image_document]):
    print(chunk.delta, end="", flush=True)

You can also refer to this Colab notebook.

Supported Models

  1. Qwen2 Vision
  2. Florence2
  3. Phi3.5 Vision
  4. PaliGemma
  5. Mllama

Each model has its unique capabilities and can be selected based on your specific use case.
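Switching models comes down to passing the right Hugging Face Hub repo id to `from_model_name`. The ids below are illustrative guesses (only the Qwen2-VL id appears in this document); check each model card on the Hub for the exact repo name, size variant, and license before use:

```python
# Illustrative Hub repo ids for the supported model families.
# Verify each on the Hugging Face Hub before relying on it.
MODEL_CANDIDATES = {
    "qwen2-vl": "Qwen/Qwen2-VL-2B-Instruct",  # used in the usage example above
    "florence2": "microsoft/Florence-2-base",
    "phi35-vision": "microsoft/Phi-3.5-vision-instruct",
    "paligemma": "google/paligemma-3b-mix-224",
    "mllama": "meta-llama/Llama-3.2-11B-Vision-Instruct",
}

# model = HuggingFaceMultiModal.from_model_name(MODEL_CANDIDATES["qwen2-vl"])
```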

Configuration

You can configure various parameters when initializing a model:

import torch

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)

Limitations

  • Async streaming is not supported for any of the models.
  • Some models have specific requirements or limitations. Please refer to the individual model classes for details.


