llama-index multi_modal nvidia integration

LlamaIndex Multi_Modal Integration: Nvidia

This project integrates NVIDIA vision-language models (VLMs) into the LlamaIndex framework, enabling multimodal tasks such as image captioning and visual question answering.

Features

Integration of NVIDIA VLMs with LlamaIndex
Support for multiple vision-language models
Simple interface for multimodal tasks such as image captioning and visual question answering
Configurable model parameters

Installation

pip install llama-index-multi-modal-llms-nvidia

Make sure to set your NVIDIA API key as an environment variable:

export NVIDIA_API_KEY=your_api_key_here
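
A missing key usually surfaces only when the first request fails, so it can help to check for it up front. The following is a minimal sketch, not part of this package: the helper name `require_nvidia_api_key` is hypothetical, and it only assumes the `NVIDIA_API_KEY` variable named above.

```python
import os


def require_nvidia_api_key(env=None):
    """Return the NVIDIA API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    This helper is illustrative and is not provided by the integration itself.
    """
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; export it before creating the model."
        )
    return key
```

Calling `require_nvidia_api_key()` before constructing the model turns a late request failure into an immediate, readable error.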

Usage

Here's a basic example of how to use the NVIDIA VLM integration:

from llama_index.multi_modal_llms.nvidia import NVIDIAMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = NVIDIAMultiModal()

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)
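
When the image is already in memory rather than on disk, it may be convenient to hold it as base64 text. The sketch below uses only the standard library; the helper names are hypothetical. Whether this integration accepts a base64 payload (for example through an `image` field on `ImageDocument`) is an assumption that should be checked against the installed version.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Return raw image bytes as base64-encoded ASCII text."""
    return base64.b64encode(data).decode("ascii")


def decode_image_text(text: str) -> bytes:
    """Invert encode_image_bytes, returning the original bytes."""
    return base64.b64decode(text)


# Assumption (not confirmed by this README): the encoded string could be
# supplied to an image document in place of image_path.
```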

Streaming

from llama_index.multi_modal_llms.nvidia import NVIDIAMultiModal
from llama_index.core.schema import ImageDocument

# nest_asyncio is typically only needed in notebooks,
# where an event loop is already running
import nest_asyncio

nest_asyncio.apply()

# Initialize the model
model = NVIDIAMultiModal()

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

# Request a streamed response for the image prepared above
response = model.stream_complete(
    prompt=prompt,
    image_documents=[image_document],
)

# Print each chunk as it arrives
for r in response:
    print(r.text, end="")

Passing an image as an NVCF asset

If your image is large, or you will pass it multiple times in a chat conversation, you can upload it once as an NVCF asset and reference it by its asset ID.

See https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/assets.html for details about how to upload the image.

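import os

import requests
from llama_index.multi_modal_llms.nvidia import NVIDIAMultiModal
from llama_index.core.schema import ImageDocument

content_type = "image/jpg"
description = "example-image-from-lc-nv-ai-e-notebook"

# Read the image bytes to upload (here, the local file from the streaming example)
with open("downloaded_image.jpg", "rb") as f:
    image_bytes = f.read()

# Step 1: register the asset to obtain an asset ID and an upload URL
create_response = requests.post(
    "https://api.nvcf.nvidia.com/v2/nvcf/assets",
    headers={
        "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
        "accept": "application/json",
        "Content-Type": "application/json",
    },
    json={"contentType": content_type, "description": description},
)
create_response.raise_for_status()
asset = create_response.json()

# Step 2: upload the image bytes to the returned upload URL
upload_response = requests.put(
    asset["uploadUrl"],
    headers={
        "Content-Type": content_type,
        "x-amz-meta-nvcf-asset-description": description,
    },
    data=image_bytes,
)
upload_response.raise_for_status()

asset_id = asset["assetId"]

# Step 3: reference the uploaded asset by ID instead of sending the image inline
model = NVIDIAMultiModal()
response = model.complete(
    prompt="Describe the image",
    image_documents=[
        ImageDocument(metadata={"asset_id": asset_id}, mimetype="png")
    ],
)

print(response.text)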
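
The headers and JSON body of the asset-creation request can be assembled separately from the network call, which makes them easier to inspect and test. This is a rough sketch under stated assumptions: `build_asset_request` is a hypothetical helper, and it infers the content type with the standard `mimetypes` module. Note that `mimetypes` reports `image/jpeg` for `.jpg` files, whereas the example above sends `image/jpg`; which spelling the service expects is not confirmed here.

```python
import mimetypes


def build_asset_request(api_key, image_path, description):
    """Build (headers, payload) for the asset-creation POST shown above.

    Hypothetical helper: it performs no network I/O and only mirrors the
    header and JSON field names used in the example request.
    """
    content_type, _ = mimetypes.guess_type(image_path)
    if content_type is None or not content_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image content type for {image_path!r}")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "accept": "application/json",
        "Content-Type": "application/json",
    }
    payload = {"contentType": content_type, "description": description}
    return headers, payload
```

The returned pair can be passed to `requests.post(..., headers=headers, json=payload)` in place of the inline dictionaries.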

Release history

0.5.2: Sep 8, 2025
0.5.1: Aug 13, 2025
0.5.0: Jul 30, 2025
0.4.0: Jul 7, 2025
0.3.0: Nov 17, 2024 (this version)
0.2.0: Nov 15, 2024
0.1.0: Nov 14, 2024

Download files

Source Distribution

llama_index_multi_modal_llms_nvidia-0.3.0.tar.gz (10.2 kB)

Uploaded Nov 17, 2024 Source

Built Distribution

llama_index_multi_modal_llms_nvidia-0.3.0-py3-none-any.whl (10.2 kB)

Uploaded Nov 17, 2024 Python 3

File details

Details for the file llama_index_multi_modal_llms_nvidia-0.3.0.tar.gz.

File metadata

Download URL: llama_index_multi_modal_llms_nvidia-0.3.0.tar.gz
Upload date: Nov 17, 2024
Size: 10.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0

File hashes

Hashes for llama_index_multi_modal_llms_nvidia-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`930b6884674016bfe3d7c166a475bce09ab1eaf461fb6c14e4064c785c963882`
MD5	`c2ed2693d46762cd5964c460afcdf73d`
BLAKE2b-256	`b46a1efc75e78f0916ffe768c0ba86084be09ac61a9667b0d06827e46b1f934c`
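
A downloaded file can be checked against the SHA256 digest listed above before it is installed. The sketch below uses only `hashlib`; the helper names and the chunk size are illustrative choices, not part of any packaging tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("llama_index_multi_modal_llms_nvidia-0.3.0.tar.gz", "930b6884674016bfe3d7c166a475bce09ab1eaf461fb6c14e4064c785c963882")` should return `True` for an intact copy of the source distribution.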

File details

Details for the file llama_index_multi_modal_llms_nvidia-0.3.0-py3-none-any.whl.

File metadata

Download URL: llama_index_multi_modal_llms_nvidia-0.3.0-py3-none-any.whl
Upload date: Nov 17, 2024
Size: 10.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0

File hashes

Hashes for llama_index_multi_modal_llms_nvidia-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ee6de1a8d4ee1ff921fbbe06b79cf01d789c45a239560b97ddec436380c6a8ec`
MD5	`24e1d7b8f2e5e4e6935859a421c0b253`
BLAKE2b-256	`50197145f2d39818246bf15ffb73e5e053b293a335a002108d647993dc0ea004`
