Molmo Utils - PyTorch
molmo-utils
Molmo Utils is a set of helper functions for preprocessing visual inputs (images and videos) and passing them to Molmo, Ai2's family of state-of-the-art open multimodal language models.
Installation
pip install molmo-utils  # basic usage
pip install "molmo-utils[torchcodec]"  # recommended for video inputs; quotes keep shells such as zsh from expanding the brackets
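Before working with video inputs, it can help to confirm that the optional backend is importable in the current environment. The sketch below uses only the standard library. It assumes that the `torchcodec` extra provides a top-level module named `torchcodec`; the helper name `has_module` is hypothetical and not part of molmo-utils.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Assumption: the torchcodec extra installs a top-level module "torchcodec".
if has_module("torchcodec"):
    print("torchcodec is available for video decoding")
else:
    print('torchcodec not found; consider: pip install "molmo-utils[torchcodec]"')
```

The check only inspects the import system, so it does not load the module or touch any video files.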
Usage
Molmo2
from transformers import AutoProcessor, AutoModelForImageTextToText
from molmo_utils import process_vision_info

model_path = "allenai/Molmo2-8B"

model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    model_path,
    trust_remote_code=True,
    dtype="auto",
    device_map="auto",
)
# An image can be given as a local file path, a URL, a base64-encoded string,
# or an in-memory PIL.Image.Image.
# The processed visual tokens will always be inserted at the beginning of the input sequence.

# Placeholder so the PIL example below is defined; any PIL.Image.Image works here.
from PIL import Image
pil_image = Image.open("/path/to/your/image.jpg")

# Each inner list is one independent conversation.
messages = [
    # Image
    ## Local file path
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "file:///path/to/your/image.jpg"},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    ## Image URL
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "http://path/to/your/image.jpg"},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    ## Base64-encoded image
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "data:image;base64,/9j/..."},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    ## PIL.Image.Image
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": pil_image},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    # Video
    ## Local video path
    [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": "file:///path/to/video1.mp4"},
                {"type": "text", "text": "Describe this video."},
            ],
        }
    ],
    ## Local video frames (timestamps must be provided)
    [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "file:///path/to/extracted_frame1.jpg",
                        "file:///path/to/extracted_frame2.jpg",
                        "file:///path/to/extracted_frame3.jpg",
                    ],
                    "timestamps": [0.0, 0.5, 1.0],
                },
                {"type": "text", "text": "Describe this video."},
            ],
        }
    ],
    ## The model dynamically adjusts the frame sampling mode, maximum number of frames,
    ## maximum sampling FPS, etc.
    [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": "file:///path/to/video1.mp4",
                    "frame_sampling_mode": "uniform_last_frame",
                    "num_frames": 384,
                    "max_fps": 8.0,
                },
                {"type": "text", "text": "Describe this video."},
            ],
        }
    ],
]
# Render the chat template to text; tokenization happens in the processor call below.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Load the images and videos referenced in the messages.
images, videos, video_kwargs = process_vision_info(messages)

# Each video is returned as a (video, metadata) pair; split the pairs into two lists.
if videos is not None:
    videos, video_metadatas = zip(*videos)
    videos = list(videos)
    video_metadatas = list(video_metadatas)
else:
    video_metadatas = None

inputs = processor(
    text=text,
    images=images,
    videos=videos,
    video_metadata=video_metadatas,
    return_tensors="pt",
    **video_kwargs,
)
inputs = inputs.to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=2048)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_text = processor.post_process_image_text_to_text(
    generated_ids[:, inputs["input_ids"].size(1):],
    skip_special_tokens=True,
)
print(generated_text)
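The image and video entries in the example above are plain dictionaries, so they can also be assembled programmatically. The sketch below builds a base64 data-URI image entry and a frame-list video entry with evenly spaced timestamps. The helper names (`image_entry_from_bytes`, `frames_entry`, `user_message`) are hypothetical and not part of molmo-utils, and the constant frame rate is an assumption; only the dictionary keys and the `data:image;base64,` and `file://` prefixes are taken from the example above.

```python
import base64


def image_entry_from_bytes(raw: bytes) -> dict:
    """Wrap raw image bytes as a base64 data-URI image entry."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {"type": "image", "image": f"data:image;base64,{encoded}"}


def frames_entry(frame_paths: list[str], fps: float) -> dict:
    """Build a video entry from already extracted frames.

    Assumes absolute paths and a constant sampling rate `fps`,
    so frame i is stamped at i / fps seconds.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return {
        "type": "video",
        "video": [f"file://{path}" for path in frame_paths],
        "timestamps": [i / fps for i in range(len(frame_paths))],
    }


def user_message(entry: dict, prompt: str) -> list[dict]:
    """Return a single-turn conversation pairing a visual entry with a text prompt."""
    return [
        {
            "role": "user",
            "content": [entry, {"type": "text", "text": prompt}],
        }
    ]


frames = [
    "/path/to/extracted_frame1.jpg",
    "/path/to/extracted_frame2.jpg",
    "/path/to/extracted_frame3.jpg",
]
video_conversation = user_message(frames_entry(frames, fps=2.0), "Describe this video.")
print(video_conversation[0]["content"][0]["timestamps"])
```

A list of such conversations has the same shape as `messages` in the example above.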
File details
Details for the file molmo_utils-0.0.1.tar.gz.
File metadata
- Download URL: molmo_utils-0.0.1.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f87ef5856e1191b4b4dd4c7c346a221b1089c8824d35d277f9e69a584e3f01d4 |
| MD5 | 7bf50624fd09ac7fbd18f0bb0dbb762b |
| BLAKE2b-256 | 6f315241bb2e9862e2df80234a7dc8a9038a38c9f540310c91f7c87fd444c72b |
File details
Details for the file molmo_utils-0.0.1-py3-none-any.whl.
File metadata
- Download URL: molmo_utils-0.0.1-py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f3553eac2e52c64dba434f874b32408f7ada244629aaf161e24ff33c0ce5ef5 |
| MD5 | 23f224c8d652f516dc338c5f67b587e4 |
| BLAKE2b-256 | adeb40792e853a45c0db13bc4f1a55536aa5f35b46588d262bfa56bbb110db0b |