Keye Vision Language Model Utils - PyTorch
keye-vl-utils
Keye-VL Utils provides helper functions for processing visual inputs (images and videos) and passing them to Keye-VL 1.5 models.
Install
pip install keye-vl-utils==1.5.2
Usage
KeyeVL
from transformers import AutoModel, AutoProcessor
from keye_vl_utils import process_vision_info
# You can insert a local file path, a URL, or a base64-encoded image at any position in the message content.
messages = [
# Image
## Local file path
[{"role": "user", "content": [{"type": "image", "image": "file:///path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
## Image URL
[{"role": "user", "content": [{"type": "image", "image": "http://path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
## Base64 encoded image
[{"role": "user", "content": [{"type": "image", "image": "data:image;base64,/9j/..."}, {"type": "text", "text": "Describe this image."}]}],
## PIL.Image.Image (pil_image must be a PIL.Image.Image object that you have already loaded)
[{"role": "user", "content": [{"type": "image", "image": pil_image}, {"type": "text", "text": "Describe this image."}]}],
# Video
## Local video path
[{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
## Local video frames
[{"role": "user", "content": [{"type": "video", "video": ["file:///path/to/extracted_frame1.jpg", "file:///path/to/extracted_frame2.jpg", "file:///path/to/extracted_frame3.jpg"],}, {"type": "text", "text": "Describe this video."},],}],
## The model dynamically adjusts the number of video frames (nframes) and the video height and width; specify these arguments only if required.
[{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
]
processor = AutoProcessor.from_pretrained("Kwai-Keye/Keye-VL-1.5-8B", trust_remote_code=True)
model = AutoModel.from_pretrained("Kwai-Keye/Keye-VL-1.5-8B", torch_dtype="auto", device_map="auto", trust_remote_code=True)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
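# process_vision_info returns the images, the videos, and a dict of extra processor keyword arguments.
images, videos, processor_args = process_vision_info(messages)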
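# padding=True lets conversations of different token lengths be batched into one tensor.
inputs = processor(text=text, images=images, videos=videos, padding=True, return_tensors="pt", **processor_args)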
print(inputs)
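# Move the inputs to the model's device before generating (needed when device_map places the model on a GPU).
inputs = inputs.to(model.device)
generated_ids = model.generate(**inputs)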
print(generated_ids)
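The message entries in the usage example above accept `file://` URIs and base64 data URIs. As a minimal sketch using only the Python standard library, the helpers below build those two reference forms and wrap them in a single-turn conversation. The names `to_file_uri`, `to_data_uri`, and `make_image_conversation` are hypothetical and are not part of `keye_vl_utils`; the `data:image;base64,` prefix simply mirrors the example above and is assumed to be what the library expects.

```python
import base64
from pathlib import Path


def to_file_uri(path):
    """Turn a local path into a file:// URI (hypothetical helper)."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


def to_data_uri(image_bytes):
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Prefix copied from the usage example; assumed, not verified against the library.
    return "data:image;base64," + encoded


def make_image_conversation(image_ref, prompt):
    """Build one single-turn conversation in the message format shown above."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_ref},
            {"type": "text", "text": prompt},
        ],
    }]


# A batch of two conversations, one per image reference style.
messages = [
    make_image_conversation(to_file_uri("image.jpg"), "Describe this image."),
    make_image_conversation(to_data_uri(b"\xff\xd8\xff"), "Describe this image."),
]
```

The three bytes `FF D8 FF` are the start of a JPEG file, which is why base64-encoded JPEG data begins with `/9j/` as in the example above. A real call would pass the full contents of an image file instead.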
Download files
Source Distribution
keye_vl_utils-1.5.2.tar.gz (6.7 kB)
Built Distribution
keye_vl_utils-1.5.2-py3-none-any.whl (7.0 kB)
File details
Details for the file keye_vl_utils-1.5.2.tar.gz.
File metadata
- Download URL: keye_vl_utils-1.5.2.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f7b6b1718d22cb4bfa936604dc569460ee6a6b9fd8f2ca56c5f6e5f901a02a7 |
| MD5 | 281162b5690c10f461f13a7037e54d33 |
| BLAKE2b-256 | a7f8bbb8268ae740f70365a9e488048a0f57f58f2c1dae65498b8764f0d529e4 |
File details
Details for the file keye_vl_utils-1.5.2-py3-none-any.whl.
File metadata
- Download URL: keye_vl_utils-1.5.2-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70b4df74afa13cf2ead113ff2ba4b617a94883dc89d9e0a5d033a7f6c4a0245c |
| MD5 | 9c7d70daf9d5e10abe5588ce967b3ef0 |
| BLAKE2b-256 | d1fe89fa9b7ae86254f673f7de959cb240e5c4626562336c6bf891b37d1b1b80 |
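The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is assumed to have been downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("keye_vl_utils-1.5.2.tar.gz", "3f7b6b1718d22cb4bfa936604dc569460ee6a6b9fd8f2ca56c5f6e5f901a02a7")` should return `True` for an intact copy of the source distribution.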