Keye Vision Language Model Utils - PyTorch
Project description
keye-vl-utils
Keye-VL Utils contains a set of helper functions for processing visual inputs (images and videos) and integrating them with KeyeVL1.5.
Install
pip install keye-vl-utils==1.5.0
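After installing, it can be useful to confirm that the pinned version is the one actually present in the environment. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and is not part of keye-vl-utils.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version of keye-vl-utils, or None if it is not installed.
version = installed_version("keye-vl-utils")
print(version)
```

If this prints something other than `1.5.0`, the environment probably holds a different release than the one pinned above.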
Usage
KeyeVL
from transformers import AutoModel, AutoProcessor
from keye_vl_utils import process_vision_info
# You can insert a local file path, a URL, or a base64-encoded image at any position in the message content.
messages = [
# Image
## Local file path
[{"role": "user", "content": [{"type": "image", "image": "file:///path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
## Image URL
[{"role": "user", "content": [{"type": "image", "image": "http://path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
## Base64 encoded image
[{"role": "user", "content": [{"type": "image", "image": "data:image;base64,/9j/..."}, {"type": "text", "text": "Describe this image."}]}],
## PIL.Image.Image (pil_image is a PIL.Image.Image object you have already loaded)
[{"role": "user", "content": [{"type": "image", "image": pil_image}, {"type": "text", "text": "Describe this image."}]}],
# Video
## Local video path
[{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
## Local video frames
[{"role": "user", "content": [{"type": "video", "video": ["file:///path/to/extracted_frame1.jpg", "file:///path/to/extracted_frame2.jpg", "file:///path/to/extracted_frame3.jpg"],}, {"type": "text", "text": "Describe this video."},],}],
## The model dynamically adjusts the number of video frames and the video height and width. Specify args if required.
[{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
]
processor = AutoProcessor.from_pretrained("Kwai-Keye/Keye-VL-1.5-8B", trust_remote_code=True)
model = AutoModel.from_pretrained("Kwai-Keye/Keye-VL-1.5-8B", torch_dtype="auto", device_map="auto", trust_remote_code=True)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
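# process_vision_info returns the images, the videos, and a dict of extra processor keyword arguments.
images, videos, processor_args = process_vision_info(messages)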
inputs = processor(text=text, images=images, videos=videos, return_tensors="pt", **processor_args)
print(inputs)
generated_ids = model.generate(**inputs)
print(generated_ids)
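The message entries above accept an image as a `file://` path or as a `data:image;base64,` string. As a minimal sketch of how those two forms could be produced with the standard library alone, the helpers below build them and wrap the result in a single-turn conversation. The function names (`image_path_to_file_uri`, `image_bytes_to_data_uri`, `build_image_message`) are hypothetical and not part of keye-vl-utils, and the four bytes used here are only a stand-in for real image data.

```python
import base64
from pathlib import Path


def image_path_to_file_uri(path):
    """Return a file:// URI for a local path, as in the "Local file path" entry."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


def image_bytes_to_data_uri(data):
    """Return a base64 data URI in the "data:image;base64,..." form."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:image;base64,{encoded}"


def build_image_message(image, prompt="Describe this image."):
    """Wrap one image reference and a text prompt in a single-turn user conversation."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Hypothetical stand-in bytes; in practice, read them from a real image file.
jpeg_header = b"\xff\xd8\xff\xe0"
conversation = build_image_message(image_bytes_to_data_uri(jpeg_header))
print(conversation[0]["content"][0]["image"])  # prints data:image;base64,/9j/4A==
```

Each conversation built this way has the same shape as one entry of the `messages` list in the usage example.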
Download files
Source Distribution
keye_vl_utils-1.5.0.tar.gz (7.7 kB)
Built Distribution
keye_vl_utils-1.5.0-py3-none-any.whl (7.9 kB)
File details
Details for the file keye_vl_utils-1.5.0.tar.gz.
File metadata
- Download URL: keye_vl_utils-1.5.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2ee8edca5902e0009e46b358894c839b5075a1f70b71a337f64eb83f45c805f |
| MD5 | 554d90fa677e452dc8d988a0bf2d8ef1 |
| BLAKE2b-256 | 5834aa6c222b2e95e97afe580dd49281d28d18aaa8ee947b583af74de28f17a0 |
File details
Details for the file keye_vl_utils-1.5.0-py3-none-any.whl.
File metadata
- Download URL: keye_vl_utils-1.5.0-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb334f389c8c7fd2809fe957163cb09ff5364332af2c1aa7cee1eb049ea62d29 |
| MD5 | 91d0209f2a7f5c783498410f1ee8dcce |
| BLAKE2b-256 | 2fd27f7123ca2a000dec90157c1564e8276d697b545cd66bd3d0952233def6e5 |
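The SHA256 digests listed for each file can be compared against a locally downloaded copy. The following is a minimal sketch using the standard library's `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the source distribution was saved in the current directory:
# matches_published_digest(
#     "keye_vl_utils-1.5.0.tar.gz",
#     "b2ee8edca5902e0009e46b358894c839b5075a1f70b71a337f64eb83f45c805f",
# )
```

A `False` result suggests the download is incomplete or is not the file described here, and it is better not to install from it.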