Keye Vision Language Model Utils - PyTorch
keye-vl-utils
keye-vl-utils contains a set of helper functions for processing visual inputs (images and videos) and integrating them with Keye series models.
Install
pip install keye-vl-utils
Usage
KeyeVL
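The usage example below mentions the VIDEO_MAX_PIXELS environment variable and the formula 32000 * 28 * 28 * 0.9. As a rough sketch, that budget can be derived and exported from Python. The helper name `video_pixel_budget` is hypothetical, and two points are assumptions that this page does not confirm: that one visual token covers a 28 x 28 pixel patch, and that the variable has to be set before `keye_vl_utils` is imported.

```python
import os


def video_pixel_budget(max_tokens: int, patch: int = 28, margin: float = 0.9) -> int:
    """Hypothetical helper: turn a visual-token budget into a pixel budget.

    The patch size and the 0.9 margin mirror the formula quoted in the usage
    example; they are not verified against the package source.
    """
    return round(max_tokens * patch * patch * margin)


budget = video_pixel_budget(32000)

# Assumption: set the variable before importing keye_vl_utils so that the
# package sees it when it reads its configuration.
os.environ["VIDEO_MAX_PIXELS"] = str(budget)
print(budget)
```

Environment variables are strings, so the value is stored as a plain integer literal rather than as an arithmetic expression.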
from transformers import AutoModel, AutoProcessor
from keye_vl_utils import process_vision_info
# Default: load the model on the available device(s).
# device_map="auto" already places the weights, so no extra .to("cuda") call is needed.
model_path = "Kwai-Keye/Keye-VL-8B-Preview"
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
# You can cap the pixel budget for a video, and with it the number of visual tokens, through the
# environment variable VIDEO_MAX_PIXELS, based on the maximum number of tokens the model can accept.
# export VIDEO_MAX_PIXELS=22579200  # 32000 * 28 * 28 * 0.9
# You can insert a local file path, a URL, a base64-encoded image, or a PIL.Image.Image at any position in the message content.
messages = [
# Image
## Local file path
[{"role": "user", "content": [{"type": "image", "image": "file:///path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
## Image URL
[{"role": "user", "content": [{"type": "image", "image": "http://path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
## Base64 encoded image
[{"role": "user", "content": [{"type": "image", "image": "data:image;base64,/9j/..."}, {"type": "text", "text": "Describe this image."}]}],
## PIL.Image.Image (pil_image is an image you have already loaded, e.g. with PIL.Image.open)
[{"role": "user", "content": [{"type": "image", "image": pil_image}, {"type": "text", "text": "Describe this image."}]}],
## The model adjusts the image size dynamically; specify the dimensions only if required.
[{"role": "user", "content": [{"type": "image", "image": "file:///path/to/your/image.jpg", "resized_height": 280, "resized_width": 420}, {"type": "text", "text": "Describe this image."}]}],
# Video
## Local video path
[{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
## Local video frames
[{"role": "user", "content": [{"type": "video", "video": ["file:///path/to/extracted_frame1.jpg", "file:///path/to/extracted_frame2.jpg", "file:///path/to/extracted_frame3.jpg"],}, {"type": "text", "text": "Describe this video."},],}],
## The model adjusts the number of video frames, the height, and the width dynamically; specify these arguments only if required.
[{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", "fps": 2.0, "resized_height": 280, "resized_width": 280}, {"type": "text", "text": "Describe this video."}]}],
]
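The base64 entry in the list above uses a data URI of the form "data:image;base64,...". A minimal sketch of producing such a string from raw image bytes and wrapping it in a single-turn conversation follows. Both helper names (`to_data_uri`, `image_message`) are hypothetical and are not part of keye-vl-utils; the three bytes used here are just the start of a JPEG file, standing in for real image data.

```python
import base64


def to_data_uri(image_bytes: bytes) -> str:
    """Encode raw image bytes as a "data:image;base64,..." string."""
    return "data:image;base64," + base64.b64encode(image_bytes).decode("ascii")


def image_message(image, prompt: str) -> list:
    """Hypothetical helper: one image plus one text prompt as a single-turn conversation."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Placeholder bytes: the first three bytes of a JPEG file, not a full image.
uri = to_data_uri(b"\xff\xd8\xff")
conversation = image_message(uri, "Describe this image.")
print(uri)  # data:image;base64,/9j/
```

In practice the bytes would come from reading an image file in binary mode, and the resulting conversation would be one element of the messages list.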
# The model is already loaded above, so only the processor is created here.
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
inputs = processor(text=text, images=images, videos=videos, padding=True, return_tensors="pt", **video_kwargs).to("cuda")
print(inputs)
generated_ids = model.generate(**inputs)
print(generated_ids)
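The generated_ids printed above are token ids, not text. For decoder-only models in transformers, the output of generate normally starts with the prompt tokens, so a common post-processing step is to drop that prefix before decoding. The sketch below shows the trimming step on toy lists; `trim_prompt_tokens` is a hypothetical helper, and the assumption that the Keye processor exposes `batch_decode` is not confirmed by this page.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the prompt prefix from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Toy token ids standing in for inputs.input_ids and generated_ids.
prompt_ids = [[101, 7, 8], [101, 9, 10]]
output_ids = [[101, 7, 8, 42, 43], [101, 9, 10, 44]]

new_tokens = trim_prompt_tokens(prompt_ids, output_ids)
print(new_tokens)  # [[42, 43], [44]]

# With the real objects, the trimmed ids could then be decoded, for example:
# processor.batch_decode(new_tokens, skip_special_tokens=True)
```

Slicing by the prompt length works for batches because padding gives every prompt in the batch the same length.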
Deployment
The Keye-8B-Instruct series maintains full compatibility with the Qwen2_5_VLForConditionalGeneration architecture for deployment and inference.
Download files
File details
Details for the file keye_vl_utils-1.0.0.tar.gz.
File metadata
- Download URL: keye_vl_utils-1.0.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b79e3343ad9ef0354b0f9484d27c88baae638f5fac67bd31996845a7c1d4903 |
| MD5 | f65722b8dd28af8f803c8b97d3c3670b |
| BLAKE2b-256 | 78f847a6bfc0728f5e0eb48a589c348f0e2e771b1be8c85f125b31aa00428f5d |
File details
Details for the file keye_vl_utils-1.0.0-py3-none-any.whl.
File metadata
- Download URL: keye_vl_utils-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba5b03ac05e7991b382d48ac4f74d4b52ae82b5fd2c087175d25270b226ae56b |
| MD5 | f3904ad348aa765e258d7cf931341091 |
| BLAKE2b-256 | 8dc7a42ffdd760338129ef4d3419e69612de2654d0f0d028fd503bb6eafd2a4b |