Package for an easy implementation of the paper "Attention Prompting on Image for Large Vision-Language Models".
👋 hello
Package for an easy implementation of Attention Prompting on Image for Large Vision-Language Models.
💻 install
pip install apiprompting
📄 Quick Start
clip_api
Generates image masks and blends them using the CLIP_Based API.
Parameters
- images (list): List of images. Each item can be a path to an image (str) or a PIL.Image.
- queries (list): List of queries. Each item is a str.
- batch_size (int): Batch size for processing images. Default is 8.
- model_name (str): Name of the pretrained model to load. Available options include "ViT-L-14-336", "ViT-L-14", and "ViT-B-32".
- layer_index (int, optional, default=22): Index of the layer in the model to hook. This is where the feature extraction occurs.
- enhance_coe (int, optional, default=10): Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.
- kernel_size (int, optional, default=3): Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.
- interpolate_method_name (str, optional, default="LANCZOS"): Name of the interpolation method used for image resizing. It can be any interpolation method supported by PIL.Image.resize, such as "NEAREST", "BILINEAR", "BICUBIC", "LANCZOS", etc.
- grayscale (float, optional, default=0): A flag indicating whether to convert the image to grayscale. A value of 0 means no grayscale conversion, while a value of 1 will convert the image to grayscale.
Returns
- list: A list containing the masked images generated by the function. Each item is a PIL.Image.
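The constraints listed for clip_api can be checked before any model is loaded. The following is a minimal sketch and not part of apiprompting: `validate_clip_args` is a hypothetical helper, and the requirement that `images` and `queries` have the same length is an assumption based on the one-to-one pairing in the example, not something the package documents. The README says the available model names "include" the three below, so the set may be incomplete.

```python
# Hypothetical pre-flight check; not part of the apiprompting package.
# Model names as listed in this README (the list may not be exhaustive).
CLIP_MODELS = {"ViT-L-14-336", "ViT-L-14", "ViT-B-32"}


def validate_clip_args(images, queries, model_name, kernel_size=3, batch_size=8):
    """Raise ValueError if the arguments break a constraint stated in the README."""
    if len(images) != len(queries):
        # Assumption: one query per image, as in the README example.
        raise ValueError("images and queries must have the same length")
    if model_name not in CLIP_MODELS:
        raise ValueError(f"unknown model_name: {model_name!r}")
    if kernel_size % 2 == 0:
        # The README states that kernel_size should be an odd number.
        raise ValueError("kernel_size should be an odd number")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")


# Passes silently for the arguments used in the README example.
validate_clip_args(["path/to/image"], ["query"], model_name="ViT-L-14-336")
```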
llava_api
Generates image masks and blends them using the LLaVA_Based API.
Parameters
- images (list): List of images. Each item can be a path to an image (str) or a PIL.Image.
- queries (list): List of queries. Each item is a str.
- batch_size (int): Batch size for processing images. Only a batch size of 1 is supported.
- model_name (str): Name of the pretrained model to load. One of "llava-v1.5-7b" or "llava-v1.5-13b".
- layer_index (int, optional, default=20): Index of the layer in the model to hook. This is where the feature extraction occurs.
- enhance_coe (int, optional, default=10): Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.
- kernel_size (int, optional, default=3): Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.
- interpolate_method_name (str, optional, default="LANCZOS"): Name of the interpolation method used for image resizing. It can be any interpolation method supported by PIL.Image.resize, such as "NEAREST", "BILINEAR", "BICUBIC", "LANCZOS", etc.
- grayscale (float, optional, default=0): A flag indicating whether to convert the image to grayscale. A value of 0 means no grayscale conversion, while a value of 1 will convert the image to grayscale.
Returns
- list: A list containing the masked images generated by the function. Each item is a PIL.Image.
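clip_api and llava_api share most parameters but differ in a few documented defaults (batch size 8 versus 1, layer index 22 versus 20). As a rough sketch, those differences can be collected in one place. `API_DEFAULTS` and `build_kwargs` are hypothetical names introduced here for illustration; only the default values come from the parameter lists.

```python
# Hypothetical helper; only the default values come from the parameter lists.
API_DEFAULTS = {
    "clip": {"batch_size": 8, "layer_index": 22},
    "llava": {"batch_size": 1, "layer_index": 20},
}


def build_kwargs(backend, model_name, **overrides):
    """Merge the documented defaults with caller overrides for one backend."""
    if backend not in API_DEFAULTS:
        raise ValueError(f"backend must be one of {sorted(API_DEFAULTS)}")
    # Later entries win, so explicit overrides replace the defaults.
    kwargs = {**API_DEFAULTS[backend], "model_name": model_name, **overrides}
    if backend == "llava" and kwargs["batch_size"] != 1:
        # The LLaVA-based API is documented as supporting batch size 1 only.
        raise ValueError("llava_api only supports batch_size=1")
    return kwargs


print(build_kwargs("llava", "llava-v1.5-13b"))
```

Since the keys match the documented parameter names, the resulting dictionary could presumably be unpacked into the matching call, for example `llava_api(images, queries, **kwargs)`.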
Example
from apiprompting import clip_api, llava_api
images, queries = ["path/to/image"], ["query"]
# CLIP_Based API
masked_images = clip_api(images, queries, model_name="ViT-L-14-336")
# LLaVA_Based API
masked_images = llava_api(images, queries, model_name="llava-v1.5-13b")
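Both functions return a list of PIL.Image objects, so the results can be written to disk with each image's `save` method. A minimal sketch follows; `save_masked_images`, the output directory name, and the file-name pattern are all hypothetical and not part of apiprompting.

```python
from pathlib import Path


def save_masked_images(masked_images, out_dir="masked", prefix="masked"):
    """Save each image as <out_dir>/<prefix>_<index>.png and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, image in enumerate(masked_images):
        target = out_path / f"{prefix}_{index}.png"
        # PIL's Image.save accepts a path and infers the format from the suffix.
        image.save(target)
        saved.append(target)
    return saved
```

Calling `save_masked_images(masked_images)` after either API call above would write one PNG per input image.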
💜 acknowledgement
The README file is adapted from here.
🦸 contribution
We would love your help in making this repository even better! If you notice a bug or have suggestions for improvement, feel free to open an issue or submit a pull request.