Package for an easy implementation of the paper "Attention Prompting on Image for Large Vision-Language Models".

apiprompting: Attention Prompting on Image for Large Vision-Language Models


👋 hello

Package for an easy implementation of Attention Prompting on Image for Large Vision-Language Models.

💻 install

pip install apiprompting

📄 Quick Start

clip_api

Generates image masks and blends them using the CLIP_Based API.

Parameters

  • images (list): List of images. Each item is either a path to an image (str) or a PIL.Image.

  • queries (list): list of queries. Each item is a str.

  • batch_size (int): Batch size for processing images. Default is 8.

  • model_name (str):
    Name of the pretrained model to load. Available options are "ViT-L-14-336", "ViT-L-14", and "ViT-B-32".

  • layer_index (int, optional, default=22):
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  • enhance_coe (int, optional, default=10):
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  • kernel_size (int, optional, default=3):
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  • interpolate_method_name (str, optional, default="LANCZOS"):
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by PIL.Image.resize, such as "NEAREST", "BILINEAR", "BICUBIC", "LANCZOS", etc.

  • grayscale (float, optional, default=0):
    A flag indicating whether to convert the image to grayscale. A value of 0 means no grayscale conversion, while a value of 1 will convert the image to grayscale.

Returns

  • list:
    A list containing the masked images generated by the function. Each item is a PIL.Image.
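
The constraints in the parameter list above can be checked before a model is loaded. The following is a minimal sketch, not part of apiprompting: the helper name `check_clip_args` is hypothetical, and the rule that `images` and `queries` must have the same length is an assumption, since the parameter list does not state it.

```python
# Hypothetical pre-flight check; this helper is NOT part of apiprompting.
# Model names are the ones listed in the clip_api parameter description.
CLIP_MODELS = {"ViT-L-14-336", "ViT-L-14", "ViT-B-32"}


def check_clip_args(images, queries, model_name, kernel_size=3):
    """Raise ValueError if the arguments contradict the documented parameters."""
    if len(images) != len(queries):
        # Assumption: one query per image; the docs do not say this explicitly.
        raise ValueError("images and queries must have the same length")
    if model_name not in CLIP_MODELS:
        raise ValueError(f"unknown model_name: {model_name!r}")
    if kernel_size < 1 or kernel_size % 2 == 0:
        # The docs say kernel_size should be an odd number.
        raise ValueError("kernel_size should be a positive odd number")
    return True
```

Calling `check_clip_args(images, queries, "ViT-L-14-336")` before `clip_api` surfaces a mistyped model name or an even kernel size early, rather than after the weights have been loaded.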

llava_api

Generates image masks and blends them using the LLaVA_Based API.

Parameters

  • images (list): List of images. Each item is either a path to an image (str) or a PIL.Image.

  • queries (list): list of queries. Each item is a str.

  • batch_size (int): Batch size for processing images. Only a batch size of 1 is supported.

  • model_name (str):
    Name of the pretrained model to load. Either "llava-v1.5-7b" or "llava-v1.5-13b".

  • layer_index (int, optional, default=20):
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  • enhance_coe (int, optional, default=10):
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  • kernel_size (int, optional, default=3):
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  • interpolate_method_name (str, optional, default="LANCZOS"):
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by PIL.Image.resize, such as "NEAREST", "BILINEAR", "BICUBIC", "LANCZOS", etc.

  • grayscale (float, optional, default=0):
    A flag indicating whether to convert the image to grayscale. A value of 0 means no grayscale conversion, while a value of 1 will convert the image to grayscale.

Returns

  • list:
    A list containing the masked images generated by the function. Each item is a PIL.Image.
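
The two functions share most parameters and differ mainly in `batch_size` and the default `layer_index`. The sketch below collects the documented values in one place so that a backend can be chosen by name. The helper `build_kwargs` and the backend labels "clip" and "llava" are hypothetical, and it assumes the documented parameter names can be passed as keyword arguments.

```python
# Hypothetical helper; NOT part of apiprompting. Values are copied from the
# parameter lists above.
SHARED_DEFAULTS = {
    "enhance_coe": 10,
    "kernel_size": 3,
    "interpolate_method_name": "LANCZOS",
    "grayscale": 0,
}
BACKEND_DEFAULTS = {
    "clip": {"batch_size": 8, "layer_index": 22},
    # llava_api documents batch_size as "only 1 supported".
    "llava": {"batch_size": 1, "layer_index": 20},
}


def build_kwargs(backend, model_name, **overrides):
    """Merge documented defaults with user overrides for the chosen backend."""
    kwargs = {**SHARED_DEFAULTS, **BACKEND_DEFAULTS[backend]}
    kwargs["model_name"] = model_name
    kwargs.update(overrides)
    if backend == "llava" and kwargs["batch_size"] != 1:
        raise ValueError("llava_api only supports batch_size=1")
    return kwargs
```

Under these assumptions, a call could look like `llava_api(images, queries, **build_kwargs("llava", "llava-v1.5-7b"))`.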

Example

from apiprompting import clip_api, llava_api

images, queries = ["path/to/image"], ["query"]

# CLIP_Based API
masked_images = clip_api(images, queries, model_name="ViT-L-14-336")
# LLaVA_Based API
masked_images = llava_api(images, queries, model_name="llava-v1.5-13b")
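
Both functions return a list of PIL.Image objects, so the results can be written to disk with the standard `save` method. A minimal sketch follows. The helper `save_masked_images`, the output directory, and the file naming scheme are hypothetical choices for illustration.

```python
from pathlib import Path


def save_masked_images(masked_images, out_dir, prefix="masked"):
    """Save each returned image as a numbered PNG and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, img in enumerate(masked_images):
        path = out_dir / f"{prefix}_{i:03d}.png"
        # PIL.Image.Image.save accepts a filesystem path and infers the
        # format from the ".png" extension.
        img.save(path)
        paths.append(path)
    return paths
```

For example, `save_masked_images(masked_images, "outputs")` would write `outputs/masked_000.png` for the first result.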

💜 acknowledgement

The README file is adapted from here.

🦸 contribution

We would love your help in making this repository even better! If you notice a bug or have a suggestion for improvement, feel free to open an issue or submit a pull request.
