Package for an easy implementation of the paper "Attention Prompting on Image for Large Vision-Language Models".

apiprompting: Attention Prompting on Image for Large Vision-Language Models


👋 hello

Package for an easy implementation of Attention Prompting on Image for Large Vision-Language Models.

💻 install

pip install apiprompting

📄 Quick Start

clip_api

Generates image masks and blends them using the CLIP_Based API.

Parameters

  • images (list): List of images. Each item can be a path to an image (str) or a PIL.Image.

  • queries (list): List of queries. Each item is a str.

  • batch_size (int): Batch size for processing images. Default is 8.

  • model_name (str):
    Name of the model to load the pretrained model. Available options include "ViT-L-14-336", "ViT-L-14", and "ViT-B-32".

  • layer_index (int, optional, default=22):
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  • enhance_coe (int, optional, default=10):
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  • kernel_size (int, optional, default=3):
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  • interpolate_method_name (str, optional, default="LANCZOS"):
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by PIL.Image.resize, such as "NEAREST", "BILINEAR", "BICUBIC", "LANCZOS", etc.

  • grayscale (float, optional, default=0):
    A flag indicating whether to convert the image to grayscale. A value of 0 means no grayscale conversion, while a value of 1 will convert the image to grayscale.

Returns

  • list:
    A list containing the masked images generated by the function. Each item is a PIL.Image.
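
The parameters above can be combined in a single call. The following is a minimal sketch, not part of the package: `build_clip_kwargs` and `run_clip_api` are hypothetical helpers. The first checks the documented constraints (a listed model name, an odd kernel size) before any model is loaded; the second imports `clip_api` lazily. Keyword names are taken from the parameter list above, and it is an assumption that every one of them is accepted as a keyword argument.

```python
# Model names listed in the parameter documentation above.
CLIP_MODELS = ("ViT-L-14-336", "ViT-L-14", "ViT-B-32")


def build_clip_kwargs(model_name="ViT-L-14-336", layer_index=22, enhance_coe=10,
                      kernel_size=3, interpolate_method_name="LANCZOS", grayscale=0):
    """Hypothetical helper: check documented constraints and return kwargs."""
    if model_name not in CLIP_MODELS:
        raise ValueError(f"model_name must be one of {CLIP_MODELS}, got {model_name!r}")
    if kernel_size % 2 == 0:
        raise ValueError(f"kernel_size should be an odd number, got {kernel_size}")
    return {
        "model_name": model_name,
        "layer_index": layer_index,
        "enhance_coe": enhance_coe,
        "kernel_size": kernel_size,
        "interpolate_method_name": interpolate_method_name,
        "grayscale": grayscale,
    }


def run_clip_api(images, queries, batch_size=8, **overrides):
    """Hypothetical wrapper: call clip_api with the checked keyword arguments."""
    # Imported here so the checks above can run without the package installed.
    from apiprompting import clip_api

    kwargs = build_clip_kwargs(**overrides)
    # Assumption: batch_size and the options above are accepted as keywords.
    return clip_api(images, queries, batch_size=batch_size, **kwargs)
```

For instance, `run_clip_api(["path/to/image"], ["query"], kernel_size=5, model_name="ViT-B-32")` would fail early on an even `kernel_size` instead of after the model has been loaded.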

llava_api

Generates image masks and blends them using the LLaVA_Based API.

Parameters

  • images (list): List of images. Each item can be a path to an image (str) or a PIL.Image.

  • queries (list): List of queries. Each item is a str.

  • batch_size (int): Batch size for processing images. Only a batch size of 1 is supported.

  • model_name (str):
    Name of the model to load the pretrained model. One of "llava-v1.5-7b" and "llava-v1.5-13b".

  • layer_index (int, optional, default=20):
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  • enhance_coe (int, optional, default=10):
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  • kernel_size (int, optional, default=3):
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  • interpolate_method_name (str, optional, default="LANCZOS"):
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by PIL.Image.resize, such as "NEAREST", "BILINEAR", "BICUBIC", "LANCZOS", etc.

  • grayscale (float, optional, default=0):
    A flag indicating whether to convert the image to grayscale. A value of 0 means no grayscale conversion, while a value of 1 will convert the image to grayscale.

Returns

  • list:
    A list containing the masked images generated by the function. Each item is a PIL.Image.
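
A common case is asking several questions about the same image. The sketch below is one way to set that up, under the assumption (not stated above) that `images[i]` is paired with `queries[i]`. `pair_image_with_queries` and `run_llava_api` are hypothetical helpers, not part of the package; `batch_size=1` is passed explicitly because it is the only supported value.

```python
def pair_image_with_queries(image, queries):
    """Hypothetical helper: repeat one image so it lines up with each query."""
    queries = list(queries)
    # Assumption: the API pairs images and queries element-wise.
    return [image] * len(queries), queries


def run_llava_api(image, queries, model_name="llava-v1.5-7b"):
    """Hypothetical wrapper: run llava_api for several queries on one image."""
    # Imported here so the pairing helper can be used without the package.
    from apiprompting import llava_api

    images, queries = pair_image_with_queries(image, queries)
    # Assumption: batch_size is accepted as a keyword argument.
    return llava_api(images, queries, batch_size=1, model_name=model_name)
```

Under the same assumption, the returned list would hold one masked image per query, in the order the queries were given.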

Example

from apiprompting import clip_api, llava_api

images, queries = ["path/to/image"], ["query"]

# CLIP_Based API
masked_images = clip_api(images, queries, model_name="ViT-L-14-336")
# LLaVA_Based API
masked_images = llava_api(images, queries, model_name="llava-v1.5-13b")
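
Both functions return a list of PIL.Image objects, so the results can be written to disk with each image's `save` method. A minimal sketch follows; `save_masked_images`, the `masked` prefix, and the PNG naming scheme are illustrative choices, not part of the package.

```python
from pathlib import Path


def save_masked_images(masked_images, out_dir, prefix="masked"):
    """Hypothetical helper: save each returned image as a numbered PNG file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(masked_images):
        path = out_dir / f"{prefix}_{index:03d}.png"
        # PIL images infer the file format from the file extension.
        image.save(path)
        paths.append(path)
    return paths
```

Called as `save_masked_images(masked_images, "outputs")`, this writes `outputs/masked_000.png`, `outputs/masked_001.png`, and so on, and returns the paths it wrote.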

💜 acknowledgement

This README is adapted from here.

🦸 contribution

We would love your help in making this repository even better! If you notice a bug or have a suggestion for improvement, feel free to open an issue or submit a pull request.
