CLIP_BBox

CLIP_BBox is a Python library for detecting image objects with natural language text labels.

Overview / About

CLIP (Contrastive Language-Image Pre-training) is a neural network from OpenAI, pretrained on image-text pairs, that can predict the most relevant text snippet for a given image.

Given an image and a natural language text label, CLIP_BBox obtains the image's spatial embedding and the text label's embedding from CLIP, computes a similarity heatmap between the two, and then draws bounding boxes around the image regions with the highest image-text correspondence.
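As a rough illustration of the heatmap step, the NumPy sketch below scores every cell of a spatial embedding against a text embedding using cosine similarity. It is not CLIP_BBox's implementation: the function name `similarity_heatmap`, the `(C, H, W)` input layout, and the assumption that the image and text embeddings already share one `C`-dimensional space are hypothetical choices made for this example.

```python
import numpy as np


def similarity_heatmap(spatial_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between each spatial cell and a text embedding.

    spatial_emb: (C, H, W) image features, assumed (hypothetically) to live
                 in the same C-dimensional space as the text embedding.
    text_emb:    (C,) text embedding.
    Returns an (H, W) heatmap of cosine scores.
    """
    c, h, w = spatial_emb.shape
    # Flatten the grid so each row is the C-dim embedding of one cell.
    cells = spatial_emb.reshape(c, h * w).T  # (H*W, C)
    # L2-normalise both sides so a dot product equals cosine similarity.
    cells = cells / np.linalg.norm(cells, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb)
    # One score per cell, folded back into the (H, W) grid.
    return (cells @ text).reshape(h, w)
```

If the text vector is set equal to the features of one cell, that cell receives the highest score (a cosine similarity of 1), which makes a quick sanity check for this kind of function.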

Note

The files for building the CLIP model (clip.py, model.py, newpad.py, simple_tokenizer.py) are third-party code from the CLIP repo. They are not included in test coverage.

Features

The library provides functions for the following operations:

  • Getting an image's spatial embedding from the CLIP model before attention pooling is applied, and reshaping it appropriately
  • Getting a text snippet's embedding from CLIP
  • Computing the similarity heatmap between an image's spatial embedding and a text embedding from CLIP
  • Drawing bounding boxes on an image, given a similarity heatmap
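The last operation can be approximated by thresholding the heatmap, taking the extent of the cells that pass, and scaling that extent from feature-grid cells to pixels. The sketch below assumes one box per caption, a threshold expressed as a fraction of the heatmap's range, and the hypothetical name `heatmap_to_bbox`; the library's own box-drawing logic may differ.

```python
import math

import numpy as np


def heatmap_to_bbox(heatmap, image_size, rel_threshold=0.5):
    """Return one (x0, y0, x1, y1) pixel box around the high-scoring cells.

    heatmap:       (H, W) similarity scores on the feature grid.
    image_size:    (width, height) of the original image in pixels.
    rel_threshold: in [0, 1); cells scoring at least
                   min + rel_threshold * (max - min) are kept.
    """
    h, w = heatmap.shape
    lo, hi = float(heatmap.min()), float(heatmap.max())
    mask = heatmap >= lo + rel_threshold * (hi - lo)
    # Rows/columns containing at least one kept cell (never empty,
    # because the maximum cell always passes the threshold).
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # Scale grid indices to pixels: each cell spans img_w / w by img_h / h.
    img_w, img_h = image_size
    cell_w, cell_h = img_w / w, img_h / h
    x0 = math.floor(cols[0] * cell_w)
    y0 = math.floor(rows[0] * cell_h)
    # Clamp the far edges so float rounding cannot push past the image.
    x1 = min(img_w, math.ceil((cols[-1] + 1) * cell_w))
    y1 = min(img_h, math.ceil((rows[-1] + 1) * cell_h))
    return x0, y0, x1, y1
```

The returned tuple has the `(x0, y0, x1, y1)` form that Pillow's `ImageDraw.Draw(image).rectangle(...)` accepts, so it can be drawn directly with, for example, `outline="red", width=3`. Because the box spans every cell above the threshold, two separate matches produce one large box; splitting them would need a connected-components step such as `scipy.ndimage.label`.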

Install

Use pip to install clip_bbox as a Python package:

$ pip install clip-bbox

Usage Examples

Command Line Script

usage: python -m clip_bbox [-h] imgpath caption outpath

positional arguments:
  imgpath     path to input image
  caption     caption of input image
  outpath     path to output image displaying bounding boxes

optional arguments:
  -h, --help  show this help message and exit
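The help text above follows the layout of Python's standard `argparse` module. A parser accepting the same three positional arguments could look like the following; this is an illustrative reconstruction, not the package's source, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage line."""
    parser = argparse.ArgumentParser(prog="python -m clip_bbox")
    # Positional arguments are read in this order: image, caption, output.
    parser.add_argument("imgpath", help="path to input image")
    parser.add_argument("caption", help="caption of input image")
    parser.add_argument(
        "outpath", help="path to output image displaying bounding boxes"
    )
    return parser
```

Because `caption` is a single positional argument, a multi-word caption has to be quoted in the shell so that it reaches the parser as one string.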

To draw bounding boxes on an image based on its caption, run:

$ python -m clip_bbox "path/to/img.png" "caption of your image" "path/to/output_path.png"

Python Module

To draw bounding boxes on an image based on its caption, do the following:

from clip_bbox import run_clip_bbox

run_clip_bbox("path/to/img.png", "caption of your image", "path/to/output_path.png")
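Since `run_clip_bbox` takes an input path, a caption, and an output path, applying one caption to a folder of images is a loop over it. In the sketch below, the helpers `plan_outputs` and `run_batch`, the `*.png` default pattern, and the `<name>_bbox.png` output naming are hypothetical choices, not part of the package.

```python
from pathlib import Path


def plan_outputs(img_dir, out_dir, pattern="*.png"):
    """Pair every matching image in img_dir with an output path in out_dir."""
    img_dir, out_dir = Path(img_dir), Path(out_dir)
    return [
        (src, out_dir / f"{src.stem}_bbox{src.suffix}")
        for src in sorted(img_dir.glob(pattern))
    ]


def run_batch(img_dir, caption, out_dir, pattern="*.png"):
    """Draw boxes for the same caption on every image in img_dir."""
    # Imported here so plan_outputs can be used without clip_bbox installed.
    from clip_bbox import run_clip_bbox

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_outputs(img_dir, out_dir, pattern):
        run_clip_bbox(str(src), caption, str(dst))
```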

Example Output

Here is an example output image for the caption "a camera on a tripod":

[Example output image: bounding boxes drawn for the caption "a camera on a tripod"]

Download files

Source Distribution

clip_bbox-0.3.1.tar.gz (5.0 MB)

Built Distribution

clip_bbox-0.3.1-py3-none-any.whl (3.3 MB)

File details

Details for the file clip_bbox-0.3.1.tar.gz.

File metadata

  • Download URL: clip_bbox-0.3.1.tar.gz
  • Size: 5.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for clip_bbox-0.3.1.tar.gz:

  • SHA256: de30ca74e436673b4ac9bf3024ba9dbc10524704dcd8c7fc3c4592af6ff2ec39
  • MD5: c01aabee7a2e67639f7c74c5d652a63e
  • BLAKE2b-256: 7f04ca3733c9159a7c6395568ffac0f5851c307eec9aa2e9a69d7cb0261ecfc6

File details

Details for the file clip_bbox-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: clip_bbox-0.3.1-py3-none-any.whl
  • Size: 3.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for clip_bbox-0.3.1-py3-none-any.whl:

  • SHA256: dbe06e16aadc81a90ef361e95d99c736d122737ddd28cafed8e86b9626b255d8
  • MD5: 81319f0c3d0c24318325267d91857464
  • BLAKE2b-256: 16afa443c7e1eb4d823fcb6bd6080118d638b899658289cec43178c1b512b331
