CLIP_BBox
CLIP_BBox is a Python library for detecting image objects with natural language text labels.
Overview
CLIP (Contrastive Language-Image Pre-training) is a neural network from OpenAI, pretrained on image-text pairs, that can predict the most relevant text snippet for a given image.
Given an image and a natural language text label, CLIP_BBox obtains the image's spatial embedding and the text label's embedding from CLIP, computes a similarity heatmap between the two embeddings, and then draws a bounding box around the image region with the highest image-text correspondence.
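The heatmap step can be sketched with NumPy as a cosine similarity between the text embedding and each cell of the image's spatial grid. This is a minimal illustration, not CLIP_BBox's implementation: it assumes the spatial embedding arrives as a channels-first array of shape (1, D, H, W) that has already been projected into the same D-dimensional space as the text embedding. The function names `to_spatial_grid` and `similarity_heatmap` are hypothetical.

```python
import numpy as np


def to_spatial_grid(features: np.ndarray) -> np.ndarray:
    """Reshape a channels-first feature map of shape (1, D, H, W) to (H, W, D)."""
    # Drop the batch axis, then move the channel axis last so that
    # grid[y, x] is the D-dimensional embedding of one spatial cell.
    return np.transpose(features[0], (1, 2, 0))


def similarity_heatmap(grid: np.ndarray, text: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity between every cell of an (H, W, D) grid and a (D,) text embedding."""
    # L2-normalise both sides so that a plain dot product equals cosine similarity;
    # eps guards against division by zero for all-zero vectors.
    grid_n = grid / np.maximum(np.linalg.norm(grid, axis=-1, keepdims=True), eps)
    text_n = text / max(float(np.linalg.norm(text)), eps)
    # (H, W, D) @ (D,) -> (H, W): one similarity score per spatial cell.
    return grid_n @ text_n
```

Cells whose embeddings point in the same direction as the text embedding score close to 1, which is what makes the heatmap peak over the region the label describes.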
Note
The files for building the CLIP model (clip.py, model.py, newpad.py, simple_tokenizer.py) are third-party code from the CLIP repo and are not included in test coverage.
Features
The library provides functions for the following operations:
- Getting an image's spatial embedding from the CLIP model before its attention-pooling layer, and reshaping it appropriately
- Getting a text snippet's embedding from CLIP
- Computing the similarity heatmap between an image's spatial embedding and a text embedding from CLIP
- Drawing bounding boxes on an image, given a similarity heatmap
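Turning a heatmap into a box can be sketched by min-max normalising the scores, keeping the cells at or above a threshold, and scaling the extent of those cells up to pixel coordinates. This is one simple strategy assumed for illustration; the library may select its region differently. `heatmap_to_bbox`, `draw_bbox`, the 0.5 default threshold, and the use of Pillow for drawing are all assumptions.

```python
import numpy as np


def heatmap_to_bbox(
    heatmap: np.ndarray, image_size: tuple[int, int], threshold: float = 0.5
) -> tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) in pixels around the high-similarity cells.

    heatmap: (H, W) scores; image_size: (width, height); threshold in (0, 1].
    """
    h, w = heatmap.shape
    lo, hi = heatmap.min(), heatmap.max()
    # Min-max normalise to [0, 1]; a flat heatmap selects every cell.
    norm = (heatmap - lo) / (hi - lo) if hi > lo else np.ones_like(heatmap)
    rows, cols = np.nonzero(norm >= threshold)
    img_w, img_h = image_size
    # Each heatmap cell covers (img_w / w) x (img_h / h) pixels.
    sx, sy = img_w / w, img_h / h
    x0 = int(cols.min() * sx)
    y0 = int(rows.min() * sy)
    # +1 so the box includes the full extent of the last selected cell.
    x1 = int(round((cols.max() + 1) * sx))
    y1 = int(round((rows.max() + 1) * sy))
    return x0, y0, x1, y1


def draw_bbox(image_path: str, box: tuple[int, int, int, int], out_path: str) -> None:
    """Draw a red rectangle on the image and save it (assumes Pillow is installed)."""
    # Imported locally so heatmap_to_bbox can be used without Pillow.
    from PIL import Image, ImageDraw

    with Image.open(image_path) as src:
        img = src.convert("RGB")
    ImageDraw.Draw(img).rectangle(box, outline=(255, 0, 0), width=3)
    img.save(out_path)
```

A lower threshold yields a larger, more inclusive box; a higher one shrinks the box toward the single strongest cell.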
Install
Use pip to install clip_bbox as a Python package:
$ pip install clip-bbox
Usage Examples
Command Line Script
usage: python -m clip_bbox [-h] imgpath caption outpath

positional arguments:
  imgpath     path to input image
  caption     caption of input image
  outpath     path to output image displaying bounding boxes

optional arguments:
  -h, --help  show this help message and exit
To draw bounding boxes on an image based on its caption, run:
$ python -m clip_bbox "path/to/img.png" "caption of your image" "path/to/output_path.png"
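The help text above has the shape Python's `argparse` produces for three positional arguments. The parser below is a hypothetical reconstruction that reproduces that usage line; it is not taken from the package source, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: three required positional arguments,
    # with help strings copied from the usage text.
    parser = argparse.ArgumentParser(prog="python -m clip_bbox")
    parser.add_argument("imgpath", help="path to input image")
    parser.add_argument("caption", help="caption of input image")
    parser.add_argument("outpath", help="path to output image displaying bounding boxes")
    return parser
```

Because the arguments are positional, a caption that contains spaces must be quoted as a single shell argument, as in the command above; otherwise each word is parsed as a separate argument and the command fails.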
Python Module
To draw bounding boxes on an image based on its caption, do the following:
from clip_bbox import run_clip_bbox
run_clip_bbox("path/to/img.png", "caption of your image", "path/to/output_path.png")
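Since `run_clip_bbox` takes the image path, caption, and output path positionally, as in the call above, several captions for one image can be processed in a loop. This is a sketch: `output_path_for` and `annotate_captions` are hypothetical helpers, and the slug-based file naming is an arbitrary choice.

```python
import re
from pathlib import Path


def output_path_for(out_dir: str, image_path: str, caption: str) -> str:
    """Hypothetical helper: build an output filename from the image stem and caption."""
    # Lowercase the caption and collapse runs of non-alphanumerics into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", caption.lower()).strip("-")
    return str(Path(out_dir) / f"{Path(image_path).stem}_{slug}.png")


def annotate_captions(image_path: str, captions: list[str], out_dir: str) -> list[str]:
    """Call run_clip_bbox once per caption; return the output paths passed to it."""
    # Imported here so output_path_for can be used without clip_bbox installed.
    from clip_bbox import run_clip_bbox

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    outputs = []
    for caption in captions:
        out_path = output_path_for(out_dir, image_path, caption)
        run_clip_bbox(image_path, caption, out_path)
        outputs.append(out_path)
    return outputs
```

For example, `annotate_captions("path/to/img.png", ["a dog", "a red ball"], "path/to/out")` would write one annotated image per caption into `path/to/out`.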