CLIP_BBox
CLIP_BBox is a Python library for detecting image objects with natural language text labels.
Overview
CLIP is a neural network, pretrained on image-text pairs, that can predict the most relevant text snippet for a given image.
Given an image and a natural language text label, CLIP_BBox will obtain the image's spatial embedding and the text label's embedding from CLIP, compute the similarity heatmap between the two embeddings, and then draw a bounding box around the image region with the highest image-text correspondence.
Features
The library provides functions for the following operations:
- Getting and appropriately reshaping an image's spatial embedding from the CLIP model before it performs attention-pooling
- Getting a text snippet's embedding from the CLIP model
- Computing the similarity heatmap between a pair of spatial and text embeddings from CLIP
- Drawing bounding boxes on an image, given a similarity heatmap and a similarity threshold
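The heatmap and thresholding steps above can be sketched in plain NumPy. This is an illustrative sketch, not CLIP_BBox's actual implementation: the function names, the `(H, W, D)` layout of the spatial embedding, and the box-fitting strategy are all assumptions for the example.

```python
import numpy as np

def similarity_heatmap(spatial_emb, text_emb):
    """Cosine similarity between each spatial location and a text embedding.

    spatial_emb: (H, W, D) array of per-location image features (a
    hypothetical layout for CLIP's pre-pooling spatial embedding).
    text_emb: (D,) embedding of the text label.
    Returns an (H, W) heatmap of values in [-1, 1].
    """
    spatial = spatial_emb / np.linalg.norm(spatial_emb, axis=-1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb)
    return spatial @ text  # contracts the feature axis D

def bbox_from_heatmap(heatmap, threshold):
    """Smallest box covering every location whose similarity exceeds threshold.

    Returns (x_min, y_min, x_max, y_max) in heatmap coordinates, or None
    if no location passes the threshold.
    """
    ys, xs = np.nonzero(heatmap > threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

In practice the heatmap has the resolution of CLIP's feature grid, so the box would still need to be scaled up to the original image's pixel coordinates.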
Install
Use pip to install clip_bbox as a Python package:
$ pip install clip_bbox
Example Usage
Use as a Command-Line Script
To draw bounding boxes on an image, run
$ python clip_bbox.py --img "path/to/img.png"
Use as a Python Module
To draw bounding boxes on an image, do the following:
from clip_bbox import run_clip_bbox
run_clip_bbox('path/to/img.png')
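The drawing step at the end of the pipeline can be sketched with Pillow. `draw_bbox` is a hypothetical helper for illustration, not part of the clip_bbox API, and it assumes the box coordinates have already been computed:

```python
from PIL import Image, ImageDraw

def draw_bbox(img, box, color="red", width=3):
    """Return a copy of img with a rectangle drawn at box.

    box is (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    out = img.copy()  # leave the input image untouched
    ImageDraw.Draw(out).rectangle(box, outline=color, width=width)
    return out
```

In CLIP_BBox itself, `run_clip_bbox` performs the whole pipeline (embedding extraction, heatmap computation, and drawing) in one call.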