A reverse image search API for image captioning and visual question answering.
Reverse Image RAG (RIR)
Synopsis:
We build an API that retrieval-augments vision-language models (VLMs) with visual context retrieved from the web.
Concretely, given a query image and a query text (e.g. a question), we use reverse image search to find the most similar images on the web, together with their titles or captions.
The final product is a VLM API that automatically applies this reverse-image-search-based retrieval augmentation.
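To make the retrieval-augmentation idea concrete, here is a minimal sketch of how the titles or captions of retrieved images could be turned into text context for the VLM prompt. The `RetrievedImage` type, the `build_context_prompt` function, the `top_k` cutoff and the prompt wording are all hypothetical illustrations, not the package's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedImage:
    """One reverse-image-search hit (hypothetical structure)."""
    url: str
    caption: str  # page title or caption found alongside the image


def build_context_prompt(query_text: str, hits: List[RetrievedImage], top_k: int = 3) -> str:
    """Prepend the captions of the top-k most similar web images to the user query.

    Assumes `hits` is already sorted from most to least similar.
    """
    lines = ["Captions of visually similar images found on the web:"]
    for i, hit in enumerate(hits[:top_k], start=1):
        lines.append(f"{i}. {hit.caption}")
    lines.append("")  # blank line between the retrieved context and the question
    lines.append(f"Question: {query_text}")
    return "\n".join(lines)


if __name__ == "__main__":
    hits = [
        RetrievedImage("https://example.com/a.jpg", "Golden retriever puppy on grass"),
        RetrievedImage("https://example.com/b.jpg", "Dog playing fetch in a park"),
    ]
    print(build_context_prompt("What is in this image?", hits))
```

Keeping the retrieved context as plain numbered text means it can be sent to any VLM alongside the query image, independent of the search backend.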
Usage:
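import os
from rir_api import RIR_API  # import path assumed from the package name

openai_api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in this environment variable
api = RIR_API(openai_api_key)
image_url = "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSgN8RDkURVE8mgOf-n02TqJdC2l1o5cVFA32NpZtuVp8MaFfZY"
query_text = "What is in this image?"

# query_with_image:
#   1. runs reverse image search on image_url
#   2. formats the retrieved images and captions into an image-text context prompt
#   3. queries the VLM with the full query (context prompt + query_text)
response = api.query_with_image(image_url, query_text)
print(response)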
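The three steps performed by `query_with_image` can be sketched as a small pipeline. This is a hypothetical illustration, not the package's code: `query_with_image_sketch`, `reverse_search` and `query_vlm` are made-up names, and because the real reverse-image-search backend and VLM call are not specified here, both are passed in as plain callables and stubbed so the example runs offline.

```python
from typing import Callable, List, Tuple

# (image_url, caption) pairs from a reverse image search backend (hypothetical shape)
SearchResult = List[Tuple[str, str]]


def query_with_image_sketch(
    image_url: str,
    query_text: str,
    reverse_search: Callable[[str], SearchResult],
    query_vlm: Callable[[str, str], str],
    top_k: int = 3,
) -> str:
    # 1. Run reverse image search on the query image.
    results = reverse_search(image_url)
    # 2. Format the top-k retrieved captions into an image-text context prompt.
    context = "\n".join(f"- {caption}" for _, caption in results[:top_k])
    prompt = f"Similar images on the web are captioned:\n{context}\n\n{query_text}"
    # 3. Query the VLM with the query image plus the full prompt.
    return query_vlm(image_url, prompt)


def fake_search(url: str) -> SearchResult:
    # Offline stand-in for a real reverse image search call.
    return [("https://example.com/1.jpg", "A red double-decker bus in London")]


def fake_vlm(url: str, prompt: str) -> str:
    # Offline stand-in for a real VLM call; just echoes the prompt it received.
    return f"[VLM answer for {url}]\n{prompt}"


if __name__ == "__main__":
    print(query_with_image_sketch("https://example.com/query.jpg", "What is in this image?", fake_search, fake_vlm))
```

Injecting the search and VLM functions keeps the orchestration testable and lets either backend be swapped without touching the pipeline logic.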
Download files
Source Distribution
rir_api-0.1.0.tar.gz (1.9 kB)