Skip to main content

The first reverse image RAG API for image captioning and visual question answering with GPT-4V.

Project description

Reverse Image RAG - (RIR)

Synopsis:

We build an API to retrieval-augment vision-language models with visual context retrieved from the web.

Concretely, for a query image and query text (e.g. a question), we leverage reverse image search to find most similar images and their titles / captions.

The final product is a VLM-API that allows to automatically leverage reverse-image-search based retrieval augmentation.

Usage:

pip install rir_api

import rir_api 

api = rir_api.RIR_API(openai_api_key)

image_url = "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSgN8RDkURVE8mgOf-n02TqJdC2l1o5cVFA32NpZtuVp8MaFfZY"
query_text = "What is in this image?"
response = api.query_with_image(image_url, query_text)
# >> runs reverse image search
# >> formats visual context prompt
# >> queries VLM with full query

(see run.py for minimal example)

Debug mode:

For debugging, you can make API calls that display the web GUI (headless=True), and plot the image search result (show_result=True):

response = api.query_with_image(image_url, query_text, show_result=True, delay=3, headless=False)

Next steps

  • modularized API interface
  • information extraction from search results

Feel free to ping me under mdmoor[at]cs.stanford.edu if you're interested in contributing.

Reference:

@misc{Moor2024,
author = {Michael Moor},
title = {Reverse Image RAG~(RIR)},
year = {2024},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/mi92/reverse-image-rag}},
}

More teaser examples:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rir_api-0.1.2.tar.gz (2.4 kB view details)

Uploaded Source

Built Distribution

rir_api-0.1.2-py3-none-any.whl (2.0 kB view details)

Uploaded Python 3

File details

Details for the file rir_api-0.1.2.tar.gz.

File metadata

  • Download URL: rir_api-0.1.2.tar.gz
  • Upload date:
  • Size: 2.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.5

File hashes

Hashes for rir_api-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8e6675d69f4dbc454abac19bc5fa01eb260993106f700581fe5b1e5b190242b6
MD5 b5722310adee7508b82721c82eee1633
BLAKE2b-256 a2288f3327e0096c15dae595ed0e2dad6f54b345faf9f7c5d5ff8676eed34017

See more details on using hashes here.

File details

Details for the file rir_api-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: rir_api-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 2.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.5

File hashes

Hashes for rir_api-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 078d07f2e54ecb77ab98b15a4b0127112d2cf7888d5af8172c57a6d2c67f3c03
MD5 c2c03baec90928f27eabe8bbddc0b9e7
BLAKE2b-256 092b2d7a427db8f641b0af939f1b602c3b69b64eaa6dcb6134cd69a3d3687935

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page