
Easily computing clip embeddings and building a clip retrieval system with them


clip-retrieval



  • clip batch allows you to quickly compute image and text embeddings and indices (about 1500 samples/s on an RTX 3080)
  • clip filter allows you to filter the data using the CLIP embeddings
  • clip back hosts the indices with a simple Flask service
  • clip service is a simple UI that queries clip back

End to end, this makes it possible to build a simple semantic search system. Interested in learning about semantic search in general? You can read my Medium post on the topic.
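To make the idea concrete, here is a toy sketch of the core step of semantic search: ranking stored items by cosine similarity between a query embedding and their embeddings. It is plain Python for illustration only, not clip-retrieval's code; the function names and the 3-dimensional vectors are hypothetical stand-ins for real CLIP embeddings, and clip-retrieval itself delegates this search to faiss indices.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def semantic_search(query_emb, corpus_embs, k=2):
    """Return the indices of the k corpus embeddings most similar to the query.

    Brute force: score every stored item, then sort by descending similarity.
    """
    q = normalize(query_emb)
    scored = []
    for i, emb in enumerate(corpus_embs):
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy "embeddings"; real CLIP vectors have hundreds of dimensions.
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print(semantic_search([1.0, 0.2, 0.0], corpus, k=2))  # [2, 0]
```

Normalizing first means the dot product equals the cosine similarity, which is why embeddings are often stored normalized so that a plain inner-product index can be searched directly.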

Install

pip install clip-retrieval

clip batch

Put some images in a folder (image_folder below), for example by doing:

pip install img2dataset
echo 'https://placekitten.com/200/305' >> myimglist.txt
echo 'https://placekitten.com/200/304' >> myimglist.txt
echo 'https://placekitten.com/200/303' >> myimglist.txt
img2dataset --url_list=myimglist.txt --output_folder=image_folder --thread_count=64 --image_size=256

You can also put text files with the same names as the images in that folder, to get the text embeddings.
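A minimal sketch of that naming convention, assuming each caption is stored as a .txt file next to its image (for example 0.jpg and 0.txt) and that images use one of the listed extensions. Both the extension set and the helper name are assumptions for illustration; clip batch's own file discovery may differ, and this sketch only scans the top level of the folder.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to what your folder actually contains.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def pair_images_with_captions(folder):
    """Map each image path to the text of its same-named .txt file, or None if absent."""
    pairs = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip caption files and anything that is not an image
        caption_file = path.with_suffix(".txt")
        if caption_file.exists():
            pairs[str(path)] = caption_file.read_text(encoding="utf-8").strip()
        else:
            pairs[str(path)] = None
    return pairs
```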

Then run:

clip-retrieval batch --dataset_path image_folder --output_folder indice_folder

The output folder will contain:

  • description_list containing the list of captions, one per line
  • image_list containing the file paths of the images, one per line
  • img_emb.npy containing the image embeddings as a NumPy array
  • text_emb.npy containing the text embeddings as a NumPy array
  • image.index containing a brute-force faiss index for images
  • text.index containing a brute-force faiss index for texts
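The list files and the embeddings are most useful together. The sketch below turns result ids from a kNN search into image paths and captions, assuming that line i of image_list and description_list describes row i of img_emb.npy (and therefore id i in image.index); that alignment is an assumption here, not something this page states. The ids themselves would come from a search such as faiss.read_index(...).search(query, k), which returns distances and ids. The helper names are hypothetical.

```python
from pathlib import Path

def load_lines(path):
    """Read a file with one entry per line, dropping the trailing newlines."""
    return Path(path).read_text(encoding="utf-8").splitlines()

def lookup_results(indice_folder, result_ids):
    """Map row ids returned by a kNN search to (image path, caption) pairs.

    Assumes line i of image_list and description_list both describe embedding row i.
    """
    folder = Path(indice_folder)
    images = load_lines(folder / "image_list")
    captions = load_lines(folder / "description_list")
    return [(images[i], captions[i]) for i in result_ids]
```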

Clip filter

Once the embeddings are computed, you may want to filter the data by a specific query. For that, you can run:

clip-retrieval filter --query "cat" --output_folder "cat/" --indice_folder "indice_folder"

It will copy the 100 best images for this query into the output folder. The --num_results or --threshold options may help refine the filter.
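To illustrate what the two refinement options plausibly control, here is a hypothetical sketch of a select-and-copy step given precomputed similarity scores. It is not clip-retrieval's implementation: how the real CLI combines --num_results and --threshold is not described here, so this sketch simply lets a threshold take precedence over the result count. The function names are illustrative.

```python
import shutil
from pathlib import Path

def select_matches(scores, num_results=100, threshold=None):
    """Pick image paths to keep from a {path: similarity} mapping.

    Hypothetical rule: with a threshold, keep every path scoring at or above it;
    otherwise keep the num_results highest-scoring paths.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        return [path for path, score in ranked if score >= threshold]
    return [path for path, _ in ranked[:num_results]]

def copy_matches(paths, output_folder):
    """Copy the selected images into output_folder, creating it if needed."""
    out = Path(output_folder)
    out.mkdir(parents=True, exist_ok=True)
    for path in paths:
        shutil.copy(path, out / Path(path).name)
```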

Clip back

Then run the following, where output_folder is the folder produced by clip batch (indice_folder in the example above):

echo '{"example_index": "output_folder"}' > indices_paths.json
clip-retrieval back --port 1234 --indices-paths indices_paths.json
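The echo command above writes a JSON object that maps each index name to a clip batch output folder. When serving several indices, building that file from Python avoids shell-quoting mistakes. The sketch below does that and also reports missing files; the required file list is an assumption based on the clip batch outputs listed earlier, not a documented requirement of clip back, and the helper names are hypothetical.

```python
import json
from pathlib import Path

def write_indices_paths(indices, config_path="indices_paths.json"):
    """Write the {index name: clip batch output folder} mapping passed to clip back."""
    Path(config_path).write_text(json.dumps(indices, indent=2), encoding="utf-8")

def missing_files(indice_folder, required=("image.index", "image_list")):
    """List expected clip batch outputs (an assumed set) absent from indice_folder."""
    folder = Path(indice_folder)
    return [name for name in required if not (folder / name).exists()]
```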

At this point you have a simple Flask server running on port 1234 that answers these queries:

  • /indices-list -> returns the list of available indices
  • /knn-service, which takes as input the following JSON, where modality selects whether the image or the text index is used and num_images is the number of images to return:
{
    "text": "a text query",
    "image": "a base64 image",
    "modality": "image",
    "num_images": 4,
    "indice_name": "example_index"
}

and returns:

[
    {
        "image": "base 64 of an image",
        "text": "some result text"
    },
    {
        "image": "base 64 of an image",
        "text": "some result text"
    }
]
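A minimal Python client for /knn-service using only the standard library. It assumes the endpoint accepts an HTTP POST with a JSON body at http://localhost:1234/knn-service and that a text-only query (no "image" field) is accepted; check both against your clip back version. The helper names are hypothetical; encode_image shows one way to fill the "image" field for an image query.

```python
import base64
import json
import urllib.request
from pathlib import Path

def encode_image(path):
    """Base64-encode an image file for the request's "image" field."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")

def build_knn_request(text, indice_name="example_index", modality="image",
                      num_images=4, host="http://localhost:1234"):
    """Build a POST request (an assumed method) with a JSON body for /knn-service."""
    payload = {"text": text, "modality": modality,
               "num_images": num_images, "indice_name": indice_name}
    return urllib.request.Request(
        f"{host}/knn-service",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def decode_results(raw_body):
    """Parse the JSON list response into (image bytes, text) pairs."""
    return [(base64.b64decode(item["image"]), item["text"])
            for item in json.loads(raw_body)]

# Usage against a running clip back server:
# with urllib.request.urlopen(build_knn_request("a cat")) as resp:
#     for image_bytes, text in decode_results(resp.read()):
#         print(text, len(image_bytes))
```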

For development

Either locally or in Gitpod (run export PIP_USER=false there first).

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -U pip
pip install -e .
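A small sanity check can confirm that a virtualenv is active and show where the package would be imported from. The module name clip_retrieval is an assumption inferred from the distribution name; after pip install -e . its origin should point into your checkout. The helper name is hypothetical.

```python
import importlib.util
import sys

def dev_env_report(module="clip_retrieval"):
    """Report whether a venv is active and which file `module` would load from."""
    spec = importlib.util.find_spec(module)  # None if the module cannot be found
    return {
        "in_venv": sys.prefix != sys.base_prefix,  # venvs change sys.prefix
        "module_origin": spec.origin if spec else None,
    }

if __name__ == "__main__":
    print(dev_env_report())
```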


Download files


Source Distribution

clip_retrieval-1.0.1.tar.gz (7.8 kB view details)


Built Distribution

clip_retrieval-1.0.1-py3-none-any.whl (10.2 kB view details)


File details

Details for the file clip_retrieval-1.0.1.tar.gz.

File metadata

  • Download URL: clip_retrieval-1.0.1.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.6

File hashes

Hashes for clip_retrieval-1.0.1.tar.gz
Algorithm Hash digest
SHA256 bc59afb51f67311a42ba5190f92a7ca1177d7fe901d0107c86b24c214046fbc6
MD5 5e639c59c2705a81deeec4b95bf3f366
BLAKE2b-256 04fe7ba7d62167e0665bff20203000ffb9f2290d922bde86b23db617bbb70935
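To check a downloaded archive against the digests listed above, you can hash it with Python's hashlib. A minimal sketch follows; the commented usage assumes the archive sits in the current directory. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# SHA256 listed above for the source distribution.
EXPECTED_SDIST_SHA256 = "bc59afb51f67311a42ba5190f92a7ca1177d7fe901d0107c86b24c214046fbc6"

# Usage:
# assert sha256_of("clip_retrieval-1.0.1.tar.gz") == EXPECTED_SDIST_SHA256
```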


File details

Details for the file clip_retrieval-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: clip_retrieval-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.6

File hashes

Hashes for clip_retrieval-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4deb15f2db13ecc9442be65ed5b0d4c5af098b3069efa10f264f77f692517e3a
MD5 166f952d8420576c30fcc78b7f2721b7
BLAKE2b-256 7e2a3e4fff288520197d6c8af2d04c9b308daaff8b775e4591aa61aa1136ecb0

