clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them.
- clip inference allows you to quickly (1500 samples/s on a 3080) compute image and text embeddings
- clip index builds efficient indices out of the embeddings
- clip filter allows you to filter out the data using the clip index
- clip back hosts the indices with a simple flask service
- clip service is a simple ui querying the back
End to end, this makes it possible to build a simple semantic search system. Interested in learning about semantic search in general? You can read my Medium post on the topic.
Install
pip install clip-retrieval
clip inference
Get some images in an image_folder, for example by doing:
pip install img2dataset
echo 'https://placekitten.com/200/305' >> myimglist.txt
echo 'https://placekitten.com/200/304' >> myimglist.txt
echo 'https://placekitten.com/200/303' >> myimglist.txt
img2dataset --url_list=myimglist.txt --output_folder=image_folder --thread_count=64 --image_size=256
You can also put text files with the same names as the images in that folder, to get the text embeddings.
Then run:
clip-retrieval inference --input_dataset image_folder --output_folder embeddings_folder
Output folder will contain:
- img_emb/
  - img_emb_0.npy containing the image embeddings as numpy
- text_emb/
  - text_emb_0.npy containing the text embeddings as numpy
- metadata/
  - metadata_0.parquet containing the image paths, captions and metadata
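The embedding files can be read back with numpy. The following is a minimal sketch, assuming the layout above and that a large run writes several numbered parts per folder (img_emb_0.npy, img_emb_1.npy, and so on). The helper name load_embedding_parts is hypothetical and not part of clip-retrieval.

```python
import re
from pathlib import Path

import numpy as np


def load_embedding_parts(folder, prefix):
    """Load every <prefix>_<n>.npy file in `folder` and stack them in part order.

    Hypothetical helper for illustration; it is not part of clip-retrieval.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.npy$")
    parts = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            parts.append((int(match.group(1)), path))
    if not parts:
        raise FileNotFoundError(f"no {prefix}_<n>.npy files in {folder}")
    # Sort on the numeric suffix so that part 10 comes after part 2.
    parts.sort(key=lambda item: item[0])
    return np.concatenate([np.load(path) for _, path in parts], axis=0)
```

With the folders from the example above, load_embedding_parts("embeddings_folder/img_emb", "img_emb") would return one array with one row per image.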
clip inference scales to millions of samples. At 1400 samples/s on a 3080, 10M samples can be processed in about 2h.
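The 2h figure follows directly from the throughput. A small sketch of the arithmetic, assuming the throughput stays constant for the whole run (estimated_hours is a name chosen here for illustration):

```python
def estimated_hours(num_samples, samples_per_second):
    """Return the wall-clock hours needed at a constant throughput."""
    return num_samples / samples_per_second / 3600


print(f"{estimated_hours(10_000_000, 1400):.2f} h")  # 1.98 h
```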
API
clip_inference turns a set of text+image into CLIP embeddings
- input_dataset Path to input dataset. Folder if input_format is files. Bash brace pattern such as "{000..150}.tar" (see https://pypi.org/project/braceexpand/) if webdataset (required)
- output_folder Folder where the clip embeddings will be saved, as well as metadata (required)
- input_format files or webdataset (default files)
- cache_path cache path for webdataset (default None)
- batch_size Number of items to do the inference on at once (default 256)
- num_prepro_workers Number of processes to do the preprocessing (default 8)
- enable_text Enable text processing (default True)
- enable_image Enable image processing (default True)
- enable_metadata Enable metadata processing (default False)
- write_batch_size Write batch size (default 10**6)
- subset_size Only process a subset of this size (default None)
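When the inference step is driven from a script, it can help to assemble the command line from these parameters. The sketch below assumes that each parameter listed above is accepted as a command-line flag of the same name, as is the case for --input_dataset and --output_folder in the command shown earlier. The helper build_inference_command is hypothetical.

```python
import shlex


def build_inference_command(input_dataset, output_folder, **options):
    """Build the argument list for `clip-retrieval inference`.

    Extra keyword arguments become `--name value` pairs. Mapping the
    documented parameter names to flags of the same name is an assumption.
    """
    command = [
        "clip-retrieval", "inference",
        "--input_dataset", str(input_dataset),
        "--output_folder", str(output_folder),
    ]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


command = build_inference_command(
    "image_folder", "embeddings_folder", batch_size=128, enable_metadata=True
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once clip-retrieval is installed.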
Clip index
Clip index takes the output of clip inference as input and builds an index from it using autofaiss:
clip-retrieval index --input_folder embeddings_folder --output_folder index_folder
The output is a folder containing:
- image.index containing a brute force faiss index for images
- text.index containing a brute force faiss index for texts
- metadata.arrow containing the metadata in a format that is easy to memory map
Thanks to autofaiss and faiss, this scales to hundreds of millions of samples in a few hours.
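For intuition, a brute force index compares the query against every stored embedding and keeps the best matches. Here is a minimal numpy sketch, assuming embeddings are compared by inner product. It illustrates the idea only and is not how autofaiss or faiss are implemented; brute_force_search is a hypothetical name.

```python
import numpy as np


def brute_force_search(embeddings, query, k):
    """Return (scores, indices) of the k embeddings with the largest inner product."""
    scores = embeddings @ query
    k = min(k, len(scores))
    # Sort descending by score and keep the first k positions.
    top = np.argsort(-scores)[:k]
    return scores[top], top


embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
query = np.array([1.0, 0.0], dtype=np.float32)
scores, indices = brute_force_search(embeddings, query, k=2)
print(indices.tolist())  # [0, 2]
```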
Clip filter
Once the embeddings are computed, you may want to filter out the data by a specific query.
For that you can run clip-retrieval filter --query "cat" --output_folder "cat/" --indice_folder "index_folder"
It will copy the 100 best images for this query into the output folder.
Using --num_results or --threshold may help refine the filter.
Thanks to the fast knn index, this can run in real time (<10ms) for large K values (100000), and in minutes for very large K values.
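The effect of the two options can be pictured as a selection over scored results. The sketch below assumes that the threshold is a minimum similarity and that num_results caps the number of items kept. Both semantics are assumptions made for illustration, and select_results is not part of clip-retrieval.

```python
def select_results(scored_paths, num_results=100, threshold=None):
    """Keep the best-scoring items, optionally dropping those below a threshold.

    `scored_paths` is a list of (similarity, path) pairs. Treating the
    threshold as a minimum similarity is an assumption made for this sketch.
    """
    ranked = sorted(scored_paths, key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[0] >= threshold]
    return [path for _, path in ranked[:num_results]]


candidates = [(0.31, "cat1.jpg"), (0.12, "car.jpg"), (0.27, "cat2.jpg")]
print(select_results(candidates, num_results=2))  # ['cat1.jpg', 'cat2.jpg']
print(select_results(candidates, threshold=0.3))  # ['cat1.jpg']
```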
Clip back
Run the following (output_folder is the output of clip index):
echo '{"example_index": "output_folder"}' > indices_paths.json
clip-retrieval back --port 1234 --indices-paths indices_paths.json
At this point you have a simple flask server running on port 1234 that can answer these queries:
- /indices-list returns a list of indices
- /knn-service takes as input:
{
  "text": "a text query",
  "image": "a base64 image",
  "modality": "image", // image or text index to use
  "num_images": 4, // number of output images
  "indice_name": "example_index"
}
and returns:
[
  {
    "image": "base 64 of an image",
    "text": "some result text"
  },
  {
    "image": "base 64 of an image",
    "text": "some result text"
  }
]
This achieves low latency (10ms). Throughput is about 100 queries/s. For higher throughput, a gRPC server is required.
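A client for the service can be sketched with the Python standard library. This assumes the service accepts a JSON body sent with POST and that the unused query field (text or image) can be left out; neither is confirmed above. The // comments in the example input are annotations and are not sent. The helper names are hypothetical.

```python
import json
import urllib.request


def build_knn_request(base_url, indice_name, text=None, image_base64=None,
                      modality="image", num_images=4):
    """Build a POST request for /knn-service from a text or image query."""
    payload = {
        "modality": modality,
        "num_images": num_images,
        "indice_name": indice_name,
    }
    # Only include the query field that was actually provided (assumption).
    if text is not None:
        payload["text"] = text
    if image_base64 is not None:
        payload["image"] = image_base64
    return urllib.request.Request(
        f"{base_url}/knn-service",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_knn(request):
    """Send the request and decode the JSON list of results."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_knn_request("http://localhost:1234", "example_index", text="a cat")
# results = query_knn(request)  # requires the clip back server to be running
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.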
For development
Either locally, or in gitpod (do export PIP_USER=false there)
Set up a virtualenv:
python3 -m venv .env
source .env/bin/activate
pip install -U pip
pip install -e .
To run tests:
pip install -r requirements-test.txt
then
python -m pytest -v tests -s