
Searching related data by image

Project description

Product Image Search using K-Means Clustering

This approach is suitable for e-commerce platforms where you want to return products that look similar to the searched product. It can be applied to a variety of use cases, so it is implemented generically: you can use it for any task that involves finding similar images in a collection given an input image. The resulting models are lightweight and can be deployed on CPU instances with response times of roughly under 500 ms. Training time depends on the size of your dataset: for collections of 5,000 images or fewer, training on CPU is fine; beyond that, a GPU is recommended for faster training.

At the time of this release, image features are extracted with a model pretrained on ImageNet. The plan is to explore other base image models and add them to the package, at which point you will pass your choice as an argument/parameter when calling the extract_features function.

PS: The purpose of the method that finds the optimal number of clusters is to give you a visualization so you can decide which number of clusters suits your dataset. It won't choose one for you (this may change later); for now you have to select a number yourself based on what you see in the Elbow and Silhouette plots.
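To illustrate what those plots are built from, here is a minimal sketch of computing Elbow inertias and Silhouette scores with scikit-learn over a range of candidate cluster counts. The random feature matrix is a stand-in for extracted image features; the variable names are illustrative, not vasrch's internals.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))  # stand-in for extracted image features

inertias, silhouettes = {}, {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = km.inertia_  # Elbow method: look for the "bend" as k grows
    silhouettes[k] = silhouette_score(features, km.labels_)  # higher is better
```

Inertia always shrinks as k grows, which is why you look for the elbow rather than the minimum; the silhouette score, by contrast, peaks at a well-separated clustering.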

Bare-metal implementation of the product image search using K-Means Clustering

Steps:

  • Extract features from images using a pretrained model (VGG, ResNet, etc.)
  • Find the optimal number of clusters using either the Elbow or Silhouette method.
  • Train a K-Means clustering model on the features to group them into clusters. Once training is complete, you will have the clustering model and a CSV file containing the cluster assignments.
  • Use the model and the CSV file to predict and retrieve similar images given a test image.
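The steps above can be sketched end-to-end with scikit-learn and pandas. Random vectors stand in for the CNN features, and names like `assignments` are illustrative, not the library's actual internals:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
features = rng.normal(size=(50, 128))            # one row of features per image
names = [f"img_{i:03d}.jpg" for i in range(50)]

# Train K-Means on the features and record assignments (the CSV metadata).
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)
assignments = pd.DataFrame({"image": names, "cluster": km.labels_})

# At search time, the query image's features are mapped to a cluster,
# and images from that cluster are returned as the results.
query = features[0:1]
cluster_id = km.predict(query)[0]
results = assignments[assignments.cluster == cluster_id].image.tolist()
```

The point of clustering is that search only compares the query against cluster centroids and then filters the metadata, rather than scanning every image's features.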

Using the Vasrch Library

There are four callable methods:

  1. extract_features method which takes in the following arguments

    • img_folder which is the folder containing the images you want to train on, this being your image database from which search results will be retrieved.
    • save_to which is the name of the folder where you want the features to be saved. VGG16 is used as the base model, taking the features at the block5_pool layer, right before the classification layers.
  2. get_optimal_num_clusters method which takes in the following arguments

    • features_folder which is the folder where you saved the extracted features.
    • max_clusters which is the maximum number of clusters you want to test.
    • n_components which is the number of components used by the Elbow and Silhouette methods. The method shows the Elbow and Silhouette results together on one plot, which will guide you in choosing the number of clusters to train on. If you haven't read about the Elbow and Silhouette methods for finding the optimal number of clusters in clustering algorithms, please do.
  3. train_clusters method which is the main training function and takes in the following arguments

    • features_folder you guessed it, the folder where the extracted features were saved.
    • model_filename the name under which you want your model saved. The model will be saved as a pickle file.
    • csv_filename the name under which you want your image names and cluster assignments (metadata) saved. This will be useful when searching images later on, and even when integrating with your app.
    • num_clusters the number of clusters you want your model trained on.
  4. search_similar_images method which returns a cluster with similar images to the given image. The arguments are:

    • image_path which is the path to the image you want to search with.
    • model_filename which is the name of your trained model.
    • csv_file which is the name of the metadata csv saved during training.
    • top_n which is the number of image results you want returned.
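The pickled model and the metadata CSV are all search needs at query time. Here is a hedged sketch of that lookup, assuming the model is a scikit-learn KMeans under the hood; the pickle round-trip, DataFrame, and variable names are illustrative, not vasrch's actual code:

```python
import pickle
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Training side: fit a model on stand-in features and pickle it,
# as train_clusters does with model_filename.
features = np.random.default_rng(1).normal(size=(30, 8))
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
blob = pickle.dumps(model)

# Search side: load the model and the metadata, predict the query's
# cluster, and return up to top_n images from that cluster.
loaded = pickle.loads(blob)
metadata = pd.DataFrame({"image": [f"p{i}.jpg" for i in range(30)],
                         "cluster": model.labels_})
query_features = features[7:8]   # stand-in for the query image's features
cid = loaded.predict(query_features)[0]
top_n = 5
hits = metadata[metadata.cluster == cid].image.head(top_n).tolist()
```

Because the model and CSV are plain files, the same lookup can run inside a web app without the training code present.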

Sample code on using the library

from varsitysrch import VaSrch

search = VaSrch()

image_path = 'test_image.jpg'
image_folder = './images_folder_path'
features_folder = './features_folder_path'
num_clusters = 20  # choose this number based on the elbow and silhouette plots
csv_file = "metadata.csv"
model_filename = "test_model.pkl"
top_n = 5

search.extract_features(image_folder, features_folder)
visualize_clusters = search.get_optimal_num_clusters(features_folder, max_clusters=100, n_components=10)
search.train_clusters(features_folder, model_filename, csv_file, num_clusters)

similar_images = search.search_similar_images(image_path, model_filename, csv_file, top_n)
print(f"Similar images to {image_path} are:")
for image in similar_images:
    print(image)
    

If you encounter any issues, please open an issue and I will do my best to respond. If you want to contribute to the project, just fork the repo, make your changes, and send a pull request.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vasrch-0.0.10.tar.gz (6.0 kB)

Uploaded Source

Built Distribution


vasrch-0.0.10-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file vasrch-0.0.10.tar.gz.

File metadata

  • Download URL: vasrch-0.0.10.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for vasrch-0.0.10.tar.gz
  • SHA256: e52eafe1b8b06afa5eee4fcabf776fe893d44b45a005c24fc4753876eeeeeca8
  • MD5: 3ed404e312c67e610d953efc8a0444aa
  • BLAKE2b-256: 97b558e1c00a01d5b54b0d8cf54fddfd2823f4304dd92803ea72e9138a0a3489

See more details on using hashes here.

File details

Details for the file vasrch-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: vasrch-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for vasrch-0.0.10-py3-none-any.whl
  • SHA256: 11f697f39c147f74c1abea06750e008584e5d824811c8940b4e5ab4eaf3b2d01
  • MD5: 8acb9a74b95121ca82be0cfb03dfe2dc
  • BLAKE2b-256: 5472d74271a7e6c3bd30d2300d58ac34d97e4b6d5b253db4624499ec7ac14294

