Skip to main content

A Python library for image similarity analysis using Image Encoders and Neural Networks

Project description

pyvisim

License Version Status Python Contributions

Welcome to pyvisim!

pyvisim is a Python library for computing image similarities using the encoders Fisher Vectors, VLAD and the Siamese Neural Networks.

Table of Contents

  1. Why pyvisim
  2. Installation
  3. Pretrained Models
  4. Contributing
  5. Get in Touch
  6. TODO
  7. License
  8. References

Why pyvisim?

pyvisim is designed to provide a simple and efficient way to compare images.

Quick Start

With just a few lines of code, you can compute the similarity score between two images using the VLAD encoder:

Example: Compute Similarity Score Using VLAD

from pyvisim.encoders import VLADEncoder, KMeansWeights
from pyvisim.datasets import OxfordFlowerDataset

# Load images from the Oxford Flower Dataset. Has to be NumPy Images!
dataset = OxfordFlowerDataset()
image1, *_  = dataset[0]
image2, *_ = dataset[1]

# Initialize the VLAD encoder with SIFT features and pretrained KMeans weights
encoder = VLADEncoder(
    weights=KMeansWeights.OXFORD102_K256_ROOTSIFT
)

# Compute the similarity score. By default, cosine similarity is used.
similarity_score = encoder.similarity_score(image1, image2)

print(f"Similarity Score: {similarity_score}")

You can also visit the introduction notebook for more examples.

I also provided various notebooks for different use-cases. Feel free to check them out, and let me know if you have any suggestions or questions!

  1. Image Retrieval
    Retrieve the top-k most similar images from a dataset.

    • Use encoding methods like VLAD or Fisher Vectors to quickly find the most relevant matches. Please visit this juptyer notebook for an example.
    • Example use: Building a fast image search engine for photo management software.
  2. Deep Learning Embeddings

    • Generate VLAD or Fisher vectors from neural network embeddings, e.g., VGG16 or other models.
    • Enhance your deep learning pipeline by leveraging traditional encoding methods on top of CNN features.
  3. Image Clustering

    • Cluster images based on their similarities to group them by category or content. An example and benchmarking can be found in this notebook.
    • Useful for organizing unlabeled data or generating pseudo-labels for further training.
  4. Pipeline for Combining Multiple Encoders

    • Chain various encoders in a single pipeline. An example can be found in this notebook.
    • Achieve more robust similarity metrics by blending different feature representations.
  5. Siamese Network (Coming Soon!)

    • Train a neural network to learn a similarity function directly from pairs/triples of images.
    • Possible use cases include face recognition, signature verification, or any image-based identity matching.

Installation

To use the library, you can simply install it via pip:

pip install pyvisim

or clone the repository and install it locally:

git clone https://github.com/MechaCritter/Python-Visual-Similarity.git
cd Python-Visual-Similarity
pip install .

Note that the notebooks are only available if you clone the repository.

All experiments in this project was made on the Oxford Flower Dataset [7], for which I have created a custom dataset class. To use this class, import it as follows:

from pyvisim.datasets import OxfordFlowerDataset

For more details on the dataset, please refer to the documentation.

Pretrained Models

The following pretrained models are provided for clustering and dimensionality reduction. All clustering models were trained with k=256. The choice of k was made arbitrarily based on the paper 5, where the authors tested with k=32, 64, 128, 256, 512, and so on. Since higher values would take too long, I chose k=256 as a balance between performance and computational cost.

KMeans Models

You can access these weights by importing KMWeights from the pyvisim.encoders module.

Model Name Features Extracted From PCA Applied Feature Dimensions
OXFORD102_K256_VGG16_PCA Last Conv Layer (VGG16) Yes 257
OXFORD102_K256_VGG16 Last Conv Layer (VGG16) No 514
OXFORD102_K256_ROOTSIFT_PCA RootSIFT features Yes 64
OXFORD102_K256_ROOTSIFT RootSIFT features No 128
OXFORD102_K256_SIFT_PCA SIFT features Yes 64
OXFORD102_K256_SIFT SIFT features No 128

Gaussian Mixture Models (GMMWeights)

You can access these weights by importing GMMWeights from the pyvisim.encoders module.

Model Name Features Extracted From PCA Applied Feature Dimensions
OXFORD102_K256_VGG16_PCA Last Conv Layer (VGG16) Yes 257
OXFORD102_K256_VGG16 Last Conv Layer (VGG16) No 514
OXFORD102_K256_ROOTSIFT_PCA RootSIFT features Yes 64
OXFORD102_K256_ROOTSIFT RootSIFT features No 128
OXFORD102_K256_SIFT_PCA SIFT features Yes 64
OXFORD102_K256_SIFT SIFT features No 128

Notes

  1. Feature Extraction:
    • Deep Features (VGG16): Feature maps from the last convolutional layer of VGG16. At each spatial location, the relative x and y coordinates are concatenated to the feature vector, resulting in 512 + 2 = 514 dimensions 6.
    • SIFT: Scale-Invariant Feature Transform descriptors, which was the original feature used for VLAD and Fisher Vector encoding 5.
    • RootSIFT: A variant of SIFT with Hellinger kernel normalization4.
  2. Dimensionality Reduction:
    • Models with _PCA in their names apply PCA to reduce the feature dimensions to by half.
    • The clustering models will learn from the transformed features after PCA is applied.

Contributing

We love contributions of all kinds—whether it’s suggesting new features, fixing bugs, or writing docs! Here’s how you can get involved:

  1. Fork this repository.
  2. Create a new branch for your changes.
  3. Open a pull request with a clear description of your idea or fix.

We welcome all feedback and hope to build a supportive community around pyvisim!

Get in Touch

If you have any questions or just want to say hi, feel free to:

TODO

The features below are planned for future releases:

  • Implement the siamese network.
  • Add tensor sketch approximation and mutual information analysis for Fisher Vector, according to this paper by Weixia Zhang, Jia Yan, Wenxuan Shi, Tianpeng Feng, and Dexiang Deng 1
  • Add support for vision transformers for the DeepConvFeature class.

You are welcome to implement any of these features or suggest new ones!

License

This project is licensed under the terms of the MIT license.

References

[1] Weixia Zhang, Jia Yan, Wenxuan Shi, Tianpeng Feng, and Dexiang Deng, "Refining Deep Convolutional Features for Improving Fine-Grained Image Recognition," EURASIP Journal on Image and Video Processing, 2017.
[2] Relja Arandjelović and Andrew Zisserman, 'All About VLAD', Department of Engineering Science, University of Oxford.
[3] E. Spyromitros-Xioufis, S. Papadopoulos, I. Kompatsiaris, G. Tsoumakas, and I. Vlahavas, "An Empirical Study on the Combination of SURF Features with VLAD Vectors for Image Search," Informatics and Telematics Institute, Center for Research and Technology Hellas, Thessaloniki, Greece; Department of Informatics, Aristotle University of Thessaloniki, Greece.
[4] Relja Arandjelović and Andrew Zisserman, "Three things everyone should know to improve object retrieval," Department of
Engineering Science, University of Oxford.
[5] Hervé Jégou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Pérez, and Cordelia Schmid, "Aggregating Local Image Descriptors into Compact Codes," IEEE.
[6] Liangliang Wang and Deepu Rajan, "An Image Similarity Descriptor for Classification Tasks," J. Vis. Commun. Image R., vol. 71, pp. 102847, 2020.
[7] Oxford Flower Dataset.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyvisim-0.1.2rc0.tar.gz (53.7 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyvisim-0.1.2rc0-py3-none-any.whl (57.5 MB view details)

Uploaded Python 3

File details

Details for the file pyvisim-0.1.2rc0.tar.gz.

File metadata

  • Download URL: pyvisim-0.1.2rc0.tar.gz
  • Upload date:
  • Size: 53.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for pyvisim-0.1.2rc0.tar.gz
Algorithm Hash digest
SHA256 966d02fdbc2b55a11235155b5ce7b5e9a2c6f7646d39b19caa48be0df64fb47b
MD5 00edf8e033fe43d2d3d7ae7c813d5a4f
BLAKE2b-256 f9a95dbea76f59bf29444fc77ed92b769a8f8bba7993fefd07e72026e4bab71d

See more details on using hashes here.

File details

Details for the file pyvisim-0.1.2rc0-py3-none-any.whl.

File metadata

  • Download URL: pyvisim-0.1.2rc0-py3-none-any.whl
  • Upload date:
  • Size: 57.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for pyvisim-0.1.2rc0-py3-none-any.whl
Algorithm Hash digest
SHA256 03a74cd5495ec5e1ba95b0f00ae6d8ef71066be113e870066324cf8546ad6b5e
MD5 7493a8df8ab371587cb7f786e8bc7daf
BLAKE2b-256 1132c342e0ad09a3e8489adfaece24c13d71b37d966c516750a3436cab9d1874

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page