Antidupe

Image deduplicator using CNN, Cosine Similarity, Image Hashing, Structural Similarity Index Measurement, and Euclidean Distance

Installation

You can install Antidupe using pip:

pip install antidupe

Usage

Basic Usage

from antidupe import Antidupe
from PIL import Image

# Initialize Antidupe
antidupe = Antidupe()

# Load images (as numpy arrays or PIL.Image objects)
image1 = Image.open("image1.jpg")
image2 = Image.open("image2.jpg")

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])

if is_duplicate:
    print("Duplicate images detected!")
else:
    print("Images are not duplicates.")
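The example above compares a single pair. For a larger collection, one option is to run the same two-image check over every pair. The sketch below is not part of Antidupe: find_duplicate_pairs is a hypothetical helper, and it assumes only that the check is a callable taking a list of two images and returning a truthy value for duplicates, as antidupe.predict does in the example above.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def find_duplicate_pairs(
    images: Sequence[object],
    is_duplicate: Callable[[List[object]], bool],
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with i < j that the check flags as duplicates.

    `is_duplicate` is any callable that accepts a list of two images,
    for example the bound method `antidupe.predict` (assumption: it is
    called exactly as in the basic usage example).
    """
    pairs: List[Tuple[int, int]] = []
    for i, j in combinations(range(len(images)), 2):
        # One call per unordered pair, always with exactly two images.
        if is_duplicate([images[i], images[j]]):
            pairs.append((i, j))
    return pairs
```

With the objects from the basic example, this would be called as find_duplicate_pairs([image1, image2], antidupe.predict). The number of comparisons grows as n(n-1)/2, so this approach suits small collections.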

Customizing Thresholds

You can customize the similarity threshold for each technique at initialization or at runtime. A negative value disables the corresponding measurement layer.

# Initialize Antidupe with custom thresholds
custom_thresholds = {
    'ih': 0.2,    # Image Hash
    'ssim': 0.2,  # SSIM
    'cs': 0.2,    # Cosine Similarity
    'cnn': 0.2,   # CNN
    'dedup': 0.1, # MobileNet
}
antidupe = Antidupe(limits=custom_thresholds)

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])
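To give a feel for what a per-layer threshold means, here is a minimal, standalone sketch of one measurement (cosine distance) gated by a limit. It does not reproduce Antidupe's internals. The functions cosine_distance and layer_verdict are hypothetical, and the rule "flag the pair when the distance is at or below the limit" is an assumption made for illustration; only the "negative value disables the layer" behaviour comes from the note above.

```python
import math
from typing import Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def layer_verdict(distance: float, limit: float) -> Optional[bool]:
    """Apply one layer's limit to a distance (assumed gating rule).

    Returns None when the layer is disabled (negative limit),
    otherwise True if the distance is at or below the limit.
    """
    if limit < 0:
        return None  # negative limit: this layer is skipped
    return distance <= limit


# Two flattened "images" given as plain pixel lists.
pixels_a = [10.0, 20.0, 30.0, 40.0]
pixels_b = [11.0, 19.0, 31.0, 39.0]
verdict = layer_verdict(cosine_distance(pixels_a, pixels_b), limit=0.2)
```

Under this assumed rule, lowering a limit makes the layer stricter, and setting it to a negative number removes the layer from the decision.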

Debugging

Pass debug=True at initialization to print debugging messages:

# Initialize Antidupe with debug mode enabled
antidupe = Antidupe(debug=True)

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])

Changing Limits During Runtime

To change the similarity thresholds on an existing instance, pass a new dictionary to set_limits:

# Set new limits during runtime
new_thresholds = {
    'ih': 0.1,
    'ssim': 0.1,
    'cs': 0.1,
    'cnn': 0.1,
    'dedup': 0.15
}
antidupe.set_limits(limits=new_thresholds)

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])
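When only one or two limits need to change, it can help to build the full dictionary from a base set plus overrides and to catch misspelled keys early. The helper below is a hypothetical convenience, not part of Antidupe. It assumes the five keys shown in the examples above are the complete set, and it always produces a full dictionary because this README does not say whether set_limits accepts a partial one.

```python
from typing import Dict, Mapping

# Layer keys as they appear in the threshold dictionaries in this README.
LAYER_KEYS = ("ih", "ssim", "cs", "cnn", "dedup")


def merged_limits(base: Mapping[str, float], **overrides: float) -> Dict[str, float]:
    """Return a new, complete limits dict: `base` updated with `overrides`."""
    unknown = set(overrides) - set(LAYER_KEYS)
    if unknown:
        raise KeyError(f"unknown layer key(s): {sorted(unknown)}")
    limits = dict(base)  # copy so the caller's dictionary is not modified
    limits.update(overrides)
    missing = [key for key in LAYER_KEYS if key not in limits]
    if missing:
        raise KeyError(f"missing layer key(s): {missing}")
    return limits
```

For example, antidupe.set_limits(limits=merged_limits(new_thresholds, ssim=-1.0)) would keep the other limits and disable the SSIM layer, since negative values disable a layer.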

Requirements

  • Python 3.x
  • SSIM-PIL
  • imagededup
  • NumPy
  • Matplotlib
  • Pillow
  • ImageHash
  • PyTorch
  • EfficientNet-PyTorch
  • torchvision
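To see which of these dependencies are present in the current environment, a short standard-library check is enough. This is a sketch only: installed_versions is a hypothetical helper, and the distribution names in ASSUMED_DISTRIBUTIONS are my best guess at the PyPI names for the packages listed above, not something this README states.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed PyPI distribution names for the requirements listed above.
ASSUMED_DISTRIBUTIONS = (
    "SSIM-PIL",
    "imagededup",
    "numpy",
    "matplotlib",
    "Pillow",
    "ImageHash",
    "torch",
    "efficientnet_pytorch",
    "torchvision",
)


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(ASSUMED_DISTRIBUTIONS).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Installing Antidupe with pip should normally pull in its dependencies, so a check like this is mainly useful when diagnosing a broken environment.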

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

