
Image deduplicator using CNN, Cosine Similarity, Image Hashing, Structural Similarity Index Measurement, and Euclidean Distance



Antidupe


Installation

You can install Antidupe using pip:

pip install antidupe

Usage

Basic Usage

from antidupe import Antidupe
from PIL import Image

# Initialize Antidupe
antidupe = Antidupe()

# Load images (as numpy arrays or PIL.Image objects)
image1 = Image.open("image1.jpg")
image2 = Image.open("image2.jpg")

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])

if is_duplicate:
    print("Duplicate images detected!")
else:
    print("Images are not duplicates.")
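Antidupe compares the images passed to a single predict() call. To scan a whole collection, one option is to run a pairwise check over every combination of images. The helper below is a hypothetical sketch (find_duplicate_pairs is not part of Antidupe). It accepts any two-argument predicate, so you can pass one that wraps antidupe.predict as shown above.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def find_duplicate_pairs(
    items: Sequence[T],
    is_duplicate: Callable[[T, T], bool],
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose items the predicate flags as duplicates.

    Hypothetical helper: every unordered pair is checked exactly once, so it
    makes n * (n - 1) / 2 predicate calls for n items.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(items), 2)
        if is_duplicate(a, b)
    ]


if __name__ == "__main__":
    # Toy predicate on integers so the sketch runs without any images.
    values = [3, 7, 3, 9, 7]
    print(find_duplicate_pairs(values, lambda a, b: a == b))  # [(0, 2), (1, 4)]
```

With Antidupe, the predicate could be lambda a, b: bool(antidupe.predict([a, b])), following the call in Basic Usage. Because the number of comparisons grows quadratically with the number of images, this approach suits small collections best.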

Customizing Thresholds

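You can customize the similarity threshold for each technique by passing a limits dictionary when you initialize Antidupe (thresholds can also be changed later; see Changing Limits During Runtime):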

# Initialize Antidupe with custom thresholds
custom_thresholds = {
    'ih': 0.2,      # Image hashing
    'ssim': 0.2,    # Structural Similarity Index (SSIM)
    'cs': 0.2,      # Cosine similarity
    'cnn': 0.2,     # CNN
    'dedup': 0.85,  # MobileNet
}
antidupe = Antidupe(limits=custom_thresholds)

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])
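The threshold keys refer to the techniques named in the project description. As a rough illustration of what some of these measures compute, here is a NumPy sketch of an average hash compared by normalized Hamming distance, cosine similarity, Euclidean distance, and a single-window (global) SSIM. These are generic textbook formulations with hypothetical function names, not code taken from Antidupe. In particular, whether Antidupe compares each score against its limit as a distance or as a similarity is not stated here.

```python
import numpy as np


def average_hash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Average hash of a 2-D grayscale array (needs at least hash_size pixels per side).

    The image is cropped to a multiple of hash_size, block-averaged down to
    hash_size x hash_size, and each bit records whether a block is brighter
    than the mean of all blocks.
    """
    h, w = gray.shape
    gray = gray[: h - h % hash_size, : w - w % hash_size].astype(np.float64)
    bh, bw = gray.shape[0] // hash_size, gray.shape[1] // hash_size
    small = gray.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (small > small.mean()).ravel()


def hash_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of differing hash bits (normalized Hamming distance, 0 = identical)."""
    return float(np.count_nonzero(a != b)) / a.size


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle between two flattened, non-zero vectors (1 = same direction)."""
    x, y = x.ravel().astype(np.float64), y.ravel().astype(np.float64)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Straight-line distance between two flattened vectors (0 = identical)."""
    diff = x.ravel().astype(np.float64) - y.ravel().astype(np.float64)
    return float(np.linalg.norm(diff))


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """SSIM evaluated over the whole image as one window.

    Libraries usually average SSIM over many local windows instead.
    """
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(
        ((2 * mx * my + c1) * (2 * cov + c2))
        / ((mx**2 + my**2 + c1) * (vx + vy + c2))
    )
```

For identical inputs, hash_distance and euclidean_distance return 0, while cosine_similarity and global_ssim return 1 up to floating-point rounding. The cosine and Euclidean functions work on any vectors, so they apply equally to raw pixels or to feature vectors produced by a CNN.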

Debugging

You can enable debug mode to print debugging messages:

# Initialize Antidupe with debug mode enabled
antidupe = Antidupe(debug=True)

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])
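Debug mode is described as printing its messages. Assuming they are written to standard output with print() (the exact mechanism is not documented here), you can collect them into a string instead of the console with contextlib.redirect_stdout from the standard library. In the sketch below, run_capturing_stdout and noisy_check are hypothetical names; noisy_check stands in for a call such as antidupe.predict([image1, image2]) on an instance created with debug=True.

```python
import contextlib
import io
from typing import Callable, Tuple, TypeVar

R = TypeVar("R")


def run_capturing_stdout(func: Callable[[], R]) -> Tuple[R, str]:
    """Call func() and return its result plus everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue()


def noisy_check() -> bool:
    # Hypothetical stand-in for a debug-mode duplicate check that prints.
    print("debug message 1")
    print("debug message 2")
    return True


result, log_text = run_capturing_stdout(noisy_check)
print(result)                 # True
print(log_text.splitlines())  # ['debug message 1', 'debug message 2']
```

With Antidupe itself, the call would be run_capturing_stdout(lambda: antidupe.predict([image1, image2])). If the library writes to stderr or uses the logging module instead, this captures nothing, and contextlib.redirect_stderr or a logging handler would be needed.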

Changing Limits During Runtime

You can change the similarity thresholds during runtime:

# Set new limits during runtime
new_thresholds = {
    'ih': 0.1,
    'ssim': 0.1,
    'cs': 0.1,
    'cnn': 0.1,
    'dedup': 0.8
}
antidupe.set_limits(limits=new_thresholds)

# Check for duplicates
is_duplicate = antidupe.predict([image1, image2])
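To adjust only some thresholds, one approach is to build a complete dictionary from the current values and override just the ones you want, so every key is still present when it is passed to set_limits(). Whether set_limits() also accepts a partial dictionary is not stated here, so this sketch assumes a full one is expected. The helper name updated_limits and the [0, 1] range check are assumptions (every example value in this README falls in that range); the key names are the five used above.

```python
from typing import Dict, Mapping

# Threshold keys used in the examples in this README.
KNOWN_KEYS = frozenset({"ih", "ssim", "cs", "cnn", "dedup"})


def updated_limits(current: Mapping[str, float], **overrides: float) -> Dict[str, float]:
    """Return a new limits dict: the current values with the given keys replaced.

    Hypothetical helper. Rejects unknown keys and values outside [0, 1]; that
    range is an assumption, not one documented by Antidupe.
    """
    unknown = set(overrides) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown threshold key(s): {sorted(unknown)}")
    merged = {**current, **overrides}  # the input mapping is left unchanged
    for key, value in merged.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} threshold {value} is outside [0, 1]")
    return merged


custom_thresholds = {"ih": 0.2, "ssim": 0.2, "cs": 0.2, "cnn": 0.2, "dedup": 0.85}
new_thresholds = updated_limits(custom_thresholds, dedup=0.8)
print(new_thresholds["dedup"], custom_thresholds["dedup"])  # 0.8 0.85
# antidupe.set_limits(limits=new_thresholds)
```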

Requirements

  • Python 3.x
  • SSIM-PIL
  • imagededup
  • NumPy
  • Matplotlib
  • Pillow
  • ImageHash
  • PyTorch
  • EfficientNet PyTorch
  • torchvision
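To see which of the listed requirements are importable in the current environment, importlib.util.find_spec from the standard library can locate each module without importing it. The import names below are an assumed mapping from the package names (for example, Pillow is imported as PIL and SSIM-PIL as SSIM_PIL); adjust them if a package uses a different module name. The helper name check_requirements is hypothetical.

```python
import importlib.util
from typing import Dict, Mapping

# Assumed import names for the packages listed under Requirements.
REQUIRED_MODULES: Dict[str, str] = {
    "SSIM-PIL": "SSIM_PIL",
    "imagededup": "imagededup",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "Pillow": "PIL",
    "ImageHash": "imagehash",
    "PyTorch": "torch",
    "EfficientNet PyTorch": "efficientnet_pytorch",
    "torchvision": "torchvision",
}


def check_requirements(modules: Mapping[str, str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each package name to whether its top-level module can be found."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, found in check_requirements().items():
        print(f"{package:22} {'ok' if found else 'MISSING'}")
```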

License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

antidupe-0.0.7.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

antidupe-0.0.7-py3-none-any.whl (10.4 kB)

Uploaded Python 3

File details

Details for the file antidupe-0.0.7.tar.gz.

File metadata

  • Download URL: antidupe-0.0.7.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for antidupe-0.0.7.tar.gz
Algorithm     Hash digest
SHA256        0fa1afc4a4d337a52160d9d6fb813434252485affed7e4ec8135c65724176c54
MD5           02a3c69350ae157fcc65cbb21a4c28f4
BLAKE2b-256   6768d12ed5ca9cc22e9609aac3053d4d73fe6672e74fad42e3d0e5bd3b7ad63f


File details

Details for the file antidupe-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: antidupe-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for antidupe-0.0.7-py3-none-any.whl
Algorithm     Hash digest
SHA256        2955801b214fbe2214718444abbfeb76eaee415219a56a2d9ee60f7b75ee57cf
MD5           cc357788c6846b85fa84a24ec0cbca99
BLAKE2b-256   35bf344271c2570f5622c2af8df2e146fc37c9ff548efacfcb0ceba1274a25fa
