Package for image deduplication

Image Deduplicator (imagededup)

imagededup is a Python package that simplifies the task of finding exact and near duplicates in an image collection.

This package provides hashing algorithms, which are particularly good at finding exact duplicates, as well as convolutional neural networks (CNNs), which are also adept at finding near duplicates. It also provides an evaluation framework to judge the quality of deduplication for a given dataset.
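To make the idea behind hash-based deduplication concrete, here is a minimal, library-free sketch of difference hashing followed by a Hamming-distance comparison. It is an illustration only and not imagededup's implementation: the function names `dhash_bits` and `hamming_distance` and the tiny pixel grids are hypothetical, and the grids stand in for images that have already been resized and converted to grayscale.

```python
from typing import List, Sequence


def dhash_bits(pixels: Sequence[Sequence[int]]) -> List[int]:
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # The bit records only whether intensity drops from left to right.
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 3x4 grayscale "images".
original = [
    [10, 20, 30, 40],
    [40, 30, 20, 10],
    [10, 50, 10, 50],
]
# Same picture, uniformly brightened: neighbour ordering is unchanged.
brighter = [[value + 5 for value in row] for row in original]
# A different picture with the opposite gradients.
other = [
    [40, 30, 20, 10],
    [10, 20, 30, 40],
    [50, 10, 50, 10],
]

h_original = dhash_bits(original)
h_brighter = dhash_bits(brighter)
h_other = dhash_bits(other)

print(hamming_distance(h_original, h_brighter))  # small distance: likely duplicates
print(hamming_distance(h_original, h_other))     # large distance: different images
```

Because the hash only encodes relative brightness between neighbours, a uniform brightness change leaves it untouched, which is why small Hamming distances are a reasonable signal for exact or lightly modified duplicates.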

The package provides the following functionality:

  • Finding duplicates in a directory using a CNN-based method or a hashing method (for example, perceptual hashing or difference hashing).
  • Generating encodings for images using one of the above algorithms.
  • A framework to evaluate the effectiveness of deduplication given a ground truth mapping.
  • Plotting duplicates found for a given image file.
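As a rough illustration of what evaluating against a ground truth mapping can mean, the sketch below computes per-file precision and recall for a retrieved duplicate map. It does not use the package's own evaluation framework: the function name `precision_recall`, the example file names, and the convention for empty sets are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def precision_recall(
    ground_truth: Dict[str, List[str]],
    retrieved: Dict[str, List[str]],
) -> Dict[str, Tuple[float, float]]:
    """Per-file (precision, recall) of retrieved duplicates vs. ground truth."""
    scores = {}
    for filename, true_duplicates in ground_truth.items():
        expected = set(true_duplicates)
        found = set(retrieved.get(filename, []))
        hits = len(expected & found)
        # Assumed convention: an empty denominator counts as a perfect score.
        precision = hits / len(found) if found else 1.0
        recall = hits / len(expected) if expected else 1.0
        scores[filename] = (precision, recall)
    return scores


# Hypothetical ground truth: a, b and c are duplicates of each other; d is unique.
ground_truth = {
    "a.jpg": ["b.jpg", "c.jpg"],
    "b.jpg": ["a.jpg", "c.jpg"],
    "c.jpg": ["a.jpg", "b.jpg"],
    "d.jpg": [],
}
# Hypothetical output of some deduplication method.
retrieved = {
    "a.jpg": ["b.jpg"],
    "b.jpg": ["a.jpg", "d.jpg"],
    "c.jpg": [],
    "d.jpg": ["b.jpg"],
}

scores = precision_recall(ground_truth, retrieved)
for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```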

Detailed documentation for the package can be found at: https://idealo.github.io/imagededup/

imagededup is compatible with Python 3.9+ and runs on Linux, macOS and Windows. It is distributed under the Apache 2.0 license.

⚙️ Installation

There are two ways to install imagededup:

  • Install imagededup from PyPI (recommended):
pip install imagededup
  • Install imagededup from the GitHub source:
git clone https://github.com/idealo/imagededup.git
cd imagededup
pip install .
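After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version through the standard library. This is a small sketch; the helper name `installed_version` is hypothetical, and it relies only on `importlib.metadata`, which ships with the Python versions the package supports.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("imagededup")
if version is None:
    print("imagededup is not installed in this environment")
else:
    print(f"imagededup {version} is installed")
```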

🚀 Quick Start

To find duplicates in an image directory using perceptual hashing, use the following workflow:

  • Import perceptual hashing method
from imagededup.methods import PHash
phasher = PHash()
  • Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')
  • Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)
  • Plot duplicates obtained for a given file (e.g. 'ukbench00120.jpg') using the duplicates dictionary
from imagededup.utils import plot_duplicates
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')

The output is a plot of the given image together with the duplicates found for it.

The complete code for the workflow is:

from imagededup.methods import PHash
from imagededup.utils import plot_duplicates

phasher = PHash()

# Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')

# Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)

# Plot duplicates obtained for a given file using the duplicates dictionary
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')
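Once a duplicates dictionary is available, a common next step is deciding which files to drop. The sketch below assumes the dictionary maps each file name to a list of the file names found to be its duplicates, and that the mapping is symmetric; both are assumptions about its shape. The helper `files_to_remove` and the example data are hypothetical and work on any dictionary of that shape without calling the package.

```python
from typing import Dict, List, Set


def files_to_remove(duplicate_map: Dict[str, List[str]]) -> Set[str]:
    """Greedily pick files to drop so that no two kept files are mutual duplicates.

    Assumes a symmetric map: if b is listed under a, then a is listed under b.
    """
    keep: Set[str] = set()
    remove: Set[str] = set()
    for filename in sorted(duplicate_map):
        if filename in remove:
            continue  # already covered by a file we decided to keep
        keep.add(filename)
        for duplicate in duplicate_map[filename]:
            if duplicate not in keep:
                remove.add(duplicate)
    return remove


# Hypothetical duplicates dictionary: a, b and c are duplicates; d is unique.
example_duplicates = {
    "a.jpg": ["b.jpg", "c.jpg"],
    "b.jpg": ["a.jpg", "c.jpg"],
    "c.jpg": ["a.jpg", "b.jpg"],
    "d.jpg": [],
}

to_remove = files_to_remove(example_duplicates)
print(sorted(to_remove))
```

Iterating in sorted order makes the choice of which file to keep deterministic; any other stable ordering (for example by file size or resolution) would work equally well.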

To run the above snippet on Windows, refer to the documentation. It is also possible to use your own custom models for finding duplicates using the CNN method.

For examples, see the repository: https://github.com/idealo/imagededup

For more detailed usage of the package functionality, refer to: https://idealo.github.io/imagededup/

⏳ Benchmarks

Update: The provided benchmarks are only valid up to imagededup v0.2.2. Later releases significantly change all methods, so the current benchmarks may no longer hold.

Detailed benchmarks on speed and classification metrics for the different methods are provided in the documentation. In general, the following conclusions can be drawn:

  • CNN works best for near duplicates and datasets containing transformations.
  • All deduplication methods fare well on datasets containing exact duplicates, but Difference hashing is the fastest.
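To give some intuition for why a CNN-based method can cope with transformed images, the sketch below compares feature vectors by cosine similarity and flags pairs above a threshold. It is a simplified illustration and not the package's code: the three-dimensional vectors stand in for CNN encodings, and the names `cosine_similarity` and `near_duplicates`, the file names, and the threshold of 0.9 are assumptions chosen for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def near_duplicates(
    encodings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> Dict[str, List[str]]:
    """Map each file to the files whose encoding is at least `threshold` similar."""
    result: Dict[str, List[str]] = {name: [] for name in encodings}
    names = list(encodings)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if cosine_similarity(encodings[first], encodings[second]) >= threshold:
                result[first].append(second)
                result[second].append(first)
    return result


# Hypothetical feature vectors: the cropped image stays close to the original.
features = {
    "cat.jpg": [1.0, 0.0, 0.0],
    "cat_cropped.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.0, 1.0, 0.0],
}

matches = near_duplicates(features, threshold=0.9)
print(matches)
```

A transformed copy changes individual pixels a great deal but, under this assumption, shifts the feature vector only slightly, so the similarity stays above the threshold while unrelated images fall well below it.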

🤝 Contribute

We welcome all kinds of contributions. See the Contribution guide for more details.

📝 Citation

Please cite Imagededup in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{idealods2019imagededup,
  title={Imagededup},
  author={Tanuj Jain and Christopher Lennan and Zubin John and Dat Tran},
  year={2019},
  howpublished={\url{https://github.com/idealo/imagededup}},
}

🏗 Maintainers

© Copyright

See LICENSE for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • imagededup-0.3.3.post2.tar.gz (120.0 kB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • imagededup-0.3.3.post2-cp312-cp312-win_amd64.whl (134.8 kB): CPython 3.12, Windows x86-64
  • imagededup-0.3.3.post2-cp312-cp312-win32.whl (131.7 kB): CPython 3.12, Windows x86
  • imagededup-0.3.3.post2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (318.6 kB): CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
  • imagededup-0.3.3.post2-cp312-cp312-macosx_11_0_arm64.whl (133.7 kB): CPython 3.12, macOS 11.0+ ARM64
  • imagededup-0.3.3.post2-cp311-cp311-win_amd64.whl (134.6 kB): CPython 3.11, Windows x86-64
  • imagededup-0.3.3.post2-cp311-cp311-win32.whl (131.4 kB): CPython 3.11, Windows x86
  • imagededup-0.3.3.post2-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (307.4 kB): CPython 3.11, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
  • imagededup-0.3.3.post2-cp311-cp311-macosx_11_0_arm64.whl (133.3 kB): CPython 3.11, macOS 11.0+ ARM64
  • imagededup-0.3.3.post2-cp310-cp310-win_amd64.whl (134.5 kB): CPython 3.10, Windows x86-64
  • imagededup-0.3.3.post2-cp310-cp310-win32.whl (131.4 kB): CPython 3.10, Windows x86
  • imagededup-0.3.3.post2-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (300.4 kB): CPython 3.10, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
  • imagededup-0.3.3.post2-cp310-cp310-macosx_11_0_arm64.whl (133.2 kB): CPython 3.10, macOS 11.0+ ARM64
  • imagededup-0.3.3.post2-cp39-cp39-win_amd64.whl (134.5 kB): CPython 3.9, Windows x86-64
  • imagededup-0.3.3.post2-cp39-cp39-win32.whl (131.3 kB): CPython 3.9, Windows x86
  • imagededup-0.3.3.post2-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (299.6 kB): CPython 3.9, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
  • imagededup-0.3.3.post2-cp39-cp39-macosx_11_0_arm64.whl (133.2 kB): CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file imagededup-0.3.3.post2.tar.gz.

File metadata

  • Download URL: imagededup-0.3.3.post2.tar.gz
  • Size: 120.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for imagededup-0.3.3.post2.tar.gz
Algorithm Hash digest
SHA256 c074e33e74649539318eb975907001c284c0ff0dcdcb985dab14f4dc9f6e4894
MD5 2a30ca1e701df01bc881645a2f0473a9
BLAKE2b-256 5a13d66926f4af67f189537317b72bb21f4457c22616f503dd9fedf203219029

See more details on using hashes here.
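One way to use the published digests is to recompute the SHA256 of a downloaded file and compare it with the value listed for that file. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is assumed to have been downloaded to a local path beforehand.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution, `matches_published_hash('imagededup-0.3.3.post2.tar.gz', 'c074e33e74649539318eb975907001c284c0ff0dcdcb985dab14f4dc9f6e4894')` should return True if the file is intact.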

File details

Details for the file imagededup-0.3.3.post2-cp312-cp312-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 f52308cc83162c636a7a3c7a862fe703be34645bb8ec5eca986b2a2f030e82d5
MD5 2130a7f2f9d37e6b3b772cd11873ce70
BLAKE2b-256 62cd3d49b310099904667ac0adc1f73a4889c7a7e3aefdff5b1ea4cd58141767

File details

Details for the file imagededup-0.3.3.post2-cp312-cp312-win32.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 27b2196868e7d3db03f4abaa32855ef022afd8dc95fc8a3a66aafbef8ce0aa47
MD5 d5391856790f641d5c14f75f0d09d57f
BLAKE2b-256 37a143896158fed3b50d9f9727e10e159c9eb2293d211941ca1a87bebd2ee022

File details

Details for the file imagededup-0.3.3.post2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f935e05a9015535aba7f4ea5de9d9555688a031b8aa412b98468dd4c837bffef
MD5 d12762ea4627dab01ee1aae2d40f71dd
BLAKE2b-256 57717032085abbdbf5c23efe53dad32a015cc6a53fcf1573e6cd881b2cc84043

File details

Details for the file imagededup-0.3.3.post2-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ed17066cdb5f6456384fa4914543b29b357c21238d82b4ffde89555ec1ada2bf
MD5 28613741b9e20ed5add082ba8ad95bcb
BLAKE2b-256 23f546e3970a5dd26cf0dc858fc1421055a243c7cf4cfd5818e2f4a893e59b02

File details

Details for the file imagededup-0.3.3.post2-cp311-cp311-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 df06c61e3faf7264e52894f17b35d9828e6cf58bac4cc8c6fa73fca7ad2452de
MD5 6d20165ebf88741105892762d859475c
BLAKE2b-256 538c0fa444e5f960db269e3779329d8377ce654e45066da7205f14b353c7bf82

File details

Details for the file imagededup-0.3.3.post2-cp311-cp311-win32.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 5fa29afac4e67aea1ab0efe34604ceb7137bf350366976e1f4fe1144a59d8a6a
MD5 8ced10e975cb98e32f65d7e6c6a8eb38
BLAKE2b-256 decd0b802f5f88636eb145f94682cc76b7010b50424e5cfd239294b358fd3389

File details

Details for the file imagededup-0.3.3.post2-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4a00e57e85a464d52e8a1285248a030fb59719d33bd26b6852af71b8d5bf4bd8
MD5 61f743b0d1ea77a51a6d582b0c15b0a8
BLAKE2b-256 0fb00fb65c3191ab1f253834473ba52652851c14fd4bff8c9d6a8cddd159c37a

File details

Details for the file imagededup-0.3.3.post2-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d5864b10060c2bd8dab539d1c38bcf5ace27b51a753f75c74c275aec0fdc57d2
MD5 37c0d3440e60106d0dbe6f99262bead2
BLAKE2b-256 6e8e140106813add8424c204c1bfbc4775d5d913cacfde295785fe8b9879319f

File details

Details for the file imagededup-0.3.3.post2-cp310-cp310-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 86ec42780625cb8405d3a0e23b2a0b009e6f1c8e0fed85f46981d8532e006176
MD5 30bb851e8894cdea10674c8b7e6dfceb
BLAKE2b-256 bd4bdc035454ed4443f24bf03f4753cac310f221b8060b875d50b7fe8689feb6

File details

Details for the file imagededup-0.3.3.post2-cp310-cp310-win32.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 dff7faf7e1b1c770c8300a68af8cad6dc5b4a0dbe8cfaad730ea11d4e0535de6
MD5 605d2c65a230d4993c9b9ea69276a91d
BLAKE2b-256 4438180441628e3f340034da71de3e120e97bcdf5b511b36a18ee73211426e46

File details

Details for the file imagededup-0.3.3.post2-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b1b9a6a15ccd0b2624cb18f414113906e397de01b190d16e78492b0971530ad6
MD5 8fdf58e1067011c0e49fcb7d0bb2b2a5
BLAKE2b-256 d1c0a2ea75a0db2fa3dceb89777d1bac8c5932427ad711cbd8e40d63e47ac6e1

File details

Details for the file imagededup-0.3.3.post2-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ab728d0e44725cda13b6bd0b2b085cc741e334367e62031938d54e4b0225951b
MD5 383557b5f82f10d6388a3550d95da5dd
BLAKE2b-256 dfb1b112156c3e88f25e94d582453c3403a472b8df30aa0df173fe36a56cf037

File details

Details for the file imagededup-0.3.3.post2-cp39-cp39-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 033d8c3e01e91f4c21db404ac12904d759ece059cc357d8d2a7aef1b39b3b9a4
MD5 581520af9607dc7243ceea6ef5439226
BLAKE2b-256 eaab035132cc3572e08fb0d26467c84336a82c7e2fc52dc65e9a8d6baa18712c

File details

Details for the file imagededup-0.3.3.post2-cp39-cp39-win32.whl.

File metadata

  • Download URL: imagededup-0.3.3.post2-cp39-cp39-win32.whl
  • Size: 131.3 kB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for imagededup-0.3.3.post2-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 908b273d0c1f3b97551478b3b81a48330e9f581dc26bc66b88604f538e75f017
MD5 a59c078d14b35252ecf2f0a6e448ba4e
BLAKE2b-256 35ac52a01ce9ec132c133767f4dad12371cd29cd802dfc094d61b5e0d31e9291

File details

Details for the file imagededup-0.3.3.post2-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 cfd3d3c24beab2dee61042e972b9f04a2543a94e309b8220494e473897f144ee
MD5 2939b00e9113477add2a14baaa496fa1
BLAKE2b-256 d48ef02f470077c752a02371ec95dd3820189d0add9900ecfa7deead94e6aea9

File details

Details for the file imagededup-0.3.3.post2-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post2-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e6f98cfe7e2c89c183bf98263dab050701697775463608cf3edf3f8365cec13b
MD5 0de9fc3e3d4462c84b7ff5c6f836fd4a
BLAKE2b-256 fdfa9cb88e0ef798425ee0db356a3350a378c9c120a8a48a96ee22635a2dbf61
