
Package for image deduplication

Project description

Image Deduplicator (imagededup)

imagededup is a python package that simplifies the task of finding exact and near duplicates in an image collection.

This package provides hashing algorithms, which are particularly good at finding exact duplicates, as well as convolutional neural networks (CNNs), which are also adept at finding near duplicates. An evaluation framework is also provided to judge the quality of deduplication for a given dataset.
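To make the hashing idea concrete, here is a minimal pure-Python sketch of average hashing, one of the simplest hashing schemes. It assumes the image has already been converted to grayscale and shrunk to a small grid of pixel values (loading and resizing are left out), and it is an illustration only, not imagededup's own implementation. Each bit records whether a pixel is brighter than the grid's mean, and two hashes are compared by counting the bits in which they differ (the Hamming distance).

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean.

    Assumes `pixels` is an already-resized grayscale grid (e.g. 8x8).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


# Toy 4x4 "image" plus two variants (made-up values for illustration).
img = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
brighter = [[p + 5 for p in row] for row in img]  # uniform brightness change
rotated = [row[::-1] for row in img[::-1]]        # rotated by 180 degrees

print(hamming_distance(average_hash(img), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(img), average_hash(rotated)))   # 16
```

The brightness change leaves the hash untouched, which is why hashes cope well with exact and lightly edited copies, while the rotated copy differs in every bit.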

The package provides the following functionality:

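  • Finding duplicates in a directory using one of the following algorithms: Convolutional Neural Network (CNN), Perceptual hashing (PHash), Difference hashing (DHash), Wavelet hashing (WHash) or Average hashing (AHash).
  • Generation of encodings for images using one of the above algorithms.
  • A framework to evaluate the effectiveness of deduplication given a ground truth mapping.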
  • Plotting duplicates found for a given image file.
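The evaluation framework compares a ground truth mapping with the mapping a deduplication method retrieved. As a rough illustration of what such a comparison involves, the sketch below computes pairwise precision and recall over two `{filename: [duplicate filenames]}` dictionaries. The function names `to_pairs` and `pairwise_precision_recall` are hypothetical, and this is not the library's evaluation code, which provides its own set of metrics (see the documentation).

```python
from typing import Dict, List, Set, Tuple


def to_pairs(duplicate_map: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Collapse a {file: [duplicates]} map into a set of unordered pairs."""
    pairs: Set[Tuple[str, str]] = set()
    for name, dups in duplicate_map.items():
        for dup in dups:
            first, second = sorted((name, dup))
            pairs.add((first, second))
    return pairs


def pairwise_precision_recall(ground_truth: Dict[str, List[str]],
                              retrieved: Dict[str, List[str]]) -> Tuple[float, float]:
    """Precision/recall of retrieved duplicate pairs against the ground truth."""
    truth = to_pairs(ground_truth)
    found = to_pairs(retrieved)
    hits = len(truth & found)
    precision = hits / len(found) if found else 1.0  # nothing retrieved: no false positives
    recall = hits / len(truth) if truth else 1.0     # nothing to find: nothing missed
    return precision, recall


ground_truth = {'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg'], 'c.jpg': ['d.jpg'], 'd.jpg': ['c.jpg']}
retrieved = {'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg', 'c.jpg'], 'c.jpg': ['b.jpg'], 'd.jpg': []}
print(pairwise_precision_recall(ground_truth, retrieved))  # (0.5, 0.5)
```

Working on unordered pairs means that listing a duplicate under both files is counted once, so a symmetric mapping is not double-counted.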

Detailed documentation for the package can be found at: https://idealo.github.io/imagededup/

imagededup is compatible with Python 3.8+ and runs on Linux, macOS and Windows. It is distributed under the Apache 2.0 license.

⚙️ Installation

There are two ways to install imagededup:

  • Install imagededup from PyPI (recommended):
pip install imagededup
  • Install imagededup from the GitHub source:
git clone https://github.com/idealo/imagededup.git
cd imagededup
pip install .

🚀 Quick Start

To find duplicates in an image directory using perceptual hashing, the following workflow can be used:

  • Import perceptual hashing method
from imagededup.methods import PHash
phasher = PHash()
  • Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')
  • Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)
  • Plot duplicates obtained for a given file (e.g. 'ukbench00120.jpg') using the duplicates dictionary
from imagededup.utils import plot_duplicates
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')

The resulting plot shows the given file together with the duplicates found for it.

The complete code for the workflow is:

from imagededup.methods import PHash
phasher = PHash()

# Generate encodings for all images in an image directory
encodings = phasher.encode_images(image_dir='path/to/image/directory')

# Find duplicates using the generated encodings
duplicates = phasher.find_duplicates(encoding_map=encodings)

# Plot duplicates obtained for a given file using the duplicates dictionary
from imagededup.utils import plot_duplicates
plot_duplicates(image_dir='path/to/image/directory',
                duplicate_map=duplicates,
                filename='ukbench00120.jpg')
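Conceptually, finding duplicates from hash encodings means comparing encodings pairwise and keeping the pairs whose Hamming distance is within a threshold. The brute-force sketch below assumes the encodings are hexadecimal strings, as in the encoding map produced above. `brute_force_duplicates` is a hypothetical helper whose `max_distance_threshold` argument is assumed to mirror the `find_duplicates` parameter of the same name; the library may use a faster search strategy than this quadratic loop.

```python
from itertools import combinations
from typing import Dict, List


def hex_hamming(h1: str, h2: str) -> int:
    """Hamming distance between two hex-encoded hashes of equal length."""
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")


def brute_force_duplicates(encoding_map: Dict[str, str],
                           max_distance_threshold: int = 10) -> Dict[str, List[str]]:
    """Hypothetical helper: map every file to the files within the threshold."""
    duplicates: Dict[str, List[str]] = {name: [] for name in encoding_map}
    for a, b in combinations(encoding_map, 2):  # every unordered pair once
        if hex_hamming(encoding_map[a], encoding_map[b]) <= max_distance_threshold:
            duplicates[a].append(b)
            duplicates[b].append(a)
    return duplicates


# Made-up 64-bit encodings: b.jpg differs from a.jpg in 4 bits, c.jpg in all 64.
encodings = {'a.jpg': 'ffffffffffffffff', 'b.jpg': 'fffffffffffffff0', 'c.jpg': '0000000000000000'}
print(brute_force_duplicates(encodings))
# {'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg'], 'c.jpg': []}
```

Raising the threshold trades precision for recall: with a threshold of 64 every pair in this toy example would be reported.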

It is also possible to use your own custom models for finding duplicates using the CNN method.

For examples, refer to the relevant part of the GitHub repository.
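Whichever model produces them, the CNN method works with feature vectors rather than short bit hashes, so near duplicates are found through a similarity score instead of a Hamming distance. The sketch below uses cosine similarity with a minimum-similarity cutoff. It assumes the feature vectors have already been computed (the three-element vectors are toy values; real CNN features are much longer), and `similar_pairs` with its 0.9 default is an illustrative choice, not the library's API.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_pairs(features: Dict[str, Sequence[float]],
                  min_similarity: float = 0.9) -> Dict[str, List[str]]:
    """Hypothetical helper: map every file to files at or above the cutoff."""
    dups: Dict[str, List[str]] = {name: [] for name in features}
    for a, b in combinations(features, 2):
        if cosine_similarity(features[a], features[b]) >= min_similarity:
            dups[a].append(b)
            dups[b].append(a)
    return dups


features = {'a.jpg': [1.0, 0.0, 1.0], 'b.jpg': [0.9, 0.1, 1.0], 'c.jpg': [0.0, 1.0, 0.0]}
print(similar_pairs(features))
# {'a.jpg': ['b.jpg'], 'b.jpg': ['a.jpg'], 'c.jpg': []}
```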

For more detailed usage of the package functionality, refer to: https://idealo.github.io/imagededup/

⏳ Benchmarks

Update: The provided benchmarks are only valid up to imagededup v0.2.2. Later releases introduced significant changes to all methods, so these benchmarks may no longer hold.

Detailed benchmarks on speed and classification metrics for the different methods are provided in the documentation. Generally speaking, the following conclusions can be drawn:

  • CNN works best for near duplicates and datasets containing transformations.
  • All deduplication methods fare well on datasets containing exact duplicates, but Difference hashing is the fastest.
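The speed advantage of difference hashing is plausible from how little work it does: after shrinking the image it only compares neighbouring pixels, with no transform such as the discrete cosine transform used by perceptual hashing. A minimal sketch, assuming an already-resized grayscale grid with one more column than bits per row (for example 9x8 for a 64-bit hash); it illustrates the technique and is not imagededup's DHash code.

```python
from typing import List


def difference_hash(pixels: List[List[int]]) -> int:
    """One bit per horizontally adjacent pair: 1 if the right pixel is brighter."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


# Toy 3x3 grid -> 2 bits per row, 6 bits in total.
grid = [[1, 2, 3], [3, 2, 1], [5, 5, 5]]
print(bin(difference_hash(grid)))  # 0b110000

# A uniform brightness shift does not change any comparison.
shifted = [[p + 10 for p in row] for row in grid]
print(difference_hash(shifted) == difference_hash(grid))  # True
```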

🤝 Contribute

We welcome all kinds of contributions. See the Contribution guide for more details.

📝 Citation

Please cite Imagededup in your publications if it is useful for your research. Here is an example BibTeX entry:

@misc{idealods2019imagededup,
  title={Imagededup},
  author={Tanuj Jain and Christopher Lennan and Zubin John and Dat Tran},
  year={2019},
  howpublished={\url{https://github.com/idealo/imagededup}},
}

🏗 Maintainers

© Copyright

See LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

  • imagededup-0.3.3.post1.tar.gz (121.0 kB)

Built Distributions

  • imagededup-0.3.3.post1-cp312-cp312-win_amd64.whl (134.6 kB): CPython 3.12, Windows x86-64
  • imagededup-0.3.3.post1-cp312-cp312-win32.whl (131.7 kB): CPython 3.12, Windows x86
  • imagededup-0.3.3.post1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (304.4 kB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • imagededup-0.3.3.post1-cp312-cp312-macosx_11_0_arm64.whl (134.5 kB): CPython 3.12, macOS 11.0+ ARM64
  • imagededup-0.3.3.post1-cp311-cp311-win_amd64.whl (134.4 kB): CPython 3.11, Windows x86-64
  • imagededup-0.3.3.post1-cp311-cp311-win32.whl (131.4 kB): CPython 3.11, Windows x86
  • imagededup-0.3.3.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (299.1 kB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • imagededup-0.3.3.post1-cp311-cp311-macosx_11_0_arm64.whl (134.1 kB): CPython 3.11, macOS 11.0+ ARM64
  • imagededup-0.3.3.post1-cp310-cp310-win_amd64.whl (134.1 kB): CPython 3.10, Windows x86-64
  • imagededup-0.3.3.post1-cp310-cp310-win32.whl (131.3 kB): CPython 3.10, Windows x86
  • imagededup-0.3.3.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.1 kB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • imagededup-0.3.3.post1-cp310-cp310-macosx_11_0_arm64.whl (134.1 kB): CPython 3.10, macOS 11.0+ ARM64
  • imagededup-0.3.3.post1-cp39-cp39-win_amd64.whl (134.1 kB): CPython 3.9, Windows x86-64
  • imagededup-0.3.3.post1-cp39-cp39-win32.whl (131.3 kB): CPython 3.9, Windows x86
  • imagededup-0.3.3.post1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (290.2 kB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • imagededup-0.3.3.post1-cp39-cp39-macosx_11_0_arm64.whl (134.1 kB): CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file imagededup-0.3.3.post1.tar.gz.

File metadata

  • Download URL: imagededup-0.3.3.post1.tar.gz
  • Upload date:
  • Size: 121.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for imagededup-0.3.3.post1.tar.gz
Algorithm Hash digest
SHA256 ce6749dfcaeb1a4482db2342f506f0b7a1ffe773aaf768639e963d483ceea62a
MD5 0589a8c5779ae4199fc683e13c9e65b7
BLAKE2b-256 a83a9d491f14b2f5001eebc44d07ad02702f7fb1b93e9a1d769489a0ad030d5c

See more details on using hashes here.

File details

Details for the file imagededup-0.3.3.post1-cp312-cp312-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 813fd9b4d1fe14acbda31fd7654b91d84007125aeace4cdc5e8675dfd639bd0f
MD5 09173b53070e9b09f5259e1b6ddb7f9f
BLAKE2b-256 1b765f5389a5f7817cb0e880cd320db96fa69ebbea9b2a67c8928ae8c2006059

File details

Details for the file imagededup-0.3.3.post1-cp312-cp312-win32.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 bdb77bf7c74109a0f7ced5686a008b8a9b9ed925234df446cd41365f4ffc690b
MD5 344be2abf88e7e17baae477026638ace
BLAKE2b-256 bbe75cc724354377087a89f02cced33abd366387bb1abad4c2b29ed6ac9cc6f3

File details

Details for the file imagededup-0.3.3.post1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5b317e15c4509189ee69b932a7ae8ec35469c3f24942b0fe8038c800f5071533
MD5 7227194ed5f36422df989ef6169f3505
BLAKE2b-256 61336711cd6d7d176b7929d014aa926e8c65c199303704da2aad1e4b29707ae1

File details

Details for the file imagededup-0.3.3.post1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9dc367810c0efcfbcd52913e40ae5bc79af10c3d8a7ed2a4170f41ccec0c7603
MD5 7ef7ad7cbc16f2e982093c9ce621a273
BLAKE2b-256 a7bcea347e087340b810dca232c64e9f6ca18f81df713dfb8fc2e0082e7b0e47

File details

Details for the file imagededup-0.3.3.post1-cp311-cp311-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 cb5d98d75cadf20db3ba7ab7e6267ab9d5693416ca07716a2d82bb1eb86fa904
MD5 cb805166e5e9c40030e4e9117d312a3d
BLAKE2b-256 ae8cc9042b003029cd681af379d85d58f35dce2e3b5c1856d2d42fdadb308078

File details

Details for the file imagededup-0.3.3.post1-cp311-cp311-win32.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 5a22e1cf563a0e9064dd8d1c5633bc88e6f99cafaaaa4fee5e5f9dfefeaed387
MD5 84dadea980ed6eebbc57e476e39941ed
BLAKE2b-256 ca7b8ca57ec64860c7aa769dbe4aa32978a35b1d815c8ce1bd9ab5e433374efc

File details

Details for the file imagededup-0.3.3.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c96399ea50a4738f3adf4ab1107629fb109322875161dad8013756b223d96ac3
MD5 da3eec6ecfaf20e98871fde1a8e0b97d
BLAKE2b-256 7062eed68136b7a50311083b3f54867bf20c4510dd65ed8c063913b1372f9fbb

File details

Details for the file imagededup-0.3.3.post1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 54a882440e0b7cecb5095889cde1f712b63be9694951bfc57680c641b55f6327
MD5 301a70c309df678cd4d5f53b434e3d45
BLAKE2b-256 a17c300084d5b1f953f005676b9ee7f214c647623a79f9923d41de4ca2316577

File details

Details for the file imagededup-0.3.3.post1-cp310-cp310-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 d5c847b0ea081f2ccbeeaa86820acba324d5c997511f4fb8d2902ac826bf9159
MD5 1006b2b35a307733dbe43acfea20cd7b
BLAKE2b-256 d3a93da93f591b0af02042173e86c21b85b5d93b5b96dcae1ae3ad50f0f5f7ef

File details

Details for the file imagededup-0.3.3.post1-cp310-cp310-win32.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 4623c9a4ab90a7d59019e7c14a0ffd56ef129a59f9867bd46078f28f03a46988
MD5 76b7e03d7d27e6ed83d7cabd4a0f0ffd
BLAKE2b-256 e54f1fc93ccda3659c746d40a18d0bdebade5450630519311f0a0f7c41286d1b

File details

Details for the file imagededup-0.3.3.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3dcd2673824927db6139c543841aafab7272fb43380a6f35e31129c6da63b1a1
MD5 1e6889793f185920ba45bdb466d8daaa
BLAKE2b-256 cafd4f9e53bc1f98f3ea8092cd22d5c0e704f47ff9ece8e82f97de28f32c2281

File details

Details for the file imagededup-0.3.3.post1-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4a56d50811d93022d8574c9f2e41b1efef64b8e4802cda35cdc40274cbe66128
MD5 1e6cda67010ecd51e83d479e20f97990
BLAKE2b-256 cd93a816e9d039cfff76848dddf3c23ac0a841a517fbadbaf8b83836430b2c5a

File details

Details for the file imagededup-0.3.3.post1-cp39-cp39-win_amd64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 73f3ea76f4b3855683c312d19b74946650a7776b57b58835a15b4a180362d2af
MD5 b0fe1664f62e527c96b1671bd87acbb5
BLAKE2b-256 babcb2cca5a14024b92e58b4304b932ed75b4dfcd13dc46ff9251e7508cca40e

File details

Details for the file imagededup-0.3.3.post1-cp39-cp39-win32.whl.

File metadata

  • Download URL: imagededup-0.3.3.post1-cp39-cp39-win32.whl
  • Upload date:
  • Size: 131.3 kB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for imagededup-0.3.3.post1-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 6464cc4675f9c27e1371733452f13caf74d112369baeeb2950ea8cde7ef2a00d
MD5 98c6d402140cd330c50e5cebcef616b5
BLAKE2b-256 bd33881ecfb9bff47231263f29a1c41be7f634ab2164d960593a24a57fac2d39

File details

Details for the file imagededup-0.3.3.post1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3b0725c9baf487f430d2bfb6e0b3d28e7cdbab15e712da068e739ec7a2d1eb26
MD5 f1a530b9bdbc8d828ddfa9e06768b682
BLAKE2b-256 7ef6303b085a32ed2f57995ce44db67c2fb958bfe8f18cd0a81bc00b277a04ce

File details

Details for the file imagededup-0.3.3.post1-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for imagededup-0.3.3.post1-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 29c4e01023b12eddf961a06eba9ff9a411db6230e26c1755cc1b1642d8d9ac56
MD5 1c8b57af651bab20e6dc533bdc72a587
BLAKE2b-256 1dfd4ea44705d1ca1c94f234eb0141badbb418c53d79cd845ca86a6151d1cac0
