Finding Duplicate Images

Finds equal or similar images in a directory containing (many) image files.

Requires Python 3 and the Pillow imaging library to run, plus Wand for the test suite.

Uses Poetry for dependency management.

Usage

$ pip install duplicate_images
$ find-dups -h

or just

$ find-dups $IMAGE_ROOT

Image comparison algorithms

Use the --algorithm option to select how equal images are found.

  • exact: marks only files that are byte-for-byte identical as equal. This is by far the fastest, but also the most restrictive algorithm.
  • histogram: checks the images' color histograms for equality. Faster than the image hashing algorithms, but tends to give a lot of false positives for images that are similar, but not equal. Use the --fuzziness and --aspect-fuzziness options to fine-tune its behavior.
  • ahash, colorhash, dhash and phash: four different image hashing algorithms. See https://pypi.org/project/ImageHash for an introduction to image hashing and https://tech.okcupid.com/evaluating-perceptual-image-hashes-okcupid for the gory details of which image hashing algorithm performs best in which situation. To start with, I recommend ahash.
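
To make the difference between exact and hash-based comparison concrete, here is a minimal sketch. It is not the package's actual implementation: `group_exact_duplicates`, `average_hash` and `hamming_distance` are hypothetical helper names. The first groups files by the SHA-256 digest of their bytes, which approximates the idea behind exact. The second computes an average hash over a small grayscale grid, in the spirit of ahash; a real tool would first downscale the image to such a grid, for example with Pillow.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Sequence


def group_exact_duplicates(paths: Iterable[Path]) -> List[List[Path]]:
    """Group files whose bytes share a SHA-256 digest (hypothetical helper)."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return [group for group in by_digest.values() if len(group) > 1]


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """One bit per pixel of a small grayscale grid: 1 if brighter than the mean."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")
```

Two grids that differ only slightly in brightness, such as `[[10, 200], [220, 20]]` and `[[12, 198], [225, 25]]`, yield the same average hash, whereas a byte-wise comparison of the corresponding files would treat them as different.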

Development

Installation

From source:

$ git clone https://gitlab.com/lilacashes/DuplicateImages.git
$ cd DuplicateImages
$ pip3 install poetry
$ poetry install

Running

$ poetry run find-dups $PICTURE_DIR

or

$ poetry run find-dups -h

for a list of all possible options.

Testing

Run the type checker, the linter and the test suite:

$ poetry run mypy duplicate_images tests
$ poetry run flake8
$ poetry run pytest

Publishing

$ poetry build
$ poetry publish --username $PYPI_USER --password $PYPI_PASSWORD --repository testpypi
$ poetry publish --username $PYPI_USER --password $PYPI_PASSWORD

Profiling

CPU time

To show the top functions by time spent in the function alone:

$ poetry run python -m cProfile -s tottime ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1 | head -n 15

or, to show the top functions by time spent, including called functions:

$ poetry run python -m cProfile -s cumtime ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1 | head -n 15
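
The two sort keys can also be compared from within Python using the standard library's cProfile and pstats modules. The following is a minimal sketch; `busy_leaf`, `caller` and `profile_report` are made-up names for illustration and are not part of this project. tottime counts only the time spent in a function's own body, while cumtime also includes the time spent in the functions it calls.

```python
import cProfile
import io
import pstats


def busy_leaf(n: int) -> int:
    """Made-up function that does the actual work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def caller() -> int:
    """Made-up function that does little itself but calls busy_leaf repeatedly."""
    total = 0
    for _ in range(20):
        total += busy_leaf(20_000)
    return total


def profile_report(sort_key: str, limit: int = 5) -> str:
    """Profile caller() and return the top entries sorted by sort_key."""
    profiler = cProfile.Profile()
    profiler.enable()
    caller()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report("tottime"))  # busy_leaf should rank above caller
    print(profile_report("cumtime"))  # caller's total includes busy_leaf's time
```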

Memory usage

$ poetry run fil-profile run ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1

This will open a browser window showing the functions using the most memory (see https://pypi.org/project/filprofiler for more details).
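
For a quick look without Fil, the standard library's tracemalloc module can report which source lines allocated the most memory. It is a different and coarser tool than Fil, shown here only as a sketch; `build_table` is a made-up workload standing in for the image-scanning code, and `top_allocations` is a hypothetical helper.

```python
import tracemalloc
from typing import List


def build_table(rows: int) -> List[bytes]:
    """Made-up workload: allocate `rows` blocks of 1 KiB each."""
    return [bytes(1024) for _ in range(rows)]


def top_allocations(limit: int = 3) -> List[str]:
    """Run the workload under tracemalloc and describe the largest allocation sites."""
    tracemalloc.start()
    table = build_table(2_000)  # keep a reference so the memory is still live
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    del table
    # Group the traced allocations by source line, largest first.
    stats = snapshot.statistics("lineno")
    return [str(stat) for stat in stats[:limit]]


if __name__ == "__main__":
    for line in top_allocations():
        print(line)
```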
