Finding Duplicate Images
Finds equal or similar images in a directory containing (many) image files.
Needs Python 3 and the Pillow imaging library to run; the test suite additionally needs Wand.
Uses Poetry for dependency management.
Usage
$ pip install duplicate_images
$ find-dups -h
or just
$ find-dups $IMAGE_ROOT
Image comparison algorithms
Use the --algorithm option to select how equal images are found.
- exact: marks only binary-identical files as equal. This is by far the fastest, but also the most restrictive algorithm.
- histogram: checks the images' color histograms for equality. Faster than the image hashing algorithms, but tends to give a lot of false positives for images that are similar, but not equal. Use the --fuzziness and --aspect-fuzziness options to fine-tune its behavior.
- ahash, colorhash, dhash and phash: four different image hashing algorithms. See https://pypi.org/project/ImageHash for an introduction to image hashing and https://tech.okcupid.com/evaluating-perceptual-image-hashes-okcupid for the gory details of which image hashing algorithm performs best in which situation. To start with, I recommend ahash.
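To get a feeling for how the three families of algorithms differ, here is a minimal sketch in plain Python. It is not the code used by find-dups: the images are tiny hand-written grayscale grids instead of real files, all function names are made up for this illustration, and the average hash skips the downscaling step that a real implementation performs first.

```python
from collections import Counter
from typing import List

# Hypothetical stand-in for an image: rows of 8-bit grayscale pixel values.
Image = List[List[int]]


def exact_equal(a: Image, b: Image) -> bool:
    """Stand-in for a byte-for-byte file comparison."""
    return a == b


def histogram_equal(a: Image, b: Image) -> bool:
    """Compare how often each pixel value occurs, ignoring pixel positions."""
    count_a = Counter(p for row in a for p in row)
    count_b = Counter(p for row in b for p in row)
    return count_a == count_b


def average_hash(image: Image) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Number of bits in which two hashes differ."""
    return bin(h1 ^ h2).count("1")


original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # every pixel 10 steps brighter
mirrored = [[200, 10], [30, 220]]   # every row reversed

if __name__ == "__main__":
    for name, other in (("brighter", brighter), ("mirrored", mirrored)):
        distance = hamming_distance(average_hash(original), average_hash(other))
        print(name, exact_equal(original, other),
              histogram_equal(original, other), distance)
```

In this toy setup the brightened copy is neither exactly equal nor histogram-equal to the original, but its average hash is identical (distance 0). The mirrored copy has the same histogram as the original although it is a different picture, which is the kind of false positive mentioned above, while its average hash differs in all 4 bits.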
Development
Installation
From source:
$ git clone https://gitlab.com/lilacashes/DuplicateImages.git
$ cd DuplicateImages
$ pip3 install poetry
$ poetry install
Running
$ poetry run find-dups $PICTURE_DIR
or
$ poetry run find-dups -h
for a list of all possible options.
Testing
Running:
$ poetry run mypy duplicate_images tests
$ poetry run flake8
$ poetry run pytest
Publishing
$ poetry build
$ poetry publish --username $PYPI_USER --password $PYPI_PASSWORD --repository testpypi
$ poetry publish --username $PYPI_USER --password $PYPI_PASSWORD
Profiling
CPU time
To show the top functions by time spent, including called functions:
$ poetry run python -m cProfile -s cumtime ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1 | head -n 15
or, to show the top functions by time spent in the function alone:
$ poetry run python -m cProfile -s tottime ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1 | head -n 15
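The difference between the two sort keys is easiest to see on a toy program. The following sketch uses only the standard library modules cProfile and pstats; the functions inner, outer and profile_report are hypothetical and have nothing to do with the find-dups code base.

```python
import cProfile
import io
import pstats


def inner(n: int) -> int:
    # All the actual work happens here.
    return sum(i * i for i in range(n))


def outer(n: int) -> int:
    # Does almost nothing itself, it only delegates to inner().
    return inner(n)


def profile_report(sort_key: str, top: int = 5) -> str:
    """Profile outer() and return the report sorted by the given key."""
    profiler = cProfile.Profile()
    profiler.enable()
    outer(200_000)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report("cumtime"))  # time including called functions
    print(profile_report("tottime"))  # time spent in the function alone
```

Sorted by cumtime, outer should rank near the top because the time of everything it calls is attributed to it. Sorted by tottime it should drop down the list, since it does hardly any work of its own.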
Memory usage
$ poetry run fil-profile run ./duplicate_images/duplicate.py \
--algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1
This will open a browser window showing the functions using the most memory (see https://pypi.org/project/filprofiler for more details).