Python package undouble
undouble
The aim of undouble is to detect (near-)identical images. It works in multiple steps: pre-processing the
images (grayscaling, normalizing, and scaling), computing an image hash for each image, and grouping images whose
hashes differ by no more than a given threshold. A threshold of 0 groups only images with an identical image hash.
The results can be explored with the plotting functionality, and images can be moved with the move functionality.
When moving images, the image with the largest resolution in each group is copied, and all other images are moved
to the "undouble" subdirectory. If you want to cluster your images instead, I recommend reading the blog and using
the clustimage library.
The following steps are taken in the undouble library:
- Recursively read all images with the specified extensions from the directory.
- Compute the image hash.
- Group similar images.
- Move if desired.
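The hashing and grouping steps can be illustrated with a minimal, standard-library sketch. It is not undouble's implementation: it assumes a simple average hash on tiny hand-written grayscale grids instead of `phash`, and the helper names (`average_hash`, `hamming`, `group_similar`) are hypothetical.

```python
from itertools import combinations


def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(hash_a, hash_b):
    """Count the positions at which two equally long hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def group_similar(hashes, threshold=0):
    """Group names whose hashes differ by at most `threshold` (assumed linkage rule)."""
    names = list(hashes)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of the group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if hamming(hashes[a], hashes[b]) <= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one image are interesting.
    return [members for members in groups.values() if len(members) > 1]


# Toy 2x2 "images" (grayscale values); file names are made up for illustration.
images = {
    "a.png": [[10, 200], [220, 30]],
    "a_brighter.png": [[40, 230], [250, 60]],
    "b.png": [[200, 10], [30, 220]],
}
hashes = {name: average_hash(pixels) for name, pixels in images.items()}
groups = group_similar(hashes, threshold=0)
print(groups)
```

With a threshold of 0, only the two images with an identical hash end up in the same group; raising the threshold would also allow hashes that differ in a few bits.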
Installation
- Install undouble from PyPI (recommended). undouble is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
- A new environment can be created as follows:
conda create -n env_undouble python=3.8
conda activate env_undouble
pip install undouble # new install
pip install -U undouble # update to latest version
- Alternatively, you can install from the GitHub source:
# Install directly from the GitHub source
pip install git+https://github.com/erdogant/undouble
# By cloning
git clone https://github.com/erdogant/undouble.git
cd undouble
pip install -U .
Import undouble package
from undouble import Undouble
Example:
# Import library
from undouble import Undouble
# Init with default settings
model = Undouble(method='phash', hash_size=8)
# Import example data
targetdir = model.import_example(data='flowers')
# Import the files from disk, then clean and pre-process them
model.preprocessing(targetdir)
# Compute image-hash
model.fit_transform()
# Find images with image-hash <= threshold
model.find(threshold=0)
# [undouble] >INFO> Store examples at [./undouble/data]..
# [undouble] >INFO> Downloading [flowers] dataset from github source..
# [undouble] >INFO> Extracting files..
# [undouble] >INFO> [214] files are collected recursively from path: [./undouble/data/flower_images]
# [undouble] >INFO> Reading and checking images.
# [undouble] >INFO> Reading and checking images.
# 100%|██████████| 214/214 [00:02<00:00, 96.56it/s]
# [undouble] >INFO> Extracting features using method: [phash]
# 100%|██████████| 214/214 [00:00<00:00, 3579.14it/s]
# [undouble] >INFO> Build adjacency matrix with phash differences.
# [undouble] >INFO> Extracted features using [phash]: (214, 214)
# 100%|██████████| 214/214 [00:00<00:00, 129241.33it/s]
#
# [undouble] >INFO> Number of groups with similar images detected: 3
# [undouble] >INFO> [3] groups are detected for [7] images.
# Plot the images
model.plot()
# Move the images
model.move()
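The selection rule used when moving images (keep the image with the largest resolution, move the rest to an "undouble" subdirectory) can be sketched as follows. This is only an illustration under stated assumptions, not the library's code: `plan_move` is a hypothetical helper, resolution is assumed to mean width times height, and the subdirectory is assumed to sit next to each moved file. No files are touched; the function only returns a plan.

```python
from pathlib import PurePosixPath


def plan_move(group, subdir="undouble"):
    """Split a group into the image to keep and (source, destination) pairs to move.

    `group` maps an image path to its (width, height) in pixels.
    """
    # Assumption: "largest resolution" means the largest pixel count.
    keeper = max(group, key=lambda path: group[path][0] * group[path][1])
    moves = []
    for path in group:
        if path == keeper:
            continue
        source = PurePosixPath(path)
        destination = source.parent / subdir / source.name
        moves.append((str(source), str(destination)))
    return keeper, moves


# Made-up group of near-identical images with their resolutions.
group = {
    "photos/rose.jpg": (1920, 1080),
    "photos/rose_small.jpg": (640, 360),
    "photos/rose_copy.jpg": (1280, 720),
}
keeper, moves = plan_move(group)
print(keeper)
for source, destination in moves:
    print(source, "->", destination)
```

Returning a plan instead of moving files directly makes it easy to review what would happen before anything on disk changes.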
References
Citation
Please cite undouble in your publications if it is useful for your research (see citation).
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- All kinds of contributions are welcome!
- If you wish to buy me a coffee for this work, it is much appreciated :)
Licence
See LICENSE for details.
Other interesting stuff
- https://ourcodeworld.com/articles/read/1006/how-to-determine-whether-2-images-are-equal-or-not-with-the-perceptual-hash-in-python
- https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/
- https://github.com/JohannesBuchner/imagehash
- https://stackoverflow.com/questions/64994057/python-image-hashing
- https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34