Skip to main content

Simple duplication finder for Images, matches on names and then compares image hashes.

Project description

ImageDuplicationFinder

Finds duplicated images in Folders. It finds duplicated images first matched on the name. So both images have to have the same name. After a match is found based on the name, it will compare image hashes to make sure the both images are identical!


Usecase: You have multiple hard-drives, which all contain pictures. But are messy copied (for example after recovery). Copy both hard drives on a single one, or remove every duplicated picture from your hard-drives.

Installation


pip install ImageDuplicationFinder

or just clone this repository and run

pip install .

Overview


  • There are 3 stages :

  • 0:Syntax match (find identical names),

  • 1:Semantic match (compare images based on the pixelvalue)

  • 2:Deletion (delete Syntax AND Semantic matches)

  • If you only want to check for duplication, use -csv flag, it will print out a csv file with found dupications at the destination path given (skipping deletion stage)

  • This programm will remove all duplicates from path1 AND path2! If you have duplications in the path1 folder, they will be found!

  • This program is designed for big workloads (> 1tb ) in mind, it supports multithreading for speedup (will spawn as many threads as cores) and log the process to

  • This program will output a log file on the log position, will create a logfile at default (duplication.log)

  • deletion is made at the very end, so if you break in comparison-stage, you wont delete anything

Features in progress


  • make Syntax matching optional (use lvl parameter)
  • copy all data to a destination folder after duplication removal

Formates


Images and junk are destinct by formates, (only matters if run with the remove-other option) :

  • Not junk: ('.wav', '.mp3', '.png', '.jpg', '.jpeg', '.gif', '.tiff', '.psd','.bmp', '.eps', '.ai', '.indd', '.raw', '.webm', '.mkv', '.flv', '.vob', '.ogv', '.ogg', '.drc', '.gif', '.mng', '.avi', '.mts', '.m2ts', '.ts', '.mov', '.qt', '.wmv', '.yuv', '.rm', '.rmvb', '.viv', '.asf', '.mp4', '.m4p', '.m4v', '.mpg', '.mp2', '.mpeg', '.mpe', '.mpv', '.mpg', '.mpeg', '.m2v', '.m4v', '.svi','.3gp', '.3g2', '.mxf', '.roq', '.nsv', '.flv', '.f4v', '.f4p','.f4a', '.f4b', '.doc', '.pdf', '.docx', '.docm', '.dot', '.odt', '.rtf', '.txt', '.csv', '.dif', '.xls')

  • Images: (".png", ".jpg", ".jpeg", '.gif')

Usage


idf -h

positional arguments:
  path1                 original path or list of paths
  path2                 path to check and optinal delete duplicates in

optional arguments:
  -h, --help            show this help message and exit
  -l LOG_FILE, --log-file LOG_FILE
                        path of the log file to be written, defaults to duplicates.log in current folder
  -o OUTPUT_CSV, --output-csv OUTPUT_CSV
                        ouputs csv list of duplicates
  -d, --delete          automatically deletes duplicates
  -t, --threading       use multithreading to help speedup the process
  -ts IMAGEHASH_THRESHOLD, --imagehash-threshold IMAGEHASH_THRESHOLD
                        if not used -a how much simularity must be on the imagehash of the pictues (values will be interpreted as percent) default is 100
  -rem, --remove-other  removes other files, that are not considerd documents (good if there is a lot of junk) only works with -d

or use as python function

from image_duplicate_finder import find_duplicates


find_duplicates(path1, path2, csv = None, delete = False, t = False, ts = 100, lvl = 1, remove_others = False)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ImageDuplicateFinder-0.6.0.tar.gz (11.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ImageDuplicateFinder-0.6.0-py3-none-any.whl (14.3 kB view details)

Uploaded Python 3

File details

Details for the file ImageDuplicateFinder-0.6.0.tar.gz.

File metadata

  • Download URL: ImageDuplicateFinder-0.6.0.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for ImageDuplicateFinder-0.6.0.tar.gz
Algorithm Hash digest
SHA256 a2a55314b8e2f1b23810e940c25e4559fa25bf6a5d476911b6be192f3b566f47
MD5 3df81d13f3e86dc615864213409a547e
BLAKE2b-256 b244f5cd82500bcc7e1229999300da96f10e82a3ad3cdfe18e0911be219e8b90

See more details on using hashes here.

File details

Details for the file ImageDuplicateFinder-0.6.0-py3-none-any.whl.

File metadata

File hashes

Hashes for ImageDuplicateFinder-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5a381c00624cc1f3f7d34f2dd56d0a05ab5a02243e4a8721fa15ec9f933fa4a6
MD5 4fe8d1bd18f63c272690f30cbe3ad123
BLAKE2b-256 c0ee130bcfbbee1f867b67e46c61952425613b0a22ef3ad619c34ad451ca96c3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page