Find near-duplicate and exact-duplicate images using perceptual hashing
Image Duplicates Detective (imgduptective)
Find near-duplicate and exact-duplicate images in your photo collections using perceptual hashing.
How it works
imgduptective uses a gradient-based horizontal difference hash (dhash) to create a perceptual fingerprint of each image. Images that look similar have similar hashes, even if the files differ in format, resolution, or compression. A Hamming distance threshold (the number of bits in which two hashes differ) controls how similar two images must be to count as duplicates.
Results are cached in a local SQLite database (~/.config/imgduptective/), so subsequent runs are fast: only new or modified files are processed.
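The "only new or modified files" rule can be sketched with the standard library's sqlite3 and os modules. This is a simplified stand-in, not the package's schema: a single hypothetical table `file_cache`, keyed by path, stores size and modification time, and a file is re-hashed only when either value differs from the cached row. The helpers `open_cache`, `needs_rehash`, and `remember` are likewise hypothetical.

```python
import os
import sqlite3


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical path -> (size, mtime) cache table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_cache ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
    )
    return conn


def needs_rehash(conn: sqlite3.Connection, path: str) -> bool:
    """True if path is new or its size/mtime changed since it was cached."""
    st = os.stat(path)  # one stat() call; file contents are not read
    row = conn.execute(
        "SELECT size, mtime FROM file_cache WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row != (st.st_size, st.st_mtime)


def remember(conn: sqlite3.Connection, path: str) -> None:
    """Record the current size/mtime after a file has been hashed."""
    st = os.stat(path)
    conn.execute(
        "INSERT OR REPLACE INTO file_cache (path, size, mtime) VALUES (?, ?, ?)",
        (path, st.st_size, st.st_mtime),
    )
    conn.commit()
```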
Installation
pip install .
Or for development:
pip install -e .
Requires Python 3.10+ and Pillow.
Usage
# Find near-duplicates with a Hamming distance threshold of 5
imgduptective 5
# Find exact duplicates only (identical file content)
imgduptective --exact
# Add files to the database without comparing
imgduptective --add
# Check what duplicates would be found if the current directory were added
imgduptective --check 5
# Show per-directory statistics
imgduptective --stats 5
# Open the built-in viewer to inspect and delete duplicates
imgduptective --view 5
Options
| Flag | Description |
|---|---|
| threshold (positional) | Maximum Hamming distance to consider a match (0 = identical perceptual hash) |
| --view | Open the tkinter viewer to browse and manage duplicate groups |
| --stats | Show per-directory duplicate statistics |
| --check | Preview what duplicates would be found without modifying the database |
| --add | Scan and hash files into the database without comparing |
| --photos | Only process common photo formats (jpg, png, heic, webp, tiff, bmp, gif) |
| --exact | Find exact file matches (same content) instead of perceptually similar images |
| --no-scan | Skip file scanning/hashing entirely and use only the database cache |
| --full-hash | Use a full-file SHA-1 instead of the default fast 64KB partial hash |
| --project NAME | Use a named project database (e.g., work, personal, holidays) |
| --list-projects | List available project databases with file counts |
Projects
Organize separate photo collections into named projects. Each project has its own database:
# Scan work photos
cd ~/Photos/Work
imgduptective --project work --add
# Scan holiday photos
cd ~/Photos/Holidays
imgduptective --project holidays --add
# Find duplicates within holidays
imgduptective --project holidays 5
# List all projects
imgduptective --list-projects
Without --project, the default database is used.
Performance
The tool uses several strategies to minimize scan time:
- Partial hashing (default): Only the first 64KB of each file is hashed (plus file size) for change detection. This is sufficient to distinguish different images while being 10-100x faster than full-file hashing on large files.
- Stat-based caching: On repeat scans, files whose size and modification time haven't changed skip hashing entirely (a single stat() call per file).
- --no-scan: For re-running comparisons with different thresholds without any file I/O.
- --full-hash: Forces a full SHA-1 of the entire file contents when exact integrity verification is needed.
- Multiprocessing: File hashing, image hash computation, and pair comparison all run in parallel.
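The partial-hash strategy can be sketched as follows: a digest over the file size plus the first 64 KB of content, next to a full-file SHA-1 for the --full-hash case. The text above does not say which algorithm the partial hash uses or how the size is combined, so using SHA-1 with an 8-byte size prefix is an assumption, and `partial_hash` and `full_hash` are hypothetical helpers.

```python
import hashlib
import os

CHUNK = 64 * 1024  # 64 KB prefix, per the description above


def partial_hash(path: str) -> str:
    """SHA-1 of the file size plus the first 64 KB (assumed construction)."""
    h = hashlib.sha1()
    # Fixed-width size prefix, so files with the same prefix but different
    # lengths still produce different digests.
    h.update(os.path.getsize(path).to_bytes(8, "big"))
    with open(path, "rb") as f:
        h.update(f.read(CHUNK))
    return h.hexdigest()


def full_hash(path: str) -> str:
    """SHA-1 of the entire file, read in 1 MB blocks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()
```

In this sketch, two files of equal size whose first 64 KB are identical get the same `partial_hash` but different `full_hash` values, which is the gap a full-file hash closes when exact verification matters.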
Viewer
The built-in tkinter viewer (--view) displays duplicate groups side by side:
- ←/→ or n/p/space: Navigate between groups
- Click: Select/deselect images for deletion
- d or Delete: Compress selected files with gzip and remove originals
- q or Escape: Quit
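The viewer's delete action compresses files instead of deleting them outright, which presumably keeps a mistaken selection recoverable. A standard-library sketch of that step, assuming the compressed copy is written next to the original with a .gz suffix (the actual naming and location are not documented here); `gzip_and_remove` is a hypothetical name.

```python
import gzip
import os
import shutil


def gzip_and_remove(path: str) -> str:
    """Write path + '.gz' (assumed naming), then delete the original."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are not loaded whole
    os.remove(path)  # only reached if compression raised no exception
    return gz_path
```

Restoring a file under this scheme is a plain `gunzip dup.jpg.gz`.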
Database
Hashes are stored in ~/.config/imgduptective/:
- imgduptective.db: default project
- imgduptective-{name}.db: named projects
The database has two tables:
- HashValueTable: Content-addressed cache mapping file hashes to image perceptual hashes
- FileTable: Maps file paths to their file hash, image hash, size, and modification time
Files that no longer exist are automatically pruned from the database on each scan.
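A possible layout for the two tables and the pruning step, written with Python's sqlite3 module. The column names and types are assumptions inferred from the descriptions above, not the package's actual schema; `create_schema` and `prune_missing` are hypothetical.

```python
import os
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create assumed versions of HashValueTable and FileTable."""
    conn.executescript(
        """
        -- Content-addressed: identical file contents share one image hash.
        -- Image hashes are stored as hex TEXT because an unsigned 64-bit
        -- value can overflow SQLite's signed 64-bit INTEGER.
        CREATE TABLE IF NOT EXISTS HashValueTable (
            file_hash  TEXT PRIMARY KEY,
            image_hash TEXT
        );
        -- One row per path on disk.
        CREATE TABLE IF NOT EXISTS FileTable (
            path       TEXT PRIMARY KEY,
            file_hash  TEXT,
            image_hash TEXT,
            size       INTEGER,
            mtime      REAL
        );
        """
    )


def prune_missing(conn: sqlite3.Connection) -> int:
    """Delete FileTable rows whose path no longer exists; return the count."""
    # Materialize the paths first so the SELECT cursor is not open during DELETE.
    paths = [row[0] for row in conn.execute("SELECT path FROM FileTable")]
    gone = [(p,) for p in paths if not os.path.exists(p)]
    conn.executemany("DELETE FROM FileTable WHERE path = ?", gone)
    conn.commit()
    return len(gone)
```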
License
MIT
Download files
File details
Details for the file imgduptective-0.2.0.tar.gz.
File metadata
- Download URL: imgduptective-0.2.0.tar.gz
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1ed50f1d93395d0caec936ab3d3d43a6359d274a3deda765a33fcef02c82114 |
| MD5 | 6e8d84bc4e38e5a2fa55744575e4dab7 |
| BLAKE2b-256 | 6fc491b773da2fefdd274d5bbcc0355c53d08c6bc8f25472e14442506f9701be |
File details
Details for the file imgduptective-0.2.0-py3-none-any.whl.
File metadata
- Download URL: imgduptective-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77220e60b1eee35c484d966deb48a07e5d3b861626e1ddbea75f96615a660d7c |
| MD5 | b7ce75af57e5d56ff6db7f6a7f8934c1 |
| BLAKE2b-256 | 1695358e1f16e000ebd339924968eb7ebaeab7a4a5f52d836ef9cc4e6d2c4b19 |