Duplicate file finder
Info
Dedup is a cross-platform command-line Python application that efficiently detects and reports duplicate files on your system.
Requirements
- Linux (fully tested), Windows, MacOS
- Python > 3.6
- No third-party modules required ✅
Installation
git clone https://github.com/brightio/dedup
or
wget https://raw.githubusercontent.com/brightio/dedup/main/dedup.py
Modes
➤ Normal mode
It detects duplicate files in the given directories and/or files.
➤ Target mode
It detects whether the given directories and/or files exist in the target directories and/or files, which are specified with -t.
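The idea behind target mode can be sketched as follows. This is a minimal illustration, not dedup's actual code: the helper names (sha1_of, files_under, find_in_targets) are hypothetical, and for brevity the sketch ignores the file treatment rules (hidden files, symbolic links, empty files) described in the next section.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_under(item):
    """Yield regular files for a path that may be a file or a directory."""
    item = Path(item)
    if item.is_file():
        yield item
    elif item.is_dir():
        yield from (p for p in sorted(item.rglob("*")) if p.is_file())


def find_in_targets(items, targets):
    """Map each item file to the target files with identical content."""
    # Index every target file by its content hash.
    by_hash = {}
    for target in targets:
        for path in files_under(target):
            by_hash.setdefault(sha1_of(path), []).append(path)
    # Look up each item file in that index.
    matches = {}
    for item in items:
        for path in files_under(item):
            found = by_hash.get(sha1_of(path))
            if found:
                matches[path] = found
    return matches
```

Calling find_in_targets(["photos"], ["backup"]) would return a dictionary whose keys are the files under photos that already exist, byte for byte, somewhere under backup.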
File treatment
- Empty files are excluded.
- Symbolic links are not followed.
- Hard links are considered to be the same file.
- Hidden files and directories are excluded (they can be included with -a, -hf, -hd).
- The duplicate sets are sorted by the space that will be freed if the duplicate files are removed (use -s to sort by individual file size)
- Duplicate files are detected using SHA1 hashes. For further verification, type 'v' in the interactive menu to re-check the results using MD5.
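The rules above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not dedup's implementation: the function names are hypothetical, hidden items are assumed to be those whose name starts with a dot (the Unix convention), hard links are recognised by their (device, inode) pair, and grouping by size before hashing is an optimisation chosen for the example.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_sets(root):
    """Return (freed_bytes, paths) tuples, largest potential saving first."""
    by_size = defaultdict(list)
    seen_inodes = set()
    # os.walk does not follow symbolic links to directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so they are never visited.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.startswith(".") or os.path.islink(path):
                continue  # hidden file or symbolic link
            info = os.stat(path)
            if info.st_size == 0:
                continue  # empty files are excluded
            inode = (info.st_dev, info.st_ino)
            if inode in seen_inodes:
                continue  # a hard link to a file that was already seen
            seen_inodes.add(inode)
            by_size[info.st_size].append(path)

    duplicate_sets = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have duplicates
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha1_of(path)].append(path)
        for group in by_hash.values():
            if len(group) > 1:
                # Space freed if all copies but one are removed.
                freed = size * (len(group) - 1)
                duplicate_sets.append((freed, sorted(group)))
    duplicate_sets.sort(key=lambda entry: entry[0], reverse=True)
    return duplicate_sets
```

Each returned tuple holds the number of bytes that removing the extra copies would free, followed by the paths in that duplicate set.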
Item filtering
- Use -min and -max for minimum and maximum file size respectively. The size can be specified like 500K, 2M, 10G etc.
- Use -xf and -xd to exclude files and directories respectively. The value is treated as a regular expression. Note: More elaborate filtering can be achieved via external programs such as 'find', as 'dedup' accepts a newline-separated item list from stdin.
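As a rough illustration of how such filters might work, the sketch below parses a human-readable size and applies size and regex checks to a file. The names parse_size and keep_file are hypothetical, and two details are assumptions rather than documented behaviour: units are treated as binary multiples (1K = 1024 bytes), and the exclusion pattern may match anywhere in the file name.

```python
import re

# Assumption: binary multiples (1K = 1024 bytes); dedup may use another base.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '500K', '2M' or '10G' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?)B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


def keep_file(name, size, min_size=None, max_size=None, exclude=None):
    """Return True if a file passes the size limits and the exclusion regex."""
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    # Assumption: the pattern may match anywhere in the name (re.search).
    if exclude is not None and re.search(exclude, name):
        return False
    return True
```

For example, keep_file("report.tmp", 2048, exclude=r"\.tmp$") rejects the file because its name matches the exclusion pattern, regardless of its size.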
Command line options
usage: dedup.py [-h] [-t TARGETS] [-s] [-u] [-S] [-I] [-V] [-xf EXCLUDE_FILES] [-xd EXCLUDE_DIRECTORIES] [-min MIN_SIZE] [-max MAX_SIZE] [-a] [-hf] [-hd] [-v]
[ITEMS ...]
This program detects duplicate files.
positional arguments:
ITEMS Files/Directories to detect duplicates
options:
-h, --help show this help message and exit
-t TARGETS, --targets TARGETS
Files/Directories that we want to check if the ITEMS exist in there
-s, --sort-size Sort duplicates by size (Default: Saving size)
-u, --show-unique Show also unique files (Default: No)
-S, --only-stats Show only statistics
-I, --non-interactive
Disable interactive prompts (Default: Enabled)
-V, --verbose Show files while they are being read
-xf EXCLUDE_FILES, --exclude-files EXCLUDE_FILES
Files to exclude (regex)
-xd EXCLUDE_DIRECTORIES, --exclude-directories EXCLUDE_DIRECTORIES
Directories to exclude (regex)
-min MIN_SIZE, --min-size MIN_SIZE
Ommit files smaller than SIZE (Bytes).
-max MAX_SIZE, --max-size MAX_SIZE
Ommit files larger than SIZE (Bytes).
-a, --include-hidden Include hidden files and directories (Default: No)
-hf, --include-hidden-files
Include hidden files (Default: No)
-hd, --include-hidden-directories
Include hidden directories (Default: No)
-v, --version Show version
TODO
- Improve duplicate detection performance and interactive menu navigation.
- Ability to save session, TAB delimited output and a file with the files to be deleted.
- Ability to look into archive/zipped files.
- Detect duplicate directories.
- Stop hashing candidate duplicate files if at some point their data are different. This will save time with large files (like disk images) where their sizes are the same but their data differ.
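The last item can be illustrated with a chunk-by-chunk comparison that returns as soon as two files diverge. Note that this sketch swaps hashing for a direct pairwise comparison, which is only one possible way to get the early exit; the function name same_content and the 1 MiB chunk size are assumptions made for the example.

```python
import os


def same_content(path_a, path_b, chunk_size=1 << 20):
    """Compare two files chunk by chunk, stopping at the first difference."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop early; the remaining data is never read
            if not chunk_a:
                return True  # both files ended without a difference
```

For two same-sized disk images that differ in their first megabyte, only one chunk of each is read. The standard library's filecmp.cmp(path_a, path_b, shallow=False) offers a comparable byte-by-byte check.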
Known Issues
- Ctrl-C for stopping the program while searching for duplicates doesn't work on Windows yet.
- Exiting the program on MacOS produces a warning like the following, which I can't solve yet:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 9 leaked semaphore objects to clean up at shutdown
Contribution
If you want to contribute to this project, please report bugs, unexpected program behaviours, and/or new ideas.
File details
Details for the file dedup_kit_ng-0.8.1.tar.gz.
File metadata
- Download URL: dedup_kit_ng-0.8.1.tar.gz
- Upload date:
- Size: 49.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fc8cd90193086676370298cac6e2c4ee25065e2d3518a56da3b0dd5279107e4 |
| MD5 | 7b12fd109f822e5d93591a126fc079a6 |
| BLAKE2b-256 | 7c8f25328f0fc7971a87d452b9ab32f95c5aebc9603d5013363d019f1a21a7cc |
File details
Details for the file dedup_kit_ng-0.8.1-py3-none-any.whl.
File metadata
- Download URL: dedup_kit_ng-0.8.1-py3-none-any.whl
- Upload date:
- Size: 36.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e31048b21af4eb17ca9005e4d14829ba6848e27e89f17e906789fc3a3a6c6d11 |
| MD5 | d1745a1276f81699735dd4d94ec7a6c7 |
| BLAKE2b-256 | d599f01c23ee8d98c673b8bcba8e0a8b451ac5f43df31770cc34bfc785cc3261 |