Find, review, and safely trash duplicate and near-duplicate files
tdupes
Smartly find, review, and safely trash exact and near-duplicate files on Linux.
tdupes detects exact duplicates (byte-identical, via fdupes) and
optionally near-duplicates (same basename, scored by content similarity, via
plocate/locate). Results are written to a TSV that you review and edit with
your favourite spreadsheet tool before any files are touched; confirmed deletions
go to gio trash and remain recoverable until the bin is emptied.
Key features:
- Accepts any mix of individual files and directories as arguments
- Near-duplicate detection with -L: for files given as arguments, it finds same-basename files across the filesystem and reports a similarity score (text %, binary same/different size)
- Preferred-directory protection: files inside configured dirs are never proposed for deletion by default
- Preferred dirs and exclusion patterns can be specified in the config file or via the -p/-x flags at runtime
- Writes a proposed action plan to a TSV table that you can edit interactively with your favourite spreadsheet tool (the TSV is opened with xdg-open)
- Automated batch mode is also available (the TSV then serves as a log)
Install
pip install tdupes
System dependencies (Ubuntu/Debian):
sudo apt install fdupes plocate gvfs-bin xdg-utils
Usage
tdupes [OPTIONS] PATH [PATH ...]
Positional arguments:
PATH Files or directories to scan for duplicates
Options:
-l, --locate Expand file arguments via locatedb (exact basename matches)
-L, --locate-all Like -l, but also tabulate near-duplicates (same basename,
not byte-identical) with real similarity codes
-t FILE, --tsv FILE
Path for the output TSV (default: temp file)
-p DIR, --prefer DIR
Mark DIR as preferred at runtime (files inside are never
proposed for deletion). Additive with config. Repeatable.
-x PATTERN, --exclude PATTERN
Shell glob to exclude files by full path. Additive with
config. Repeatable: -x '*.tmp' -x '/mnt/*'
-b, --batch Batch mode: no prompts; execute DELETE actions immediately
-v, --verbose Increase output verbosity
-q, --quiet Reduce output verbosity
-c, --config FILE Config file path (default: $XDG_CONFIG_HOME/tdupes.yml)
-V, --version Show version and exit
-h, --help Show this help message and exit
Examples
# Scan two directories interactively
tdupes ~/Pictures ~/Downloads
# Use locate to also find exact-duplicate copies of a specific file
tdupes --locate ~/Downloads/photo.jpg ~/Pictures
# Use locate and also include near-duplicates (same basename, different content)
tdupes -L ~/Downloads/photo.jpg ~/Pictures
# Batch mode (good for scripting / cron)
tdupes --batch ~/Documents
# Write the TSV to a specific path
tdupes -t /tmp/dupes.tsv ~/Music ~/Videos
Config
On first run tdupes creates $XDG_CONFIG_HOME/tdupes.yml (defaults to
~/.config/tdupes.yml):
preferred_directories: [] # files here are never proposed to be deleted
verbosity: 1 # 0=quiet, 1=normal, 2=verbose
tsv_output: null # null = temp file each run
exclusion_patterns: [] # shell glob patterns to skip
batch_mode: false
preferred_directories — any file whose path begins with one of these
directories will be marked keep regardless of group ordering.
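The preferred-directory and exclusion checks can be sketched in Python as follows. This is a minimal illustration, not the code tdupes actually runs: the function names is_preferred and is_excluded are hypothetical, matching on a directory-separator boundary is an assumption (the text above only says the path "begins with" the directory), and using fnmatch as the glob engine is also an assumption.

```python
import fnmatch
import os


def is_preferred(path: str, preferred_dirs: list[str]) -> bool:
    """True if path lies inside any preferred directory (hypothetical helper)."""
    path = os.path.abspath(path)
    for directory in preferred_dirs:
        root = os.path.abspath(os.path.expanduser(directory)).rstrip(os.sep)
        # Assumption: compare on a separator boundary so that
        # /data/Pictures2 is not treated as being inside /data/Pictures.
        if path == root or path.startswith(root + os.sep):
            return True
    return False


def is_excluded(path: str, patterns: list[str]) -> bool:
    """True if the full path matches any shell glob (hypothetical helper)."""
    # With fnmatch, '*' also matches '/', so '/mnt/*' covers nested paths.
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)
```

With these helpers, is_preferred("/home/user/Pictures/photo.jpg", ["/home/user/Pictures"]) is true, while the same call for a file under /home/user/Pictures2 is false.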
TSV format
Action Similarity Size_KB Modified Path Comment
keep 100 2048.0 2024-11-01T14:22:10 /home/user/Pictures/photo.jpg in preferred folder
DELETE 100 2048.0 2024-09-15T08:01:55 /home/user/Downloads/photo.jpg
| Column | Values |
|---|---|
| Action | keep or DELETE — edit freely before confirming |
| Similarity | 100 exact · XXX binary same size · NNN text % match · !!! binary diff size |
| Size_KB | File size in kilobytes |
| Modified | Last-modified timestamp (ISO 8601) |
| Path | Absolute file path |
| Comment | Reason for the proposed action (see below) — informational, ignored on re-read |
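To make the Similarity codes concrete, here is a rough Python sketch that maps two file contents to one of the four kinds of value listed in the table. Everything about how the value is computed is an assumption: the helper names are hypothetical, a file is treated as binary if it contains a NUL byte or is not valid UTF-8, and the text percentage comes from difflib.SequenceMatcher. tdupes may well classify and score files differently.

```python
import difflib
from typing import Optional


def _as_text(data: bytes) -> Optional[str]:
    """Decode as UTF-8 text, or return None for binary data (assumed heuristic)."""
    if b"\0" in data:
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def similarity_code(a: bytes, b: bytes) -> str:
    """Return a Similarity cell for two file contents (hypothetical helper)."""
    if a == b:
        return "100"  # byte-identical
    text_a, text_b = _as_text(a), _as_text(b)
    if text_a is None or text_b is None:
        # Binary content: only the sizes are compared.
        return "XXX" if len(a) == len(b) else "!!!"
    ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
    # Cap at 99 so that only byte-identical files ever report 100.
    return str(min(99, int(ratio * 100)))
```

Capping the text score at 99 is a design choice of this sketch: it keeps 100 reserved for exact duplicates, as the table describes.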
Groups are separated by blank lines. The first entry in each group is either the file given as a CLI argument, or the newest copy.
Near-duplicate groups (found with -L) are written in a separate section after
the exact-duplicate groups, preceded by a # comment line.
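If you want to post-process the TSV yourself, the layout described above can be read with a few lines of Python. This is a sketch under stated assumptions, not tdupes's own parser: read_groups is a hypothetical name, the header row is assumed to start with Action, and a # comment line is treated as a group boundary as well as being skipped.

```python
COLUMNS = ["Action", "Similarity", "Size_KB", "Modified", "Path", "Comment"]


def read_groups(text: str) -> list[list[dict[str, str]]]:
    """Split TSV text into groups of row dicts (hypothetical helper)."""
    groups: list[list[dict[str, str]]] = []
    current: list[dict[str, str]] = []
    for line in text.splitlines():
        # Blank lines separate groups; '#' lines introduce a new section.
        if not line.strip() or line.startswith("#"):
            if current:
                groups.append(current)
                current = []
            continue
        fields = line.split("\t")
        if fields[0] == "Action":
            continue  # header row (assumed to be present)
        # Pad short rows so a missing trailing Comment becomes "".
        fields += [""] * (len(COLUMNS) - len(fields))
        current.append(dict(zip(COLUMNS, fields)))
    if current:
        groups.append(current)
    return groups
```

Each returned group is a list of rows in file order, so the first element of a group is the CLI-argument file or the newest copy, as described above.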
Default Action logic
Exact-duplicate groups (byte-identical per fdupes):
| Comment tag | Rule |
|---|---|
| in preferred folder | File is inside a preferred_directories path → keep |
| last in group | Last file in the group (tiebreaker) → keep |
| (no tag) | All other copies → DELETE |
CLI argument files are listed first in each group so they are never the last-in-group tiebreaker and therefore receive DELETE by default (unless they also fall under a preferred folder rule).
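The exact-duplicate rules can be sketched in Python as follows. This is one reading of the table, not tdupes's actual code: propose_exact and its is_preferred callback are hypothetical names, and the sketch assumes the "last in group" tiebreaker only applies when no file in the group is in a preferred folder, which is what the sample TSV in the TSV format section suggests.

```python
from typing import Callable


def propose_exact(
    paths: list[str], is_preferred: Callable[[str], bool]
) -> list[tuple[str, str, str]]:
    """Return (action, path, comment) for an ordered exact-duplicate group."""
    any_preferred = any(is_preferred(p) for p in paths)
    rows = []
    for index, path in enumerate(paths):
        if is_preferred(path):
            rows.append(("keep", path, "in preferred folder"))
        elif not any_preferred and index == len(paths) - 1:
            # Assumed tiebreaker: keep the last copy when nothing is preferred.
            rows.append(("keep", path, "last in group"))
        else:
            rows.append(("DELETE", path, ""))
    return rows
```

Because CLI-argument files come first in the group, they only receive keep here when they sit in a preferred folder or are the sole entry.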
Near-duplicate groups (-L, same basename, not byte-identical):
| Comment tag | Rule |
|---|---|
| in preferred folder | File is inside a preferred_directories path → keep |
| largest in basename group | Overall largest file in the group, only if no preferred file is larger → keep |
| newest in basename group | Overall newest file in the group, only if no preferred file is newer → keep |
| (no tag) | Everything else → DELETE |
CLI argument files are listed first and may receive DELETE if they are neither the largest nor the newest (and not in a preferred folder).
If a preferred-folder file is already the overall largest (or newest), no extra non-preferred copy is kept for that reason — the preferred file already covers it.
Multiple tags are comma-separated (e.g. largest in basename group, newest in basename group).
The Comment column is read-only — it is ignored when tdupes re-reads the TSV after you edit it.
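The near-duplicate rules can be sketched in the same way. Again this is an interpretation rather than tdupes's implementation: Entry and propose_near are hypothetical names, ties for largest or newest go to the first such entry in the group, and the sketch also attaches the largest/newest tag to a preferred file when that file is the overall winner, which the text above does not spell out.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    path: str
    size_kb: float
    modified: str  # ISO 8601 in one format, so string order is time order
    preferred: bool


def propose_near(group: list[Entry]) -> list[tuple[str, str, str]]:
    """Return (action, path, comment) for one same-basename group."""
    if not group:
        return []
    # Overall winners; max() returns the first entry on ties (assumption).
    largest = max(group, key=lambda e: e.size_kb)
    newest = max(group, key=lambda e: e.modified)
    rows = []
    for entry in group:
        tags = []
        if entry.preferred:
            tags.append("in preferred folder")
        if entry is largest:
            tags.append("largest in basename group")
        if entry is newest:
            tags.append("newest in basename group")
        action = "keep" if tags else "DELETE"
        rows.append((action, entry.path, ", ".join(tags)))
    return rows
```

Since only the overall largest and newest entries are tagged, a preferred file that already holds one of those positions means no additional non-preferred copy is kept for that reason.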
Interactive flow
- tdupes scans paths and prints the duplicate table.
- The TSV is opened with xdg-open for manual review.
- You edit Action cells (change DELETE → keep or vice-versa), save, return.
- tdupes re-reads the TSV and asks for confirmation.
- On confirmation, all DELETE files are sent to the trash via gio trash.
- A summary shows how many files were trashed and how much space was freed.
Files trashed with gio trash remain recoverable from the system trash until
the bin is emptied.
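The last two steps of the flow (trashing the DELETE rows and summarising the result) can be sketched as below. This is a hedged illustration only: apply_actions and gio_trash are hypothetical names, rows are assumed to be dicts keyed by the TSV column names, and the Action cell is assumed to be compared case-sensitively. The only external command used is gio trash, as named in the text.

```python
import subprocess
from typing import Callable, Iterable, Mapping


def gio_trash(path: str) -> None:
    """Send one file to the system trash via 'gio trash'."""
    subprocess.run(["gio", "trash", path], check=True)


def apply_actions(
    rows: Iterable[Mapping[str, str]],
    trash: Callable[[str], None] = gio_trash,
) -> tuple[int, float]:
    """Trash every DELETE row; return (files trashed, kilobytes freed)."""
    count = 0
    freed_kb = 0.0
    for row in rows:
        if row["Action"] != "DELETE":
            continue  # 'keep' rows (and anything else) are left untouched
        trash(row["Path"])
        count += 1
        freed_kb += float(row["Size_KB"])
    return count, freed_kb
```

Passing the trash function as a parameter makes a dry run easy: apply_actions(rows, trash=print) lists the paths that would be trashed without touching any file.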
License
MIT