Find, review, and safely trash duplicate and near-duplicate files
tdupes
Find, review, and safely trash duplicate files on Linux.
tdupes detects exact duplicates (byte-identical, via fdupes) and
optionally basename matches (same filename, scored by content similarity,
via plocate/locate). Results are written to a TSV that you review and
edit with your favourite spreadsheet tool before any files are touched;
confirmed deletions go to gio trash and remain recoverable until the bin
is emptied.
Key features:
- Accepts any mix of individual files and directories as arguments
- Basename-match detection with -L: finds same-named files across the filesystem, scored by similarity (100 exact · XXX binary same-size · NNN% text match · !!! binary different-size)
- Preferred-directory protection: files inside protected dirs are never proposed for deletion; on first run, all system dirs at / (except /home and /tmp) are pre-configured as preferred by default
- Runtime preferred-dir control: -p add, -r remove, -s/-S toggle the full set of system dirs for a single run
- Exclusion patterns via config or -x at runtime
- Interactive TSV review (opened with xdg-open) or fully automated batch mode
Install
pip install tdupes
System dependencies (Ubuntu/Debian):
sudo apt install fdupes plocate gvfs-bin xdg-utils
Usage
tdupes [OPTIONS] PATH [PATH ...]
Positional arguments:
PATH Files or directories to scan for duplicates
Options:
-l, --locate Search locatedb for basename copies of each file arg
and add them to the scan (exact duplicates only)
-L, --locate-all Like -l, but also tabulate basename matches that are
not byte-identical, with similarity codes
-X, --delete-xxx (-L) DELETE non-preferred XXX matches
(binary, equal size, not identical)
-N, --delete-nnn (-L) DELETE non-preferred NNN matches
(text files, partial % similarity)
-Z, --delete-excl (-L) DELETE non-preferred !!! matches
(binary, different size)
-A, --heuristic-a (-L) Keep the largest and newest non-preferred file
in each basename group; delete the rest
-B, --heuristic-b (-L) Keep the shallowest-path non-preferred file(s)
in each basename group; delete the rest
-t FILE, --tsv FILE Path for the output TSV (default: temp file)
-p DIR, --prefer DIR Add DIR to preferred directories for this run.
Additive with config. Repeatable.
-r DIR, --remove-prefer DIR
Remove DIR from preferred directories for this run,
overriding config and -p. Repeatable.
-s, --system-prefer Add all top-level system dirs at / (except /home and
/tmp) to preferred dirs for this run
-S, --no-system-prefer Remove the system dirs from preferred dirs for this run
-x PATTERN, --exclude PATTERN
Shell glob to exclude files by full path. Additive
with config. Repeatable: -x '*.tmp' -x '/mnt/*'
-b, --batch Batch mode: no prompts; execute DELETE actions immediately
-v, --verbose Increase output verbosity
-q, --quiet Reduce output verbosity
-c FILE, --config FILE Config file path (default: $XDG_CONFIG_HOME/tdupes.yml)
-V, --version Show version and exit
-h, --help Show this help message and exit
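The -x option matches shell globs against the full path. A minimal sketch of that kind of matching, assuming semantics like Python's fnmatch module (whether tdupes uses fnmatch internally is an assumption; is_excluded is a hypothetical helper, not part of tdupes):

```python
from fnmatch import fnmatchcase


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if the full path matches any of the shell-glob patterns.

    fnmatchcase is case-sensitive, and its '*' also matches '/' characters,
    so '*.tmp' matches files at any depth.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# The two patterns from the -x help text above.
patterns = ["*.tmp", "/mnt/*"]
for candidate in ("/home/user/cache/a.tmp",
                  "/mnt/backup/photo.jpg",
                  "/home/user/photo.jpg"):
    print(candidate, is_excluded(candidate, patterns))
```

Under these assumed semantics a pattern such as '/mnt/*' excludes everything below /mnt, not only its direct children.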
Examples
# Scan two directories interactively
tdupes ~/Pictures ~/Downloads
# Find exact-duplicate copies of a specific file via locatedb
tdupes -l ~/Downloads/photo.jpg ~/Pictures
# Find basename matches too; keep all by default (review in TSV)
tdupes -L ~/Downloads/photo.jpg ~/Pictures
# Delete binary matches automatically in addition to finding exact dupes
tdupes -L -X -Z ~/Downloads/photo.jpg ~/Pictures
# Keep only largest+newest per basename group; delete the rest
tdupes -L -A ~/Downloads/photo.jpg ~/Pictures
# Scan without the system-directory protection for this run
tdupes -S ~/Documents
# Batch mode (good for scripting / cron)
tdupes --batch ~/Documents
# Write the TSV to a specific path
tdupes -t /tmp/dupes.tsv ~/Music ~/Videos
Config
On first run tdupes creates $XDG_CONFIG_HOME/tdupes.yml (defaults to
~/.config/tdupes.yml) and pre-populates preferred_directories with all
top-level system directories at / (excluding /home and /tmp), so that
system files are protected immediately without any manual configuration:
preferred_directories: # pre-filled with system dirs on first run
- /bin
- /boot
- /etc
- /usr
# … etc.
verbosity: 1 # 0=quiet, 1=normal, 2=verbose
tsv_output: null # null = temp file each run
exclusion_patterns: [] # shell glob patterns to skip
batch_mode: false
preferred_directories — any file whose path begins with one of these
directories is marked keep regardless of group ordering, and cannot be
overridden by -X/-N/-Z flags.
Use -r DIR to temporarily remove a directory from the protected set for one
run; use -S to remove all system dirs at once; use -s to add them back if
you have removed them from config.
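How the preferred set for a run could be derived from the config plus -p and -r can be sketched as follows. This is an illustration only: effective_preferred and is_preferred are hypothetical names, and matching on whole path components (so that /usr does not match /usrlocal) is an assumption about what "begins with" means here.

```python
from pathlib import PurePosixPath


def effective_preferred(config_dirs, add=(), remove=()):
    """Combine config dirs with -p additions, then apply -r removals.

    Removals are applied last, mirroring the statement that -r overrides
    both the config and -p.
    """
    return (set(config_dirs) | set(add)) - set(remove)


def is_preferred(path, preferred_dirs):
    """Return True if path lies inside (or is) one of the preferred dirs."""
    p = PurePosixPath(path)
    return any(p.is_relative_to(d) for d in preferred_dirs)


dirs = effective_preferred(
    ["/usr", "/etc"],              # from preferred_directories in the config
    add=["/home/user/Pictures"],   # -p /home/user/Pictures
    remove=["/etc"],               # -r /etc
)
print(sorted(dirs))
print(is_preferred("/usr/lib/libfoo.so", dirs))
print(is_preferred("/etc/hosts", dirs))
```

PurePosixPath.is_relative_to requires Python 3.9 or newer.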
TSV format
Action Similarity Size_KB Modified Path Comment
keep 100 2048.0 2024-11-01T14:22:10 /home/user/Pictures/photo.jpg in preferred folder
DELETE 100 2048.0 2024-09-15T08:01:55 /home/user/Downloads/photo.jpg
| Column | Values |
|---|---|
| Action | keep or DELETE — edit freely before confirming |
| Similarity | 100 exact · XXX binary same size · NNN text % match · !!! binary diff size |
| Size_KB | File size in kilobytes |
| Modified | Last-modified timestamp (ISO 8601) |
| Path | Absolute file path |
| Comment | Reason for the keep decision — informational, ignored on re-read |
Groups are separated by blank lines. The CLI argument file is always listed
first in each group. Basename match groups (found with -L) are written in a
separate section after the exact-duplicate groups, preceded by a # comment line.
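For scripting around the TSV, a reader along these lines may be enough. It is a sketch based only on the layout described above (tab-separated columns, blank lines between groups, # comment lines); read_groups is a hypothetical helper, and the presence of a header row starting with Action is assumed from the sample.

```python
import csv

COLUMNS = ["Action", "Similarity", "Size_KB", "Modified", "Path", "Comment"]


def read_groups(text: str) -> list[list[dict]]:
    """Split TSV text into groups of row dicts keyed by column name."""
    groups, current = [], []
    for line in text.splitlines():
        # Blank lines and '#' comment lines both end the current group.
        if not line.strip() or line.startswith("#"):
            if current:
                groups.append(current)
                current = []
            continue
        fields = next(csv.reader([line], delimiter="\t"))
        if fields[0] == "Action":
            continue  # assumed header row
        fields += [""] * (len(COLUMNS) - len(fields))  # pad a missing Comment
        current.append(dict(zip(COLUMNS, fields)))
    if current:
        groups.append(current)
    return groups


sample = (
    "Action\tSimilarity\tSize_KB\tModified\tPath\tComment\n"
    "keep\t100\t2048.0\t2024-11-01T14:22:10\t/home/user/Pictures/photo.jpg\tin preferred folder\n"
    "DELETE\t100\t2048.0\t2024-09-15T08:01:55\t/home/user/Downloads/photo.jpg\t\n"
)
groups = read_groups(sample)
to_delete = [row["Path"] for group in groups for row in group
             if row["Action"] == "DELETE"]
print(to_delete)
```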
Default Action logic
Exact-duplicate groups (byte-identical per fdupes):
| Comment tag | Rule |
|---|---|
| in preferred folder | File is inside a preferred_directories path → keep |
| last in group | Last file in the group (tiebreaker) → keep |
| (no tag) | All other copies → DELETE |
CLI argument files are listed first in each group so they are never the last-in-group tiebreaker and therefore receive DELETE by default (unless they fall under a preferred folder rule).
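The rules for exact-duplicate groups can be sketched as below. This is not the tdupes source: default_actions is a hypothetical function, and applying the last-in-group tiebreaker only when no file in the group is preferred is an assumption inferred from the sample TSV, where the last row is marked DELETE because a preferred copy exists.

```python
from pathlib import PurePosixPath


def is_preferred(path, preferred_dirs):
    """Return True if path lies inside one of the preferred directories."""
    p = PurePosixPath(path)
    return any(p.is_relative_to(d) for d in preferred_dirs)


def default_actions(group, preferred_dirs):
    """Return (action, comment, path) rows for one exact-duplicate group.

    `group` is an ordered list of paths with CLI argument files first.
    """
    any_preferred = any(is_preferred(p, preferred_dirs) for p in group)
    rows = []
    for index, path in enumerate(group):
        if is_preferred(path, preferred_dirs):
            rows.append(("keep", "in preferred folder", path))
        elif not any_preferred and index == len(group) - 1:
            # Assumed: the tiebreaker only fires when nothing is preferred.
            rows.append(("keep", "last in group", path))
        else:
            rows.append(("DELETE", "", path))
    return rows


group = ["/home/user/Downloads/photo.jpg", "/home/user/Backup/photo.jpg"]
for row in default_actions(group, preferred_dirs=[]):
    print(row)
```

With no preferred directory involved, the CLI argument file (listed first) is marked DELETE and the last copy survives, which matches the behaviour described above.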
Basename match groups (-L, same basename, not byte-identical):
| Comment tag | Rule |
|---|---|
| in preferred folder | File is inside a preferred_directories path → keep (always) |
| (no tag) | All other files → keep by default |
Use -X, -N, -Z to DELETE non-preferred files by similarity type:
| Flag | Similarity | Meaning |
|---|---|---|
| -X | XXX | Binary files of equal size |
| -N | NNN | Text files with partial % similarity |
| -Z | !!! | Binary files of different size |
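To make the four similarity codes concrete, here is one way such a classification could work. It is a sketch, not the algorithm tdupes uses: the text test (valid UTF-8 without NUL bytes), the use of difflib for the percentage, and the output format of the percentage are all assumptions.

```python
import difflib


def is_text(data: bytes) -> bool:
    """Heuristic: treat NUL-free, valid UTF-8 content as text."""
    if b"\0" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def similarity_code(a: bytes, b: bytes) -> str:
    """Classify two file contents into one of the four similarity codes."""
    if a == b:
        return "100"  # byte-identical
    if is_text(a) and is_text(b):
        ratio = difflib.SequenceMatcher(
            None, a.decode("utf-8"), b.decode("utf-8")).ratio()
        # Cap at 99 so a non-identical pair is never reported as 100.
        return f"{min(int(ratio * 100), 99)}%"
    return "XXX" if len(a) == len(b) else "!!!"


print(similarity_code(b"abcd", b"abcd"))
print(similarity_code(b"abcd", b"abcf"))
print(similarity_code(b"\x00\x01", b"\x00\x02"))
print(similarity_code(b"\x00\x01", b"\x00\x01\x02"))
```

Comparing whole file contents in memory is only practical for small files; a real tool would likely compare sizes first and stream or hash larger files.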
Use -A and -B to apply heuristics that auto-select which non-preferred files to keep:
| Flag | Heuristic |
|---|---|
| -A | Keep the largest and the newest non-preferred file in the group |
| -B | Keep the shallowest-path non-preferred file(s) (ties all kept) |
Flags -A and -B can be combined — the union of their keep sets survives; the rest are deleted.
Preferred-directory files are always kept regardless of any flag.
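The union behaviour of -A and -B can be illustrated with a small sketch over the non-preferred files of one basename group. Candidate, keep_a, keep_b and survivors are hypothetical names; measuring path depth as the number of path components, and picking the first file on a size or mtime tie under -A, are assumptions.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Candidate:
    path: str
    size: int      # bytes
    mtime: float   # seconds since the epoch


def keep_a(files):
    """-A: keep the largest file and the newest file."""
    if not files:
        return set()
    largest = max(files, key=lambda f: f.size)
    newest = max(files, key=lambda f: f.mtime)
    return {largest.path, newest.path}


def keep_b(files):
    """-B: keep every file at the shallowest path depth (ties all kept)."""
    if not files:
        return set()
    def depth(f):
        return len(PurePosixPath(f.path).parts)
    shallowest = min(depth(f) for f in files)
    return {f.path for f in files if depth(f) == shallowest}


def survivors(files, use_a=False, use_b=False):
    """Union of the keep sets of the enabled heuristics."""
    keep = set()
    if use_a:
        keep |= keep_a(files)
    if use_b:
        keep |= keep_b(files)
    return keep


files = [
    Candidate("/home/u/a/b/notes.txt", size=300, mtime=1.0),  # largest
    Candidate("/home/u/notes.txt", size=100, mtime=2.0),      # shallowest
    Candidate("/home/u/x/notes.txt", size=100, mtime=3.0),    # newest
    Candidate("/home/u/y/z/notes.txt", size=50, mtime=0.5),
]
kept = survivors(files, use_a=True, use_b=True)
print(sorted(kept))
print(sorted(f.path for f in files if f.path not in kept))
```

In this example only /home/u/y/z/notes.txt falls outside both keep sets and would be marked DELETE.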
The Comment column is read-only — it is ignored when tdupes re-reads the TSV after you edit it.
Interactive flow
- tdupes scans paths and prints the duplicate table.
- The TSV is opened with xdg-open for manual review.
- You edit Action cells (change DELETE → keep or vice-versa), save, return.
- tdupes re-reads the TSV, displays the updated table, and asks for confirmation.
- On confirmation, all DELETE files are sent to the trash via gio trash.
- A summary shows how many files were trashed and how much space was freed.
Files trashed with gio trash remain recoverable from the system trash until
the bin is emptied.
License
MIT