Cross-platform duplicate file finder and cleaner

dupegun

A fast, cross-platform command-line tool to find and eliminate duplicate files.


What is dupegun?

dupegun scans your folders, detects duplicate files using a fast 3-pass hashing engine, and lets you delete, move, symlink, or hard-link them — all from your terminal. It works on every file type and every major operating system.

No GUI. No bloat. Just fast, safe, and simple.


Install

# Core install
pip install dupegun

# With TUI browser
pip install "dupegun[tui]"

# With watch mode
pip install "dupegun[watch]"

# Everything
pip install "dupegun[all]"

Requires Python 3.9 or higher.


Quick Start

# Scan a folder and see all duplicates
dupegun scan ~/Downloads

# Scan faster on SSDs using parallel hashing (4 workers)
dupegun scan ~/Downloads --workers 4

# Interactive TUI browser
dupegun tui ~/Downloads

# Preview what would be deleted with a per-group breakdown
dupegun delete ~/Downloads --strategy newest

# Actually delete duplicates and save a log
dupegun delete ~/Downloads --strategy newest --no-dry-run --log deleted.log

# Undo those deletions
dupegun restore deleted.log --no-dry-run

Commands

scan — find duplicates

dupegun scan <path> [options]
# Scan a single folder
dupegun scan ~/Downloads

# Use parallel hashing for faster scans (most effective on SSDs)
dupegun scan ~/Downloads --workers 4

# Limit how deep the scanner goes (0 = top folder only, 1 = one level deep)
dupegun scan ~/Downloads --depth 1

# Sort results by wasted space, size, count, or path
dupegun scan ~/Downloads --sort wasted

# Skip files smaller than 1 MB
dupegun scan ~/Downloads --min-size 1MB

# Regex filename filter
dupegun scan ~/Downloads --pattern "Copy of.*"

# Only scan image files
dupegun scan ~/Downloads --type .jpg --type .png --type .gif

# Just show total wasted space
dupegun scan ~/Downloads --summary

# Show the duplicate count with a colorized bar for the percentage of wasted space
dupegun scan ~/Downloads --count

# Export results
dupegun scan ~/Downloads --json results.json
dupegun scan ~/Downloads --csv results.csv
dupegun scan ~/Downloads --html report.html
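The filter flags above all narrow a single directory walk. The sketch below shows one way they could combine, under stated assumptions: iter_candidates is a hypothetical helper, not part of dupegun; --depth 0 is taken to mean files directly inside the root; --type is compared case-insensitively against the file extension; --pattern is applied with re.search to the bare file name; and symlinks are skipped.

```python
import os
import re
from pathlib import Path
from typing import Iterator, Optional, Sequence


def iter_candidates(
    root: str,
    depth: Optional[int] = None,
    types: Sequence[str] = (),
    pattern: Optional[str] = None,
    min_size: int = 1,
    max_size: Optional[int] = None,
    exclude: Sequence[str] = (),
) -> Iterator[Path]:
    """Yield regular files under root that pass every filter (hypothetical helper)."""
    root_path = Path(root)
    regex = re.compile(pattern) if pattern else None
    wanted = {t.lower() for t in types}
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Level 0 is the root itself, level 1 its direct subfolders, and so on.
        level = len(Path(dirpath).relative_to(root_path).parts)
        # Prune excluded folders, and stop descending once the depth limit is reached.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        if depth is not None and level >= depth:
            dirnames[:] = []
        for name in filenames:
            path = Path(dirpath, name)
            if path.is_symlink() or not path.is_file():
                continue  # assumption: symlinks are never treated as candidates
            if wanted and path.suffix.lower() not in wanted:
                continue
            if regex and not regex.search(name):
                continue
            size = path.stat().st_size
            if size < min_size or (max_size is not None and size > max_size):
                continue
            yield path
```

With os.walk in its default top-down mode, clearing dirnames in place is what stops the descent, so excluded or too-deep folders are never listed at all.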

restore — undo deletions

Reverse a previous dupegun delete --log run by copying the kept file back to where each deleted duplicate used to live.

dupegun restore <log_file> [options]
# Preview what would be restored (dry-run ON by default)
dupegun restore deleted.log

# Actually restore
dupegun restore deleted.log --no-dry-run

The restore command reads the TSV log file, shows exactly what it will do in a table, and then copies the kept file to each deleted path.
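The log's column layout is not documented here, so the sketch below assumes a hypothetical two-column TSV line per deletion (deleted path, then kept path); read_log and restore are illustrative names, not dupegun's API. It mirrors the behavior described above: it plans every copy first, leaves the disk untouched unless dry_run is False, and never overwrites a file that already exists at the old location.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def read_log(log_file: str) -> List[Tuple[Path, Path]]:
    """Parse an ASSUMED two-column TSV log: deleted path, then kept path."""
    entries = []
    with open(log_file, encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2 or fields[0].startswith("#"):
                continue  # skip blank, malformed, or comment lines
            entries.append((Path(fields[0]), Path(fields[1])))
    return entries


def restore(log_file: str, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Copy each kept file back to where its deleted duplicate used to be."""
    planned = []
    for deleted, kept in read_log(log_file):
        if deleted.exists() or not kept.is_file():
            continue  # never overwrite; skip keepers that no longer exist
        planned.append((deleted, kept))
        if not dry_run:
            deleted.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(kept, deleted)  # copy2 also preserves timestamps
    return planned
```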


tui — interactive terminal browser

# Requires: pip install "dupegun[tui]"
dupegun tui ~/Downloads
dupegun tui ~/Downloads --strategy newest
dupegun tui ~/Downloads --no-dry-run

Controls:

| Key | Action |
|---|---|
| Space | Toggle file for deletion |
| A | Mark all duplicates in group |
| U | Unmark all in group |
| D | Delete all marked files |
| S | Skip to next group |
| Q / Escape | Quit |

watch — monitor for new duplicates

# Requires: pip install "dupegun[watch]"
dupegun watch ~/Downloads
dupegun watch ~/Photos --type .jpg --type .png

Press Ctrl+C to stop.


delete — remove duplicates

# Preview with detailed per-group breakdown (dry-run ON by default)
dupegun delete ~/Downloads --strategy newest

# Actually delete
dupegun delete ~/Downloads --strategy newest --no-dry-run

# Delete with log so you can restore later
dupegun delete ~/Downloads --no-dry-run --log deleted.log

symlink — save space across drives

Replace duplicates with symbolic links pointing to the single kept file. Unlike hardlinks, symlinks work across different drives and filesystems.

# Preview symlink creation
dupegun symlink ~/Photos --strategy newest

# Actually create symlinks
dupegun symlink ~/Photos --strategy newest --no-dry-run

Note: On Windows, creating symlinks may require Developer Mode or running as Administrator.
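The following is a sketch of how one duplicate might be swapped for a symlink once the kept file has been chosen. The helper name replace_with_symlink and the ".dupegun-tmp" suffix are hypothetical. Creating the link under a temporary name and then renaming it over the duplicate with os.replace means the original path is never left missing if link creation fails.

```python
import os
from pathlib import Path


def replace_with_symlink(duplicate: Path, keeper: Path, dry_run: bool = True) -> Path:
    """Replace `duplicate` with a symlink to `keeper` and return the link target.

    Hypothetical helper; dupegun's own implementation may differ.
    """
    target = keeper.resolve()  # absolute target so the link works from any directory
    if dry_run:
        return target
    tmp = duplicate.with_name(duplicate.name + ".dupegun-tmp")  # hypothetical temp name
    os.symlink(target, tmp)
    # A single rename swaps the link into place (atomic on POSIX filesystems).
    os.replace(tmp, duplicate)
    return target
```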


hardlink — save space on the same drive

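Replace duplicates with hard links so multiple paths point to the exact same data blocks on your drive. Hard links only work within a single filesystem, so each duplicate must be on the same drive as the kept file.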

dupegun hardlink ~/Photos --no-dry-run
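Because hard links cannot cross filesystems, a safe implementation has to check that both files share a device first. The hypothetical helper below (not dupegun's code) compares st_dev before linking, skips pairs that already share an inode, and links under a temporary name before renaming over the duplicate, so the duplicate's path always stays valid.

```python
import os
from pathlib import Path


def replace_with_hardlink(duplicate: Path, keeper: Path, dry_run: bool = True) -> bool:
    """Make `duplicate` a hard link to `keeper` (hypothetical helper).

    Returns True if a new link was created, or would be in dry-run mode.
    """
    dup_st, keep_st = duplicate.stat(), keeper.stat()
    if dup_st.st_dev != keep_st.st_dev:
        return False  # different filesystems: a hard link is impossible
    if dup_st.st_ino == keep_st.st_ino:
        return False  # already the same data on disk, nothing to do
    if dry_run:
        return True
    tmp = duplicate.with_name(duplicate.name + ".dupegun-tmp")  # hypothetical temp name
    os.link(keeper, tmp)
    os.replace(tmp, duplicate)  # a single rename swaps the link into place
    return True
```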

move — quarantine duplicates

dupegun move ~/Downloads --dest ~/quarantine --no-dry-run
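One way a quarantine move could avoid name clashes is to recreate each duplicate's path, relative to the scanned folder, under --dest. That layout and the quarantine helper name are assumptions for this sketch; dupegun may arrange the quarantine folder differently. shutil.move renames within a filesystem and falls back to copy-then-delete across drives.

```python
import shutil
from pathlib import Path


def quarantine(duplicate: Path, root: Path, dest: Path, dry_run: bool = True) -> Path:
    """Move `duplicate` under `dest`, mirroring its path relative to `root` (hypothetical layout)."""
    target = dest / duplicate.relative_to(root)
    if target.exists():
        raise FileExistsError(target)  # never overwrite something already quarantined
    if not dry_run:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(duplicate), str(target))
    return target
```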

config — manage your config file

dupegun config --init    # create ~/.dupegun.toml
dupegun config --show    # print current config
dupegun config --path    # show config file path

stats — folder statistics

dupegun stats ~/Downloads

compare — cross-directory duplicates

dupegun compare ~/Downloads ~/Backup
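A minimal sketch of a cross-directory comparison, assuming the goal is to list files in the first folder whose content also exists somewhere in the second. The function names are hypothetical. Sizes from the second folder are indexed first, so a file is hashed only when a same-size counterpart exists, and digests are cached so each file is read at most once.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of(path: Path) -> str:
    """Full-content SHA-256, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def compare(dir_a: str, dir_b: str) -> List[Tuple[Path, List[Path]]]:
    """Pair each file in dir_a with the files in dir_b that have identical content."""
    sizes_b: Dict[int, List[Path]] = defaultdict(list)
    for q in Path(dir_b).rglob("*"):
        if q.is_file():
            sizes_b[q.stat().st_size].append(q)

    cache: Dict[Path, str] = {}

    def digest(p: Path) -> str:
        if p not in cache:
            cache[p] = sha256_of(p)  # hash each file at most once
        return cache[p]

    matches = []
    for p in sorted(Path(dir_a).rglob("*")):
        if not p.is_file():
            continue
        same_size = [q for q in sizes_b.get(p.stat().st_size, []) if q != p]
        if same_size:  # no same-size file in dir_b means no match, so skip hashing
            twins = [q for q in same_size if digest(q) == digest(p)]
            if twins:
                matches.append((p, twins))
    return matches
```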

Plugin system

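Register a custom keep-strategy in a Python file:

# my_plugin.py
from dupegun.plugins import register_strategy


# "by_name": from one group of duplicate paths, keep the file whose
# name comes first alphabetically (case-insensitive).
@register_strategy("by_name")
def keep_alphabetically(paths):
    return min(paths, key=lambda p: p.name.lower())

Then load the plugin and select the strategy by name:

dupegun delete ~/Downloads --plugin my_plugin.py --strategy by_name --no-dry-run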
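For readers curious how a decorator such as register_strategy can work, here is a self-contained sketch of a name-to-function registry. It is a hypothetical reimplementation, not the contents of dupegun.plugins, and the built-in strategies rest on assumptions: shortest keeps the file with the shortest full path, newest and oldest compare modification times, and every strategy returns the file to keep.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

KeepStrategy = Callable[[List[Path]], Path]
STRATEGIES: Dict[str, KeepStrategy] = {}  # hypothetical registry: name -> keep function


def register_strategy(name: str) -> Callable[[KeepStrategy], KeepStrategy]:
    """Decorator that files a keep-strategy under `name` and returns it unchanged."""
    def decorator(func: KeepStrategy) -> KeepStrategy:
        STRATEGIES[name] = func
        return func
    return decorator


@register_strategy("shortest")
def keep_shortest(paths: List[Path]) -> Path:
    return min(paths, key=lambda p: len(str(p)))  # assumption: shortest full path


@register_strategy("newest")
def keep_newest(paths: List[Path]) -> Path:
    return max(paths, key=lambda p: p.stat().st_mtime)


@register_strategy("oldest")
def keep_oldest(paths: List[Path]) -> Path:
    return min(paths, key=lambda p: p.stat().st_mtime)


def split_group(paths: List[Path], strategy: str) -> Tuple[Path, List[Path]]:
    """Return the keeper and the files the command would act on."""
    keeper = STRATEGIES[strategy](paths)
    return keeper, [p for p in paths if p != keeper]
```

Because the decorator returns the original function, a plugin file only has to be imported for its strategies to become selectable by name.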

Options

| Option | Commands | Description | Default |
|---|---|---|---|
| --workers <n> | all | Number of parallel hashing threads (faster on SSDs) | 1 |
| --depth <n> | all | Maximum directory depth to scan (0 = top folder only) | no limit |
| --sort <key> | scan | Sort output by size, count, path, or wasted | scan order |
| --strategy <name> | delete, move, symlink, hardlink, tui | Keep-strategy: shortest, newest, oldest, or a plugin name | shortest |
| --dry-run | delete, move, symlink, hardlink, tui | Preview changes without modifying the disk | ON |
| --no-dry-run | delete, move, symlink, hardlink, tui | Actually perform the action | |
| --log <file> | delete | Append a TSV log of every deleted file | |
| --older-than <days> | delete | Only delete copies older than <days> days | |
| --interactive | delete | Confirm each group before acting | OFF |
| --min-size <size> | all | Skip files smaller than this | 1 byte |
| --max-size <size> | all | Skip files larger than this | no limit |
| --pattern <regex> | all | Only scan filenames matching this regex | none |
| --type <ext> | all | Only include files with this extension (repeatable) | all |
| --exclude <name> | all | Skip folders with this name (repeatable) | none |
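The --min-size and --max-size options accept human-readable sizes such as 1MB. The parse_size function below is a hypothetical sketch that assumes binary multiples (1 KB = 1024 bytes) and case-insensitive units; dupegun may use decimal units instead, so check its own behavior before relying on exact thresholds.

```python
import re

# Assumption: binary multiples. dupegun may treat 1MB as 1,000,000 bytes instead.
UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_size(text: str) -> int:
    """Turn strings like '500', '4kb', '10M' or '1.5 GB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B?)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    if unit in ("K", "M", "G", "T"):
        unit += "B"  # accept shorthand such as '10M'
    return int(float(number) * UNITS[unit])
```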

How it works

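Pass 1: Group files by exact byte size (a file with a unique size cannot have a duplicate).
Pass 2: Hash the first 4 KB of every file that shares its size with another file.
Pass 3: Compute a full SHA-256 hash of the remaining candidates (in parallel when --workers > 1).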
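The three passes can be sketched in plain Python as follows. This is an illustration under assumptions, not dupegun's source: file_hash, regroup, and find_duplicates are hypothetical names, SHA-256 is also used for the 4 KB pass-2 hash (the text above does not name that algorithm), and a ThreadPoolExecutor stands in for --workers, applied only in pass 3 as described. Threads help because hashlib releases the GIL while hashing large buffers, and much of the time is spent waiting on disk reads.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List

HEAD_BYTES = 4096  # pass 2 looks only at the first 4 KB


def file_hash(path: Path, limit: int = -1) -> str:
    """SHA-256 of the first `limit` bytes, or of the whole file when limit < 0."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        if limit >= 0:
            digest.update(fh.read(limit))
        else:
            for block in iter(lambda: fh.read(1 << 20), b""):
                digest.update(block)
    return digest.hexdigest()


def regroup(groups: Iterable[List[Path]], key: Callable[[Path], str],
            workers: int = 1) -> List[List[Path]]:
    """Split each group by key(path) and keep only sub-groups with 2+ files."""
    result: List[List[Path]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for group in groups:
            buckets: Dict[str, List[Path]] = defaultdict(list)
            # pool.map preserves input order, so keys line up with paths.
            for path, k in zip(group, pool.map(key, group)):
                buckets[k].append(path)
            result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def find_duplicates(files: Iterable[Path], workers: int = 1) -> List[List[Path]]:
    # Pass 1: group by exact size; a file with a unique size has no duplicate.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in files:
        by_size[path.stat().st_size].append(path)
    groups = [g for g in by_size.values() if len(g) > 1]
    # Pass 2: a cheap hash of the first 4 KB drops most same-size look-alikes.
    groups = regroup(groups, lambda p: file_hash(p, HEAD_BYTES))
    # Pass 3: a full SHA-256 confirms the survivors, in parallel if workers > 1.
    return regroup(groups, file_hash, workers)
```

Because pass 2 reads at most 4 KB per file, most same-size files that differ are ruled out without being read in full.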

Safety

  • Dry-run is ON by default on every destructive command.
  • Dry-run shows a per-group breakdown of exactly what would be freed.
  • Use --log to keep an audit trail, and restore to undo if needed.
  • Use move or symlink instead of delete if you want a safety net.
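The per-group breakdown comes down to simple arithmetic: a group of n identical files of size s frees s × (n − 1) bytes, because one copy is kept. The sketch below applies that formula and sorts groups by savings, similar in spirit to --sort wasted; the (size, paths) group shape, the function names, and the report wording are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[int, List[str]]  # hypothetical shape: (size in bytes, paths of identical files)


def reclaimable(group: Group) -> int:
    """Bytes freed by keeping one copy and removing the other n - 1."""
    size, paths = group
    return size * (len(paths) - 1)


def dry_run_report(groups: List[Group]) -> List[str]:
    """One line per group, largest savings first, then a grand total."""
    lines = [
        f"{len(paths)} copies x {size} B -> frees {reclaimable((size, paths))} B"
        for size, paths in sorted(groups, key=reclaimable, reverse=True)
    ]
    lines.append(f"Total reclaimable: {sum(map(reclaimable, groups))} B")
    return lines
```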

Changelog

v2.2.0 (Power Features)

  • symlink command: Replace duplicates with symbolic links. Works across different drives and filesystems.
  • Parallel hashing: Faster scans on SSDs. Use --workers <n> to hash multiple files concurrently.
  • Directory depth limit: Use --depth <n> to restrict how deep the scanner goes (0 = top folder only).
  • Sort scan output: Sort duplicate groups using --sort (size, count, path, or wasted).

v2.1.0 (Quick Wins)

  • restore command: Undo deletions recorded in a --log file.
  • Dry-run summary: delete dry-run now shows a per-group breakdown of space saved.
  • ETA progress bar: Scans now show estimated time remaining.
  • Colorized --count: Visual bar showing percentage of wasted disk space.

v2.0.0 (TUI & Watch)

  • tui command: Interactive terminal UI browser.
  • watch command: Monitor folders for new duplicates in real time.
  • Plugin system: Register custom keep-strategies.

v1.0.0 — v1.3.0

  • Initial releases: scan, delete, move, hardlink, stats, and compare commands.

Contributing

git clone https://github.com/Prasanna-Balakrishnan/dupegun.git
cd dupegun
pip install -e ".[all]"
pip install pytest
python -m pytest tests/ -v

License

MIT — see LICENSE for details.


Author

Made by Prasanna B

GitHub: https://github.com/Prasanna-Balakrishnan/dupegun

If this tool helped you, consider giving it a star on GitHub!

Download files

Source Distribution

dupegun-2.2.0.tar.gz (38.3 kB)

Uploaded Source

Built Distribution

dupegun-2.2.0-py3-none-any.whl (30.8 kB)

Uploaded Python 3

File details

Details for the file dupegun-2.2.0.tar.gz.

File metadata

  • Download URL: dupegun-2.2.0.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for dupegun-2.2.0.tar.gz
Algorithm Hash digest
SHA256 db6ae2b54f39ad0b56fb3d19861860aa23e8bdb4844361a24b5a3adc13c153ee
MD5 cbdaad6def9da2ec6cb4b61bb0843730
BLAKE2b-256 2237406463488826067ec0c5171568b0890d10cfed81ffef27674bf8ce8306fb

File details

Details for the file dupegun-2.2.0-py3-none-any.whl.

File metadata

  • Download URL: dupegun-2.2.0-py3-none-any.whl
  • Upload date:
  • Size: 30.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for dupegun-2.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 379e6e11e4577b2a71e217271334ad9e154a451570d350d1351ff7a855af4bb1
MD5 f1b07831cea214fa53b64167e33e8ca4
BLAKE2b-256 b6d1883c761e7c0acd0701a31f22c79e58c4c3a57e2f1ecbe71c1c838ae28db6
