Cross-platform duplicate file finder and cleaner

dupegun 🔫

A fast, cross-platform command-line tool to find and eliminate duplicate files.


What is dupegun?

dupegun scans your folders, detects duplicate files using a fast 3-pass hashing engine, and lets you delete, move, or hard-link them — all from your terminal. It works on every file type and every major operating system.

No GUI. No bloat. Just fast, safe, and simple.


Install

# Core install
pip install dupegun

# With TUI browser
pip install "dupegun[tui]"

# With watch mode
pip install "dupegun[watch]"

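# With local web UI
pip install "dupegun[serve]"

# With image similarity detection
pip install "dupegun[images]"

# With scheduled scans
pip install "dupegun[schedule]"

# Everything
pip install "dupegun[all]"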

Requires Python 3.9 or higher.


Quick Start

# Scan a folder and see all duplicates
dupegun scan ~/Downloads

# Interactive TUI browser
dupegun tui ~/Downloads

# Monitor a folder for new duplicates
dupegun watch ~/Downloads

# Preview what would be deleted (nothing actually deleted)
dupegun delete ~/Downloads --strategy newest

# Actually delete duplicates
dupegun delete ~/Downloads --strategy newest --no-dry-run

# Run a system and configuration check
dupegun doctor

Commands

scan — find duplicates

dupegun scan <path> [options]

# Scan a single folder
dupegun scan ~/Downloads

# Scan multiple folders at once
dupegun scan ~/Downloads ~/Documents ~/Desktop

# Skip files smaller than 1 MB
dupegun scan ~/Downloads --min-size 1MB

# Scan specific size ranges (e.g., between 1MB and 100MB)
dupegun scan ~/Downloads --min-size 1MB --max-size 100MB

# Regex filename filter (e.g., files starting with "Copy of")
dupegun scan ~/Downloads --pattern "Copy of.*"

# Limit how deep the scan goes (e.g., 2 levels deep)
dupegun scan ~/Downloads --depth 2

# Speed up scans on SSDs using parallel hashing threads
dupegun scan ~/Downloads --workers 4

# Only scan image files
dupegun scan ~/Downloads --type .jpg --type .png --type .gif

# Skip specific folders
dupegun scan C:\ --exclude Windows --exclude "Program Files"

# Sort results by wasted space, size, count, or path
dupegun scan ~/Downloads --sort wasted

# Clean output for scripts (disable colors)
dupegun scan ~/Downloads --no-color

# Export results
dupegun scan ~/Downloads --json results.json
dupegun scan ~/Downloads --csv results.csv
dupegun scan ~/Downloads --html report.html

serve — local web UI

Start a local web UI for dupegun in your browser. All processing happens on your machine.

# Requires: pip install "dupegun[serve]"
dupegun serve
dupegun serve ~/Downloads
dupegun serve ~/Downloads --port 8080
dupegun serve --host 0.0.0.0   # share on your local Wi-Fi network

image-dupes — find visually similar images

Find duplicate images even if their resolutions, compression, or exact bytes differ, using perceptual hashing.

# Requires: pip install "dupegun[images]"
dupegun image-dupes ~/Photos
dupegun image-dupes ~/Photos --threshold 5   # stricter matching
dupegun image-dupes ~/Photos --algo dhash    # use difference hashing
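
To illustrate the idea behind difference hashing, here is a minimal pure-Python sketch. It is not dupegun's implementation (the changelog says dupegun relies on the imagehash library). It assumes the image has already been shrunk to a small grayscale grid of 8 rows by 9 columns, and it assumes the threshold means the maximum number of differing hash bits; `dhash_bits`, `hamming`, and `looks_similar` are hypothetical names.

```python
from typing import Sequence


def dhash_bits(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    A bit is 1 when brightness rises from left to right, else 0.
    An 8x9 grid therefore yields a 64-bit integer.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_similar(a: Sequence[Sequence[int]],
                  b: Sequence[Sequence[int]],
                  threshold: int = 5) -> bool:
    """Treat two grids as similar when their hashes differ in few bits.

    Assumption: a lower threshold means stricter matching.
    """
    return hamming(dhash_bits(a), dhash_bits(b)) <= threshold
```

Because only the direction of brightness changes is recorded, a uniformly brightened copy of an image produces the same hash, while a mirrored copy does not.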

schedule — automate background scans

Run automatic duplicate scans on a repeating schedule.

# Requires: pip install "dupegun[schedule]"
dupegun schedule ~/Downloads --every day --at 02:00
dupegun schedule ~/Downloads --every 30min
dupegun schedule ~/Downloads --every week --on monday --at 09:00
dupegun schedule ~/Downloads --every day --action delete --log auto.log

tui — interactive terminal browser

Browse duplicates visually, mark files for deletion, and act — all from your keyboard.

# Requires: pip install "dupegun[tui]"
dupegun tui ~/Downloads
dupegun tui ~/Downloads --strategy newest
dupegun tui ~/Downloads --no-dry-run   # actually delete (default is dry-run)

Controls:

| Key | Action |
|---|---|
| Arrow keys / j k | Navigate files |
| Space | Toggle file for deletion |
| A | Mark all duplicates in group (keep first) |
| U | Unmark all in group |
| D | Delete all marked files in current group |
| S | Skip to next group |
| Q / Escape | Quit |

watch — monitor for new duplicates

Watch a folder in real time and get an alert whenever a duplicate file appears.

# Requires: pip install "dupegun[watch]"
dupegun watch ~/Downloads

Press Ctrl+C to stop watching.


delete — remove duplicates

dupegun delete <path> [options]

# Preview (dry-run is ON by default — nothing deleted)
dupegun delete ~/Downloads --strategy newest

# Actually delete
dupegun delete ~/Downloads --strategy newest --no-dry-run

# Auto-delete by age (only delete copies older than 30 days)
dupegun delete ~/Downloads --older-than 30 --no-dry-run

# Save a log of everything deleted
dupegun delete ~/Downloads --no-dry-run --log deleted.log
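
The interplay of the keep strategy, the age filter, and dry-run can be sketched in a few lines of Python. This is an illustration only, not dupegun's code: it assumes "newest" means keeping the most recently modified copy, and `plan_deletions` and `apply_plan` are hypothetical names.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional, Sequence

SECONDS_PER_DAY = 86_400


def plan_deletions(group: Sequence[Path],
                   older_than_days: Optional[float] = None,
                   now: Optional[float] = None) -> List[Path]:
    """Return the copies that would be removed from one duplicate group."""
    now = time.time() if now is None else now
    # Assumed meaning of "newest": keep the most recently modified copy.
    keep = max(group, key=lambda p: p.stat().st_mtime)
    candidates = [p for p in group if p != keep]
    if older_than_days is not None:
        # Only copies modified before the cutoff remain candidates.
        cutoff = now - older_than_days * SECONDS_PER_DAY
        candidates = [p for p in candidates if p.stat().st_mtime < cutoff]
    return candidates


def apply_plan(paths: Iterable[Path], dry_run: bool = True) -> None:
    """Report what would be deleted, or delete when dry_run is False."""
    for p in paths:
        if dry_run:
            print(f"would delete {p}")
        else:
            p.unlink()
```

Keeping planning separate from deletion is what makes a preview cheap: the same plan can be printed first and applied later.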

symlink & hardlink — save space

Replace duplicates with links pointing to a single kept file. Symlinks can point across drives, whereas a hardlink and the file it refers to must be on the same drive (filesystem).

dupegun symlink ~/Photos --strategy newest --no-dry-run
dupegun hardlink ~/Videos --no-dry-run
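
For readers curious what replacing a duplicate with a hard link involves, here is a minimal sketch using only the Python standard library. It is not taken from dupegun; `replace_with_hardlink` and the temporary file suffix are hypothetical. The link is created under a temporary name first so the duplicate's path never disappears, even if linking fails.

```python
import os
from pathlib import Path


def replace_with_hardlink(keep: Path, duplicate: Path) -> None:
    """Replace `duplicate` with a hard link to `keep`.

    Both paths must be on the same filesystem; otherwise os.link
    raises OSError and `duplicate` is left untouched.
    """
    tmp = duplicate.with_name(duplicate.name + ".link-tmp")  # hypothetical suffix
    os.link(keep, tmp)          # create a second name for the kept file
    os.replace(tmp, duplicate)  # rename it over the old copy
```

After the call, both paths refer to the same data on disk, so the space used by the second copy is freed.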

move — quarantine duplicates

dupegun move ~/Downloads --dest ~/quarantine --no-dry-run

Utilities: compare, stats, open, doctor, config, completion

  • compare: Find files that exist in both folders by content. Great for backups.
dupegun compare ~/Downloads ~/Backup
  • stats: Get a quick overview of total files, duplicate groups, and wasted space.
dupegun stats ~/Downloads
  • open: Open a file directly using your OS default application.
dupegun open ~/Downloads/duplicate_file.txt
  • doctor: Check system health, verify dependencies, and validate your config.
dupegun doctor
  • completion: Set up shell tab-completion for Bash, Zsh, or Fish.
dupegun completion
  • config: Create and manage your ~/.dupegun.toml defaults.
dupegun config --init
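
As an illustration of what comparing two folders by content means for the compare utility above, the following sketch indexes each folder by SHA-256 digest and intersects the two indexes. It is a simplified stand-in, not dupegun's code: `content_index` and `common_by_content` are hypothetical names, and each file is read fully into memory, which a real tool would avoid for large files.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def content_index(root: Path) -> Dict[str, List[Path]]:
    """Map each SHA-256 digest to the files under `root` with that content."""
    index: Dict[str, List[Path]] = {}
    for p in root.rglob("*"):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            index.setdefault(digest, []).append(p)
    return index


def common_by_content(left: Path, right: Path) -> List[Tuple[List[Path], List[Path]]]:
    """Pair up files whose content appears in both folders, regardless of name."""
    a, b = content_index(left), content_index(right)
    return [(a[d], b[d]) for d in a.keys() & b.keys()]
```

File names play no role here: a backup copy that was renamed still matches its original.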

Plugin system

Extend dupegun with custom strategies without touching the core code.

Create a plugin file:

# my_plugin.py
from dupegun.plugins import register_strategy

@register_strategy("by_name")
def keep_alphabetically(paths):
    """Keep the file whose name comes first alphabetically."""
    return min(paths, key=lambda p: p.name.lower())

Use it:

dupegun delete ~/Downloads --plugin my_plugin.py --strategy by_name --no-dry-run
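
A second strategy, sketched under the same assumption the example above makes: a strategy receives the paths of one duplicate group as pathlib.Path objects and returns the one to keep. The function is shown undecorated so it runs without dupegun installed; in a plugin file it would carry the @register_strategy decorator with a name of your choice ("shallowest" here is hypothetical).

```python
from pathlib import Path
from typing import Sequence


# In a plugin file, decorate this with @register_strategy("shallowest").
def keep_shallowest(paths: Sequence[Path]) -> Path:
    """Keep the copy nearest the filesystem root.

    Ties on depth are broken by the shorter full path string.
    """
    return min(paths, key=lambda p: (len(p.parts), len(str(p))))
```

Returning a tuple from the key function gives a deterministic tie-break, so repeated runs over the same group make the same choice.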

Options

| Option | Commands | Description | Default |
|---|---|---|---|
| --strategy <name> | delete, move, symlink, hardlink, tui | Which copy to keep: shortest, newest, oldest, or plugin | shortest |
| --dry-run | delete, move, symlink, hardlink, tui | Preview without making any changes | ON |
| --no-dry-run | delete, move, symlink, hardlink, tui | Actually perform the action | |
| --older-than <days> | delete | Only delete copies modified more than this many days ago | |
| --log <file> | delete | Append a TSV log of every deleted file | |
| --plugin <file> | delete, tui | Load a plugin .py file (repeatable) | |
| --config <file> | all | Path to a config TOML file | ~/.dupegun.toml |
| --min-size <size> | all | Skip files smaller than this (e.g. 1MB) | 1 byte |
| --max-size <size> | all | Skip files larger than this (e.g. 100MB) | no limit |
| --pattern <regex> | all | Only scan filenames matching this regex | none |
| --depth <n> | all | Maximum directory depth to scan | unlimited |
| --workers <n> | all | Number of parallel hashing threads | 1 |
| --type <ext> | all | Only include files with this extension (repeatable) | all types |
| --exclude <name> | all | Skip any folder with this name (repeatable) | none |
| --sort <key> | scan | Sort output by size, count, path, or wasted | |
| --summary | scan | Print total wasted space only, no file list | OFF |
| --count | scan | Print group count and wasted space, then exit | OFF |
| --no-color | all | Disable colored terminal output | OFF |
| --json <file> | scan | Export scan results to JSON | |
| --csv <file> | scan | Export scan results to CSV | |
| --html <file> | scan | Export a self-contained HTML report | |

How it works

dupegun uses a 3-pass engine to detect duplicates accurately and efficiently:

Pass 1 — Group files by exact byte size
         (files with unique sizes are skipped immediately)

Pass 2 — Hash the first 4 KB of each size-match
         (quick pre-filter before full hashing)

Pass 3 — Full SHA-256 hash of remaining candidates
         (files with identical digests are reported as duplicates)
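
The three passes above can be sketched in plain Python. This is a minimal illustration of the technique, not dupegun's source: the helper names (`sha256_of`, `regroup`, `find_duplicates`) are hypothetical, and using SHA-256 for the 4 KB prefix hash is an assumption, since the description does not say which hash Pass 2 uses.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Callable, Iterable, List, Optional

PREFIX_BYTES = 4096  # the "first 4 KB" of Pass 2


def sha256_of(path: Path, limit: Optional[int] = None) -> str:
    """SHA-256 of the whole file, or of only its first `limit` bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        if limit is not None:
            h.update(f.read(limit))
        else:
            # Read in 1 MiB chunks so large files are never fully in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def regroup(groups: Iterable[List[Path]],
            key: Callable[[Path], object]) -> List[List[Path]]:
    """Split each group by `key` and drop files left without a partner."""
    out: List[List[Path]] = []
    for group in groups:
        buckets = defaultdict(list)
        for p in group:
            buckets[key(p)].append(p)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of files under `root` with identical content."""
    files = [p for p in root.rglob("*") if p.is_file()]
    by_size = regroup([files], lambda p: p.stat().st_size)               # Pass 1
    by_prefix = regroup(by_size, lambda p: sha256_of(p, PREFIX_BYTES))   # Pass 2
    return regroup(by_prefix, sha256_of)                                 # Pass 3
```

Each pass only sees the survivors of the previous one, so the expensive full-file hash runs on the few files that already agree on size and on their first 4 KB.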


Supported file types

dupegun works on all file types — it compares raw file contents, not names or extensions.

| Category | Examples |
|---|---|
| Documents | .pdf .docx .xlsx .txt |
| Images | .jpg .png .gif .webp |
| Videos | .mp4 .mkv .avi .mov |
| Audio | .mp3 .wav .flac .aac |
| Archives | .zip .rar .7z .tar .gz |
| Code | .py .js .html .css .java |

Platform support

| Platform | Supported |
|---|---|
| Windows | Yes |
| Linux | Yes |
| macOS | Yes |

Changelog

v3.0.0

  • serve command: A fully self-contained local web UI. Scan and manage duplicates from your browser.
  • image-dupes command: Detect visually similar images using perceptual hashing (imagehash), bypassing byte-level differences.
  • schedule command: Automate directory scanning and deletion using a background scheduler.
  • Network Drive Support: Improved path resolution to natively support scanning mapped network drives and UNC paths (NAS).

v2.3.0

  • open command: Quickly open a file directly using your OS default application.
  • --no-color flag: Disable colored terminal output for CI pipelines and logging.
  • Shell Completion: Use dupegun completion for Bash, Zsh, and Fish setup instructions.
  • doctor command: Instantly check system health, dependencies, and validate config files.

v2.2.0

  • symlink command: Replace duplicates with symbolic links.
  • Parallel Hashing: Added --workers <n> to utilize concurrent threads for a massive speed boost on SSDs.
  • Recursion Limits: Added --depth <n> flag to limit how deep a scan navigates.
  • Sorting: Added --sort flag to scan for ordering by size, count, path, or wasted space.

v2.1.0

  • Stability improvements and small optimizations for the core scanner.

v2.0.0

  • tui command: Interactive terminal UI — browse duplicate groups, mark files, delete with keyboard shortcuts.
  • watch command: Monitor a folder in real time and alert when a duplicate appears.
  • config command: Create and manage ~/.dupegun.toml to save your preferred defaults.
  • Plugin system: Register custom keep-strategies with @register_strategy("name").

v1.3.0

  • stats command: Show total files, size, duplicate groups, and wasted space percentage.
  • --html flag: Generate a self-contained HTML report.
  • --log flag: Append a TSV audit log of every deleted file.

v1.2.0

  • compare command: Compare two directories to find cross-duplicates.
  • --older-than flag: Auto-delete files based on age.
  • Size parsing supports human-readable formats (e.g., 1MB, 2GB).

v1.1.0

  • --type filter: Scan only specific file extensions.
  • --exclude filter: Skip folders by name.

v1.0.0

  • Initial release: scan, delete, move, hardlink commands.

Contributing

Pull requests are welcome! To get started:

git clone https://github.com/Prasanna-Balakrishnan/dupegun.git
cd dupegun
pip install -e ".[all]"
pip install pytest
python -m pytest tests/ -v

License

MIT — see LICENSE for details.


Author

Made by Prasanna B

GitHub: https://github.com/Prasanna-Balakrishnan/dupegun

If this tool helped you, consider giving it a star on GitHub!
