Cross-platform duplicate file finder and cleaner

dupegun

A fast, cross-platform command-line tool to find and eliminate duplicate files.


What is dupegun?

dupegun scans your folders, detects duplicate files using a fast 3-pass hashing engine, and lets you delete, move, or hard-link them — all from your terminal. It works on every file type and every major operating system.

No GUI. No bloat. Just fast, safe, and simple.


Install

pip install dupegun

Requires Python 3.9 or higher.


Quick Start

# Scan a folder and see all duplicates
dupegun scan ~/Downloads

# Preview what would be deleted (nothing actually deleted)
dupegun delete ~/Downloads --strategy newest

# Actually delete duplicates
dupegun delete ~/Downloads --strategy newest --no-dry-run

Commands

scan — find duplicates

dupegun scan <path> [options]

# Scan a single folder
dupegun scan ~/Downloads

# Scan multiple folders at once
dupegun scan ~/Downloads ~/Documents ~/Desktop

# Skip files smaller than 1 MB
dupegun scan ~/Downloads --min-size 1000000

# Scan specific size ranges (e.g., between 1MB and 100MB)
dupegun scan ~/Downloads --min-size 1MB --max-size 100MB

# Regex filename filter (e.g., files starting with "Copy of")
dupegun scan ~/Downloads --pattern "Copy of.*"

# Only scan image files
dupegun scan ~/Downloads --type .jpg --type .png --type .gif

# Only scan video files
dupegun scan ~/Downloads --type .mp4 --type .mkv --type .avi

# Skip specific folders
dupegun scan C:\ --exclude Windows --exclude "Program Files"

# Just show total wasted space — no file list
dupegun scan ~/Downloads --summary

# Show only the duplicate count and wasted space
dupegun scan ~/Downloads --count

# Export results to JSON
dupegun scan ~/Downloads --json results.json

# Export results to CSV
dupegun scan ~/Downloads --csv results.csv

# Combine filters — only duplicate JPEGs, skip cache folders
dupegun scan ~/Photos --type .jpg --exclude .thumbnails --summary
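
The filters above (--type, --exclude, --pattern, --min-size) can be pictured as a single directory walk. The following is a minimal sketch, not dupegun's actual code: the function name iter_candidates is hypothetical, and the case-insensitive extension match and the start-anchored regex match are assumptions made for illustration.

```python
import os
import re


def iter_candidates(root, types=(), exclude=(), pattern=None, min_size=1):
    """Yield file paths under root that pass all filters (illustrative only)."""
    regex = re.compile(pattern) if pattern else None
    wanted = {t.lower() for t in types}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # excluded folders at all, which is cheaper than filtering later.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            if wanted and os.path.splitext(name)[1].lower() not in wanted:
                continue  # extension not requested
            if regex and not regex.match(name):
                continue  # filename does not match the pattern
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) >= min_size:
                yield path
```

For example, iter_candidates("Photos", types=[".jpg"], exclude=[".thumbnails"]) would mirror the last command above, under the stated assumptions.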

compare — cross-directory duplicates

Find files that exist in both folders. Great for checking backups against your active drive.

dupegun compare <path_a> <path_b> [options]

# Find files duplicated between your Downloads and your Backup drive
dupegun compare ~/Downloads ~/Backup
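
A cross-directory comparison can be sketched as: index one tree by file size, then hash only those files in the other tree whose size appears in the index. This is an illustrative sketch under that assumption, not necessarily how dupegun implements compare; the names cross_duplicates, _files, and _sha256 are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def _files(root):
    """Yield every file path below root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.join(dirpath, name)


def _sha256(path):
    """Full SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cross_duplicates(path_a, path_b):
    """Return (file_in_a, file_in_b) pairs with identical contents."""
    by_size = defaultdict(list)
    for path in _files(path_a):
        by_size[os.path.getsize(path)].append(path)

    hashes_a = {}  # cache so each file in A is hashed at most once
    pairs = []
    for b in _files(path_b):
        candidates = by_size.get(os.path.getsize(b))
        if not candidates:
            continue  # no file in A has this size, so no hashing is needed
        digest_b = _sha256(b)
        for a in candidates:
            if a not in hashes_a:
                hashes_a[a] = _sha256(a)
            if hashes_a[a] == digest_b:
                pairs.append((a, b))
    return pairs
```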

delete — remove duplicates

dupegun delete <path> [options]

# Preview (dry-run is ON by default, so nothing is deleted)
dupegun delete ~/Downloads --strategy newest

# Actually delete
dupegun delete ~/Downloads --strategy newest --no-dry-run

# Confirm each group one by one before deleting
dupegun delete ~/Downloads --no-dry-run --interactive

# Auto-delete by age (only delete copies older than 30 days)
dupegun delete ~/Downloads --older-than 30 --no-dry-run

# Delete only duplicate images, skip cache
dupegun delete ~/Photos --type .jpg --type .png --exclude .thumbnails --no-dry-run

move — quarantine duplicates

dupegun move <path> --dest <quarantine-folder> [options]

# Preview
dupegun move ~/Downloads --dest ~/quarantine

# Actually move
dupegun move ~/Downloads --dest ~/quarantine --no-dry-run

The copy selected by --strategy stays in place; all other duplicates are moved to the destination folder so you can review them manually.


hardlink — save space, keep all paths

dupegun hardlink <path> [options]

# Preview
dupegun hardlink ~/Photos

# Actually hardlink
dupegun hardlink ~/Photos --no-dry-run

Replaces duplicate files with hard links. Every file path remains on your system, but all paths in a group share the same physical disk space, so no data is lost. Hard links can only be created between files on the same volume.
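
As a rough illustration of the idea, one safe way to swap a duplicate for a hard link is to create the link under a temporary name and then rename it over the duplicate, so the path never disappears. This is a sketch, not dupegun's confirmed implementation; the function name and the ".dupegun-tmp" suffix are hypothetical.

```python
import os


def replace_with_hardlink(keep, duplicate):
    """Make `duplicate` a hard link to `keep` (both must be on one volume)."""
    temp = duplicate + ".dupegun-tmp"  # hypothetical temporary name
    os.link(keep, temp)  # second directory entry for keep's data
    try:
        # os.replace overwrites the duplicate in a single rename step.
        os.replace(temp, duplicate)
    except OSError:
        os.remove(temp)  # clean up the temporary link, then re-raise
        raise
```

After the call, os.path.samefile(keep, duplicate) returns True and the duplicate's separate copy of the data is released.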


Options

Option              | Commands               | Description                                               | Default
--strategy shortest | delete, move, hardlink | Keep the file with the shortest path                      | default
--strategy newest   | delete, move, hardlink | Keep the most recently modified copy                      |
--strategy oldest   | delete, move, hardlink | Keep the oldest copy                                      |
--dry-run           | delete, move, hardlink | Preview without making any changes                        | ON
--no-dry-run        | delete, move, hardlink | Actually perform the action                               |
--interactive       | delete                 | Confirm each duplicate group before acting                | OFF
--older-than <days> | delete                 | Only delete copies modified more than this many days ago  |
--min-size <size>   | all                    | Skip files smaller than this size (e.g. 1MB)              | 1 byte
--max-size <size>   | all                    | Skip files larger than this size (e.g. 100MB)             | no limit
--pattern <regex>   | all                    | Only scan filenames matching this regex                   | none
--type <ext>        | all                    | Only include files with this extension (repeatable)       | all types
--exclude <name>    | all                    | Skip any folder with this name (repeatable)               | none
--summary           | scan                   | Print total wasted space only, no file list               | OFF
--count             | scan                   | Print group count and wasted space, then exit             | OFF
--json <file>       | scan                   | Export scan results to JSON                               |
--csv <file>        | scan                   | Export scan results to CSV                                |
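
The three --strategy values can be read as three ways of picking the one file to keep in each duplicate group. A minimal sketch follows, assuming "shortest" means the fewest characters in the path and that ties go to the first match; the function name choose_keeper is hypothetical and the tie-breaking is not documented here.

```python
import os


def choose_keeper(paths, strategy="shortest"):
    """Pick the one path to keep from a group of identical files."""
    if strategy == "shortest":
        return min(paths, key=len)  # fewest characters in the path
    if strategy == "newest":
        return max(paths, key=os.path.getmtime)  # latest modification time
    if strategy == "oldest":
        return min(paths, key=os.path.getmtime)  # earliest modification time
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Every other path in the group is then the target of the delete, move, or hardlink action.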

How it works

dupegun uses a 3-pass engine to detect duplicates accurately and efficiently:

Pass 1: Group files by exact byte size
        (files with unique sizes are skipped immediately)

Pass 2: Hash the first 4 KB of each size-match
        (quick pre-filter before full hashing)

Pass 3: Full SHA-256 hash of remaining candidates
        (matching hashes mean identical contents; an accidental SHA-256 collision is practically impossible)

This approach is much faster than hashing every file in full: most files are ruled out by size or by the 4 KB prefix hash, so only likely duplicates are read from start to finish.
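
The three passes can be sketched in a few lines of Python. This is an illustration of the technique as described above, not dupegun's source; the names find_duplicates, _hash_file, and _regroup are hypothetical, and using SHA-256 for the 4 KB prefix hash is an assumption (the README only names SHA-256 for pass 3).

```python
import hashlib
import os
from collections import defaultdict

PREFIX_BYTES = 4096  # the "first 4 KB" used in pass 2


def _hash_file(path, limit=None, chunk_size=1 << 20):
    """SHA-256 of a file; if limit is set, hash only the first `limit` bytes."""
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as fh:
        while True:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            if size == 0:
                break  # prefix limit reached
            chunk = fh.read(size)
            if not chunk:
                break  # end of file
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def _regroup(groups, key_func):
    """Split each group by key_func and drop groups left with one member."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key_func(path)].append(path)
        result.extend(g for g in buckets.values() if len(g) > 1)
    return result


def find_duplicates(paths):
    """Return lists of paths whose contents are identical."""
    groups = _regroup([list(paths)], os.path.getsize)  # pass 1: size
    groups = _regroup(groups, lambda p: _hash_file(p, PREFIX_BYTES))  # pass 2
    return _regroup(groups, _hash_file)  # pass 3: full hash
```

Each pass only looks at files that survived the previous one, which is where the speed-up comes from.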


Supported file types

dupegun works on all file types — it compares raw file contents, not names or extensions.

Category        | Examples
Documents       | .pdf .docx .xlsx .pptx .txt
Images          | .jpg .png .gif .bmp .webp
Videos          | .mp4 .mkv .avi .mov
Audio           | .mp3 .wav .flac .aac
Archives        | .zip .rar .7z .tar .gz
Code            | .py .js .html .css .java
Everything else | Any file, any extension

Two files with different names but identical contents will always be detected.


Safety

  • Dry-run is ON by default on every destructive command (delete, move, hardlink). You always see a preview first.
  • Use --no-dry-run only when you are sure.
  • Use --interactive to confirm each group one by one.
  • Use move instead of delete if you want a safety net.
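
For readers curious how a dry-run-by-default flag can be wired up, here is a small sketch using the standard library. Whether dupegun uses argparse is not stated in this README, so treat the parser, the program name, and the delete_files helper as hypothetical.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="dedupe-sketch")  # hypothetical name
    parser.add_argument("paths", nargs="+")
    # BooleanOptionalAction (Python 3.9+) registers both --dry-run and
    # --no-dry-run; default=True keeps the safe behaviour unless overridden.
    parser.add_argument(
        "--dry-run", action=argparse.BooleanOptionalAction, default=True
    )
    return parser


def delete_files(paths, dry_run=True):
    """Delete the given paths, or only report them when dry_run is set."""
    for path in paths:
        if dry_run:
            print(f"[dry-run] would delete {path}")
        else:
            os.remove(path)
```

With this design the destructive branch runs only when the caller explicitly passes --no-dry-run.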

Platform support

Platform | Supported
Windows  | Yes
Linux    | Yes
macOS    | Yes

Examples

# Find duplicate images in Downloads
dupegun scan ~/Downloads --type .jpg --type .png --type .gif

# Find duplicate videos, skip system folders
dupegun scan C:\ --type .mp4 --type .mkv --exclude Windows --exclude "Program Files"

# Quick summary — how much space am I wasting?
dupegun scan ~/Downloads --summary

# Just the count
dupegun scan ~/Downloads --count
# 47 duplicate group(s) found, 2.3 GB wasted

# Find duplicates in Downloads, skip files under 500 KB
dupegun scan C:\Users\You\Downloads --min-size 500000

# Delete duplicate images keeping the newest copy
dupegun delete ~/Photos --type .jpg --type .png --strategy newest --no-dry-run

# Move duplicates from two folders into one quarantine folder
dupegun move C:\Photos C:\Backup --dest C:\quarantine --no-dry-run

# Export full report to CSV (the file can then be opened in Excel)
dupegun scan C:\Users\You\Documents --csv report.csv

# Compare active projects to a backup drive to find cross-duplicates
dupegun compare ~/Projects /Volumes/BackupDrive/Projects

# Find duplicates matching a specific filename pattern and size range
dupegun scan ~/Downloads --pattern "Copy of.*" --min-size 1MB --max-size 50MB

Changelog

v1.2.0

  • compare command: Compare two directories to find cross-duplicates.
  • --older-than flag: Auto-delete files based on age (safeguards recent files).
  • --max-size flag: Combine with --min-size to scan specific size ranges.
  • --pattern flag: Filter scans by regex filename patterns.
  • Size parsing now supports human-readable formats (e.g., 1MB, 2GB).

v1.1.0

  • --type filter: scan only specific file extensions (e.g. --type .jpg --type .png)
  • --exclude filter: skip folders by name (e.g. --exclude node_modules)
  • --summary flag: show total wasted space without printing every file
  • --count flag: print duplicate group count and wasted space in one line
  • --type and --exclude are available on all commands (scan, delete, move, hardlink)

v1.0.1

  • Minor packaging fixes

v1.0.0

  • Initial release: scan, delete, move, hardlink commands
  • 3-pass hashing engine
  • --strategy, --dry-run, --interactive, --min-size, --json, --csv

Contributing

Pull requests are welcome! To get started:

git clone https://github.com/YOUR_USERNAME/dupegun.git
cd dupegun
pip install -e .
pip install pytest
pytest tests/

License

MIT — see LICENSE for details.


Author

Made by Prasanna B

If this tool helped you, consider giving it a star on GitHub!

