Compare two drives and find duplicate files. Zero dependencies, cross-platform, with undo.

diskcomp

Find and safely delete duplicate files, across two drives or within one. Zero dependencies, cross-platform, with undo.

📋 View the Roadmap: see what's planned for future versions

✨ Key Features

  • 🔍 Smart Detection: SHA256 hashing finds true duplicates regardless of filename
  • ⚡ Performance: Two-pass scan: filter by size first, hash only size-collision candidates
  • 🛡️ Safety First: Always ask before deleting, create undo logs, detect read-only files
  • 🖥️ Cross-Platform: macOS, Linux, Windows with native progress bars (Rich UI + ANSI fallback)
  • 📊 Rich Reports: CSV/JSON output with file paths, sizes, hashes, and deletion recommendations
  • 🎯 Flexible Modes: Compare two drives, clean single drive, interactive deletion, batch operations
  • ⚙️ Zero Dependencies: Pure Python, optional Rich UI, works everywhere Python runs
  • 📦 Multiple Install Options: pip, pipx, standalone binaries (Homebrew coming in v1.1)

📊 Project Status

diskcomp 1.0.0 is production-ready and actively maintained. The core deduplication engine has been tested with 285 comprehensive tests covering edge cases, cross-platform compatibility, and error handling.

  • ✅ Feature Complete: All planned v1.0 features implemented
  • ✅ Well Tested: 285 tests, CI on 3 platforms × 3 Python versions
  • ✅ Production Ready: Used for real data cleanup with safety guarantees
  • ✅ Cross-Platform: Native builds for macOS, Linux, Windows
  • ✅ Multiple Distribution Channels: PyPI, GitHub Releases (Homebrew coming in v1.1)

Quick Install

Download binary (no Python required):

macOS:

# Direct download (recommended)
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-macos
chmod +x diskcomp
./diskcomp --help

# Homebrew (coming in v1.1)
# brew tap w1lkns/diskcomp
# brew install diskcomp

Linux:

# Download directly  
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-linux
chmod +x diskcomp
./diskcomp --help

Windows:

# Download diskcomp-windows.exe from GitHub Releases
# https://github.com/w1lkns/diskcomp/releases/latest
diskcomp-windows.exe --help

Python install (if you have Python):

pipx (recommended; handles PATH automatically):

pipx install diskcomp
diskcomp --help

Don't have pipx? brew install pipx on macOS, pip install pipx elsewhere.

pip install:

pip install diskcomp
diskcomp --help

Single-file version (no install, no dependencies):

curl -O https://raw.githubusercontent.com/w1lkns/diskcomp/main/diskcomp.py
python3 diskcomp.py --help

Quick Start

Interactive mode (no arguments; clears the screen and shows a menu):

diskcomp

The launch menu offers:

  1) Compare two drives
  2) Clean up a single drive
  3) Load previous report
  4) Help
  5) Quit

Compare two drives (command-line):

diskcomp --keep /Volumes/backup --other /Volumes/external

Clean up a single drive (find internal duplicates):

diskcomp --single /Volumes/my-drive

Dry-run (count files without hashing):

diskcomp --keep /path/A --other /path/B --dry-run

Load a previous report (skip re-scanning):

diskcomp --delete-from ./diskcomp-report-20260322-235800.csv

📊 Example Output

Interactive mode startup:

 ██████╗ ██╗███████╗██╗  ██╗ ██████╗ ██████╗ ███╗   ███╗██████╗
 ██╔══██╗██║██╔════╝██║ ██╔╝██╔════╝██╔═══██╗████╗ ████║██╔══██╗
 ██║  ██║██║███████╗█████╔╝ ██║     ██║   ██║██╔████╔██║██████╔╝
 ██║  ██║██║╚════██║██╔═██╗ ██║     ██║   ██║██║╚██╔╝██║██╔═══╝
 ██████╔╝██║███████║██║  ██╗╚██████╗╚██████╔╝██║ ╚═╝ ██║██║
 ╚═════╝ ╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝ ╚═════╝ ╚═╝     ╚═╝╚═╝

 Find duplicates. Free space. Stay safe.
 v1.0.0

What would you like to do?
  1) Compare two drives
  2) Clean up a single drive
  3) Load previous report
  4) Help
  5) Quit

Progress display:

Drive Health: Keep=/Volumes/Photos (2TB APFS), Other=/Volumes/Backup (4TB NTFS)
Scanning: ████████████████████████████████ 1,847 files found
Hashing candidates: ██████████████████████████████████ 234/234 files (23.4 MB/s)

Found 42 duplicates. You could free 1.2 GB from /Volumes/Backup. Ready to review?

🛡️ Safety Guarantees

Your files are safe. diskcomp prioritizes safety over convenience:

  • 🔒 No Automatic Deletion: Every file deletion requires explicit user confirmation
  • 📝 Undo Logs: Complete audit trail written before any file is deleted
  • ⚠️ Read-Only Detection: Automatically detects and warns about read-only drives
  • 🔍 Dry-Run Mode: Preview operations without any file system changes
  • ⏹️ Abort Anytime: Press Ctrl+C at any prompt to stop safely
  • ✨ Interactive Mode: Review each file individually before deletion
  • 🔐 SHA256 Verification: Cryptographic hashing ensures only true duplicates are identified
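
The "audit trail written before any file is deleted" guarantee can be pictured as a log-then-delete step. The following is a minimal sketch under stated assumptions, not diskcomp's actual code: the helper names `sha256_of` and `delete_with_audit`, the record fields, and the one-JSON-object-per-line layout are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_with_audit(path, log_path):
    """Append an audit record for `path`, force it to disk, then delete the file.

    Hypothetical helper: the record fields and the JSON-lines layout are
    assumptions made for this sketch, not diskcomp's real log format.
    """
    record = {
        "path": os.path.abspath(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_of(path),
        "deleted_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())  # the record reaches the disk before the delete
    os.remove(path)  # only now is the file removed
    return record
```

Writing and syncing the record first means a crash between the two steps leaves a logged file that still exists, which is safer than a deleted file with no record.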

Usage & Flags

| Flag | Description | Example |
|------|-------------|---------|
| --keep PATH | Path to the "keep" drive (files to retain). Required unless interactive. | --keep /Volumes/backup |
| --other PATH | Path to the "other" drive (duplicates deleted from here). Required unless interactive. | --other /Volumes/external |
| --single PATH | Scan one drive for internal duplicates (redundant copies on the same drive). | --single /Volumes/photos |
| --dry-run | Walk and count files without hashing (quick preview). | --dry-run |
| --limit N | Hash only first N files per drive (testing only). | --limit 100 |
| --output PATH | Custom report path (default: ~/diskcomp-report-YYYYMMDD-HHMMSS.csv). | --output ./my-report.csv |
| --format csv\|json | Report format: csv or json (default: csv). | --format json |
| --min-size SIZE | Minimum file size to include (default: 1KB). Accepts bytes, KB, MB, GB. | --min-size 10MB |
| --delete-from PATH | Load an existing report and start deletion workflow (skip re-scanning). | --delete-from ./diskcomp-report-20260322.csv |
| --undo PATH | View the audit log of a previous deletion session. | --undo ./diskcomp-undo-20260322.json |
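
To make the --min-size values concrete, here is a rough sketch of how a size string such as 10MB could be turned into a byte count. The function `parse_size` is hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption of this sketch; the source does not say which convention diskcomp follows.

```python
import re

# Assumption for this sketch: binary multiples (1 KB = 1024 bytes).
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert strings such as '512', '1KB' or '10MB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(float(number) * _UNITS[unit])
```

Under that assumption, `parse_size("10MB")` gives 10485760 and a bare number such as `"512"` is read as bytes.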

How It Works

  1. Drive Health Checks (pre-scan, two-drive mode):

    • Space summary for both drives
    • Filesystem detection (HFS+, NTFS, ext4, exFAT, etc.)
    • Read-only detection (warns if "keep" drive is read-only)
    • Read speed benchmark (128MB)
    • Optional SMART data (if smartmontools available)
  2. Scanning & Hashing:

    • Walks drives recursively
    • Skips OS noise (.DS_Store, Thumbs.db, System Volume Information, etc.)
    • Two-pass optimization: size-filter candidates first, then SHA256 hash
    • Live progress bar with speed and ETA
  3. Reporting:

    • CSV or JSON report saved to ~/diskcomp-report-YYYYMMDD-HHMMSS.{csv,json}
    • Atomic writes (temp → rename, safe against crashes mid-write)
  4. Deletion Workflow (optional):

    • Mode A (Interactive): Shows both copies numbered (1) and (2); you pick which to delete, skip, or abort. Running space freed shown after each deletion.
    • Mode B (Batch): Dry-run preview with file type breakdown → type DELETE to confirm → progress bar
    • Undo log written before each deletion (audit-first pattern)
    • Always abortable with Ctrl+C
    • Can re-run from a saved report without re-scanning (option 3 in menu or --delete-from)
  5. Undo Log (--undo flag):

    • JSON file listing all deleted files with paths, sizes, hashes, and timestamps
    • Deletion is permanent. The log is an audit trail, not a restore mechanism
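
The two-pass scan from step 2 can be sketched roughly as follows: group files by size, then SHA256-hash only the sizes that occur on both drives. This is an illustrative sketch rather than diskcomp's implementation; `walk_files`, `sha256_of`, `find_duplicates` and the short `NOISE` set are hypothetical names, and the noise list holds only the three names quoted above.

```python
import hashlib
import os
from collections import defaultdict

# Only the noise names quoted in this document; a real skip list would be longer.
NOISE = {".DS_Store", "Thumbs.db", "System Volume Information"}


def walk_files(root):
    """Yield (path, size) for files under root, skipping noise entries."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in NOISE]
        for name in filenames:
            if name in NOISE:
                continue
            path = os.path.join(dirpath, name)
            try:
                yield path, os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(keep_root, other_root):
    """Return (keep_path, other_path) pairs whose contents are identical."""
    # Pass 1: group the files on each side by size (no file contents are read).
    keep_by_size = defaultdict(list)
    for path, size in walk_files(keep_root):
        keep_by_size[size].append(path)
    other_by_size = defaultdict(list)
    for path, size in walk_files(other_root):
        other_by_size[size].append(path)

    # Pass 2: hash only the sizes that appear on both sides.
    pairs = []
    for size in keep_by_size.keys() & other_by_size.keys():
        keep_hashes = {}
        for path in keep_by_size[size]:
            keep_hashes.setdefault(sha256_of(path), path)
        for path in other_by_size[size]:
            match = keep_hashes.get(sha256_of(path))
            if match is not None:
                pairs.append((match, path))
    return pairs
```

Two files of different sizes cannot be identical, so the size pass rules out most files without reading them, and hashing is spent only on genuine candidates.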

Reports

CSV format (default, spreadsheet-friendly):

status,original_file,duplicate_file,size_mb,verification_hash
DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...
UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...
UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...

| Column | Values |
|--------|--------|
| status | DELETE_FROM_OTHER, UNIQUE_IN_KEEP, UNIQUE_IN_OTHER |
| original_file | Path to the copy to keep |
| duplicate_file | Path to the copy to delete |
| size_mb | File size in MB |
| verification_hash | SHA256 hex string |
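
Because the report is plain CSV, it can be post-processed with the standard library. The sketch below uses only the column names and sample rows shown above; the helper `summarize_report` is a hypothetical name introduced for illustration.

```python
import csv
import io


def summarize_report(rows):
    """Count DELETE_FROM_OTHER rows and total their size_mb column."""
    count = 0
    total_mb = 0.0
    for row in rows:
        if row["status"] == "DELETE_FROM_OTHER":
            count += 1
            total_mb += float(row["size_mb"])
    return count, total_mb


# Sample rows copied from the CSV excerpt above (hashes abbreviated as shown there).
sample = io.StringIO(
    "status,original_file,duplicate_file,size_mb,verification_hash\n"
    "DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...\n"
    "UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...\n"
    "UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...\n"
)
count, total_mb = summarize_report(csv.DictReader(sample))
print(f"{count} duplicate(s), {total_mb:.1f} MB reclaimable")
# prints: 1 duplicate(s), 2.5 MB reclaimable
```

For a saved report, the same function can be fed `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.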

JSON format (programmatic use):

diskcomp --keep /Volumes/keep --other /Volumes/other --format json
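
The "temp → rename" atomic write mentioned under How It Works is a general pattern that can be sketched as below. This is an assumption-laden illustration, not diskcomp's code: `write_json_atomically` is a hypothetical helper, and the JSON payload in the usage note is made up.

```python
import json
import os
import tempfile


def write_json_atomically(data, dest_path):
    """Write `data` as JSON to a temp file, then rename it over dest_path."""
    directory = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file next to the destination so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, dest_path)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a partial temp file behind
        raise
```

A call such as `write_json_atomically({"duplicates": 42}, "report.json")` leaves either the previous report or the complete new one on disk, never a half-written file.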

Known Limitations

NTFS Drives on macOS and Linux

NTFS (Windows filesystem) drives are read-only on macOS and Linux by default:

  • diskcomp can scan and identify duplicates on NTFS drives
  • diskcomp cannot delete files from NTFS drives without a third-party driver

Workaround: install a third-party NTFS driver that provides write access.

diskcomp detects read-only NTFS drives and warns during health checks.
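
One simple way to check whether a mounted drive accepts writes is to try creating and removing a small probe file. This is only a sketch of that idea: `is_writable` is a hypothetical helper, and the source does not say how diskcomp performs its own read-only detection.

```python
import os
import tempfile


def is_writable(directory):
    """Return True if a temporary file can be created and removed in directory."""
    try:
        fd, probe = tempfile.mkstemp(dir=directory, prefix=".write-probe-")
    except OSError:
        return False  # read-only mount, missing path, or no permission
    os.close(fd)
    os.remove(probe)
    return True
```

A probe like this briefly touches the drive, so a tool that must never write to a volume would need a different check.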

Optional Enhancements

Rich library: professional progress bars and color styling:

pip install diskcomp[rich]

smartmontools: enables SMART data display:

  • macOS: brew install smartmontools
  • Linux: apt-get install smartmontools or pacman -S smartmontools
  • Windows: wmic logicaldisk (built-in, no install needed)

Without these, diskcomp uses ANSI progress bars and skips SMART data.
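
The optional-dependency pattern described here (use Rich when present, fall back otherwise) might look roughly like this. It is a sketch, not diskcomp's code: `ansi_track` and `progress` are hypothetical names, and the fallback bar is a plain carriage-return redraw chosen for brevity.

```python
import sys

try:
    from rich.progress import track  # optional dependency
    HAVE_RICH = True
except ImportError:
    HAVE_RICH = False


def ansi_track(items, description="Working", width=30):
    """Plain-terminal fallback: redraw one progress line with a carriage return."""
    items = list(items)
    total = len(items)
    for index, item in enumerate(items, start=1):
        yield item
        filled = width * index // total
        bar = "#" * filled + "-" * (width - filled)
        sys.stderr.write(f"\r{description}: [{bar}] {index}/{total}")
        sys.stderr.flush()
    sys.stderr.write("\n")


def progress(items, description="Working"):
    """Use Rich's track() when it is installed, otherwise the fallback above."""
    items = list(items)  # both code paths need a known length
    if HAVE_RICH:
        return track(items, description=description)
    return ansi_track(items, description=description)
```

Callers iterate the same way in both cases, for example `for path in progress(paths, "Hashing"): ...`, so the rest of the program does not need to know whether Rich is installed.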

Cross-Platform Testing

CI validates diskcomp on 9 combinations:

  • macOS (latest) × Python 3.8, 3.10, 3.12
  • Linux (Ubuntu latest) × Python 3.8, 3.10, 3.12
  • Windows (latest) × Python 3.8, 3.10, 3.12

All tests pass and the single-file build is verified on each combination.

Development

Run tests locally:

python -m pytest tests/

Generate single-file version:

python build_single.py
python diskcomp.py --help

🤝 Support & Contributing

⭐ Like diskcomp? Star it on GitHub to show support!

License

MIT. See the LICENSE file for details.
