Compare two drives and find duplicate files. Zero dependencies, cross-platform, with undo.
diskcomp
Find and safely delete duplicate files — across two drives or within one. Zero dependencies, cross-platform, with undo.
Quick Install
Download binary (no Python required):
macOS:
# Homebrew
brew tap w1lkns/diskcomp
brew install diskcomp
# Or download directly
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-macos
chmod +x diskcomp
./diskcomp --help
Linux:
# Download directly
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-linux
chmod +x diskcomp
./diskcomp --help
Windows:
# Download diskcomp-windows.exe from GitHub Releases
# https://github.com/w1lkns/diskcomp/releases/latest
diskcomp-windows.exe --help
Python install (if you have Python):
pipx (recommended — handles PATH automatically):
pipx install diskcomp
diskcomp --help
Don't have pipx?
Run brew install pipx on macOS, or pip install pipx elsewhere.
pip install:
pip install diskcomp
diskcomp --help
Single-file version (no install, no dependencies):
curl -O https://raw.githubusercontent.com/w1lkns/diskcomp/main/diskcomp.py
python3 diskcomp.py --help
Quick Start
Interactive mode (no arguments — clears screen, shows menu):
diskcomp
The launch menu offers:
1) Compare two drives
2) Clean up a single drive
3) Load previous report
4) Help
5) Quit
Compare two drives (command-line):
diskcomp --keep /Volumes/backup --other /Volumes/external
Clean up a single drive (find internal duplicates):
diskcomp --single /Volumes/my-drive
Dry-run (count files without hashing):
diskcomp --keep /path/A --other /path/B --dry-run
Load a previous report (skip re-scanning):
diskcomp --delete-from ./diskcomp-report-20260322-235800.csv
Usage & Flags
| Flag | Description | Example |
|---|---|---|
| --keep PATH | Path to the "keep" drive (files to retain). Required unless interactive. | --keep /Volumes/backup |
| --other PATH | Path to the "other" drive (duplicates deleted from here). Required unless interactive. | --other /Volumes/external |
| --single PATH | Scan one drive for internal duplicates (redundant copies on the same drive). | --single /Volumes/photos |
| --dry-run | Walk and count files without hashing (quick preview). | --dry-run |
| --limit N | Hash only first N files per drive (testing only). | --limit 100 |
| --output PATH | Custom report path (default: ~/diskcomp-report-YYYYMMDD-HHMMSS.csv). | --output ./my-report.csv |
| --format csv\|json | Report format: csv or json (default: csv). | --format json |
| --min-size SIZE | Minimum file size to include (default: 1KB). Accepts bytes, KB, MB, GB. | --min-size 10MB |
| --delete-from PATH | Load an existing report and start deletion workflow (skip re-scanning). | --delete-from ./diskcomp-report-20260322.csv |
| --undo PATH | View the audit log of a previous deletion session. | --undo ./diskcomp-undo-20260322.json |
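The --min-size flag accepts values such as 512, 1KB, or 10MB. The sketch below shows one way such strings could be turned into a byte count. The function name parse_size is hypothetical, and the 1024-based multipliers are an assumption; diskcomp's own parser may use different names or decimal units.

```python
import re

# Assumption: binary units (1 KB = 1024 bytes). The real tool may differ.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as '10MB' or '512' into a number of bytes."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)?\s*", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    # A bare number is treated as a byte count.
    return int(float(number) * _UNITS[(unit or "B").upper()])
```

With these assumptions, parse_size("10MB") yields 10 × 1024 × 1024 bytes, and malformed input such as "ten" raises ValueError rather than silently falling back to a default.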
How It Works
- Drive Health Checks (pre-scan, two-drive mode):
  - Space summary for both drives
  - Filesystem detection (HFS+, NTFS, ext4, exFAT, etc.)
  - Read-only detection (warns if "keep" drive is read-only)
  - Read speed benchmark (128MB)
  - Optional SMART data (if smartmontools available)
- Scanning & Hashing:
  - Walks drives recursively
  - Skips OS noise (.DS_Store, Thumbs.db, System Volume Information, etc.)
  - Two-pass optimization: size-filter candidates first, then SHA256 hash
  - Live progress bar with speed and ETA
- Reporting:
  - CSV or JSON report saved to ~/diskcomp-report-YYYYMMDD-HHMMSS.{csv,json}
  - Atomic writes (temp → rename, safe against crashes mid-write)
- Deletion Workflow (optional):
  - Mode A (Interactive): Shows both copies numbered (1) and (2); you pick which to delete, skip, or abort. Running space freed shown after each deletion.
  - Mode B (Batch): Dry-run preview with file type breakdown → type DELETE to confirm → progress bar
  - Undo log written before each deletion (audit-first pattern)
  - Always abortable with Ctrl+C
  - Can re-run from a saved report without re-scanning (option 3 in menu or --delete-from)
- Undo Log (--undo flag):
  - JSON file listing all deleted files with paths, sizes, hashes, and timestamps
  - Deletion is permanent: the log is an audit trail, not a restore mechanism
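The two-pass optimization in the scanning step can be illustrated with a short sketch. It groups files by size first, then hashes only the files whose size matches at least one other file. This is not diskcomp's actual code: the names NOISE, sha256_of, and find_duplicates are hypothetical, and the skip list is only a subset of what the tool ignores.

```python
import hashlib
import os
from collections import defaultdict

# Illustrative subset of OS noise files; the real skip list is longer.
NOISE = {".DS_Store", "Thumbs.db"}


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=1024):
    """Return groups of paths under root that share size and SHA256."""
    # Pass 1: group by size. A file with a unique size has no duplicate.
    by_size = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name in NOISE:
                continue
            path = os.path.join(folder, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file
            if size >= min_size:
                by_size[size].append(path)

    # Pass 2: hash only files whose size collides with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of(path)].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The size pass only needs file metadata, so it is cheap compared with reading every byte. On drives where most files have distinct sizes, it avoids most of the hashing work.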
Safety Model
The user is always in control. diskcomp prioritizes safety over convenience:
- No automatic deletion — every destructive action requires explicit confirmation
- Undo log first — log written before any file is deleted
- Read-only detection — warns if a drive appears read-only and skips it for deletion
- Dry-run mode — preview all operations without side effects
- Abortable: press Ctrl+C at any prompt to stop safely
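The "undo log first" rule can be sketched as follows: each record is written and flushed to disk before the corresponding file is removed, so a crash never leaves a deletion unrecorded. The function name delete_with_audit and the JSON-lines layout are assumptions made for illustration; the tool's own log is described only as a JSON file with paths, sizes, hashes, and timestamps.

```python
import json
import os
import time


def delete_with_audit(paths, log_path):
    """Delete files, recording each one in a log before it is removed."""
    deleted = []
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            entry = {
                "path": path,
                "size": os.path.getsize(path),
                "deleted_at": time.time(),
            }
            # Write and sync the record first (audit-first), then delete.
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())
            os.remove(path)
            deleted.append(path)
    return deleted
```

If the process stops between the log write and the removal, the log may list a file that still exists. That failure mode is the safe one: the log never understates what was deleted.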
Reports
CSV format (default, spreadsheet-friendly):
status,original_file,duplicate_file,size_mb,verification_hash
DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...
UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...
UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...
| Column | Values |
|---|---|
| status | DELETE_FROM_OTHER, UNIQUE_IN_KEEP, UNIQUE_IN_OTHER |
| original_file | Path to the copy to keep |
| duplicate_file | Path to the copy to delete |
| size_mb | File size in MB |
| verification_hash | SHA256 hex string |
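Because the CSV report uses a fixed header, it can be post-processed with the standard library. The sketch below sums the size_mb column for rows marked DELETE_FROM_OTHER to estimate how much space a deletion pass would free. The function name reclaimable_mb is hypothetical; the column names are the ones documented above.

```python
import csv


def reclaimable_mb(lines):
    """Sum size_mb over rows whose status is DELETE_FROM_OTHER.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    return sum(
        float(row["size_mb"])
        for row in reader
        if row["status"] == "DELETE_FROM_OTHER"
    )
```

To use it on a saved report, open the file with newline="" as the csv module recommends, for example `with open(report_path, newline="") as fh: total = reclaimable_mb(fh)`, where report_path is a report produced by an earlier run.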
JSON format (programmatic use):
diskcomp --keep /Volumes/keep --other /Volumes/other --format json
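Reports are described as being written atomically (temp file, then rename). A minimal sketch of that pattern for a JSON report is shown below. The function name write_json_atomically and the report contents are illustrative assumptions, not diskcomp's actual schema or code.

```python
import json
import os
import tempfile


def write_json_atomically(data, destination):
    """Write data as JSON via a temp file, then rename it into place."""
    folder = os.path.dirname(os.path.abspath(destination))
    # The temp file lives in the target folder so the rename stays on
    # one filesystem, which is what makes os.replace atomic.
    fd, temp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_path, destination)
    except BaseException:
        # Clean up the partial temp file; the old report is untouched.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

A reader of the destination path sees either the previous complete report or the new complete report, never a half-written file.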
Known Limitations
NTFS Drives on macOS and Linux
NTFS (Windows filesystem) drives are read-only on macOS and Linux by default:
- diskcomp can scan and identify duplicates on NTFS drives
- diskcomp cannot delete files from NTFS drives without a third-party driver
Workaround:
- macOS: ntfs-3g with macFUSE or Tuxera NTFS
- Linux: sudo apt install ntfs-3g (Debian/Ubuntu) or sudo dnf install ntfs-3g (Fedora)
diskcomp detects this and warns during health checks.
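One simple way to probe whether a mounted drive accepts writes is to try creating a temporary file on it. The sketch below takes that approach. It is an assumption about how such a check could work, not a description of diskcomp's health check, and the name is_writable is hypothetical.

```python
import tempfile


def is_writable(mount_point):
    """Return True if a temporary file can be created under mount_point."""
    try:
        # The file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=mount_point):
            return True
    except OSError:
        # Covers read-only filesystems, missing paths, and permission errors.
        return False
```

A probe like this reports what the current user can do at that moment, so a False result may also mean a permissions problem rather than a read-only NTFS mount.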
Optional Enhancements
Rich library — professional progress bars and color styling:
pip install "diskcomp[rich]"
smartmontools — enables SMART data display:
- macOS: brew install smartmontools
- Linux: apt-get install smartmontools or pacman -S smartmontools
- Windows: wmic logicaldisk (built-in, no install needed)
Without these, diskcomp uses ANSI progress bars and skips SMART data.
Cross-Platform Testing
CI validates diskcomp on 9 combinations:
- macOS (latest) × Python 3.8, 3.10, 3.12
- Linux (Ubuntu latest) × Python 3.8, 3.10, 3.12
- Windows (latest) × Python 3.8, 3.10, 3.12
All tests pass and the single-file build is verified on each combination.
Development
Run tests locally:
python -m pytest tests/
Generate single-file version:
python build_single.py
python diskcomp.py --help
License
MIT — See LICENSE file for details.
Download files
File details
Details for the file diskcomp-1.0.0.tar.gz.
File metadata
- Download URL: diskcomp-1.0.0.tar.gz
- Upload date:
- Size: 606.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7ad72982a671bd4058f7dbd378ed1d73f99420ed00964c78d356fbaed27598c |
| MD5 | 6ee7cd4dfb3e1202fe5980e115716878 |
| BLAKE2b-256 | 2000bc842473327d558f62583147487bab178088853940187cf25cc29e0ecc9c |
File details
Details for the file diskcomp-1.0.0-py3-none-any.whl.
File metadata
- Download URL: diskcomp-1.0.0-py3-none-any.whl
- Upload date:
- Size: 52.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f2cf48f2e32458ff790dec4e5d057669aced7fa5a9697002faa2edbfc1a0974 |
| MD5 | 1c334c192bcf0aef61b0671dc9498e4d |
| BLAKE2b-256 | 6fe7418e09c922d900cf82b4ebe7ab5cc54fa9cdf8a825ac3de0a6050592710f |