Find duplicate files by content (SHA-256) — zero install, cross-platform. pip install duphunt — no brew/apt. Zero dependencies.
duphunt
Find duplicate files by content — anywhere, with nothing to install. The
great duplicate finders (fdupes, jdupes, rdfind, fclones) are native
binaries you have to brew/apt/cargo install first — which you can't always
do on a locked-down box, a colleague's laptop, a CI runner, or a container.
duphunt runs the moment you have Python or Node: `pip install duphunt` or
`npx duphunt .` (the trailing dot is the directory to scan). Zero dependencies, no network.
pip install duphunt
$ duphunt ~/Downloads
2 duplicate group(s), 5 files, 8.1 MB reclaimable
4.1 MB × 2 4.1 MB reclaimable
/Users/me/Downloads/invoice.pdf
/Users/me/Downloads/invoice (1).pdf
2.0 MB × 3 4.0 MB reclaimable
/Users/me/Downloads/clip.mp4
/Users/me/Downloads/clip-copy.mp4
/Users/me/Downloads/old/clip.mp4
Groups are sorted biggest-waste-first, so the files worth deleting are at the top.
This is the Python build. A result-equivalent Node build is on npm:
[npx duphunt](https://github.com/jjdoor/duphunt).
How it works
- Group by size. Two files of different sizes can't be identical, so files with a unique size are never even read.
- Hash the collisions. Within each size group, each file is SHA-256 hashed (streamed in 64 KB chunks, so multi-GB files don't blow up memory).
- Report identical content. Files with the same hash are true byte-for-byte duplicates, grouped and ranked by reclaimable space.
It reports — it never deletes. You decide what to remove.
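The three steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not duphunt's actual source: the names `sha256_of` and `find_duplicates` are hypothetical, and the `wasted` figure assumes reclaimable space is the file size times the number of extra copies, which matches the sample output and JSON in this README.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # 64 KB reads, matching the chunk size described above


def sha256_of(path):
    """Stream a file through SHA-256 so large files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, min_size=1):
    """Return duplicate groups, largest reclaimable space first (hypothetical helper)."""
    # Step 1: group by size; a file with a unique size is never opened.
    by_size = defaultdict(list)
    for path in paths:
        size = os.path.getsize(path)
        if size >= min_size:
            by_size[size].append(path)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        # Step 2: hash only the files whose size collides with another file.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        # Step 3: files sharing a hash form one duplicate group.
        for digest, dupes in by_hash.items():
            if len(dupes) > 1:
                groups.append({
                    "hash": digest,
                    "size": size,
                    "count": len(dupes),
                    # Assumption: one copy is kept, the rest is reclaimable.
                    "wasted": size * (len(dupes) - 1),
                    "paths": sorted(dupes),
                })
    return sorted(groups, key=lambda g: g["wasted"], reverse=True)
```

Grouping by size first means the expensive step, reading and hashing file contents, only runs on files that could possibly match.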
Usage
duphunt # scan the current directory
duphunt ~/Downloads ~/Desktop # scan several roots at once
duphunt a.jpg b.jpg c.jpg # or just compare specific files
duphunt . --json # machine-readable
duphunt . --min-size 1048576 # ignore files under 1 MB
duphunt . --exit-code # exit 1 if any duplicates exist (CI gate)
Options
| Flag | Effect |
|---|---|
| `--json` | Emit `{ groups, summary }` as JSON (raw byte sizes, full paths) |
| `--quiet` | Print only the one-line summary |
| `--min-size <n>` | Ignore files smaller than `n` bytes (default 1, which skips empty files) |
| `--follow` | Follow symlinks (default: skip them, to avoid loops and double-counting) |
| `--exit-code` | Exit 1 when duplicates are found (for CI gates) |
| `-v`, `--version` | Print version |
| `-h`, `--help` | Show help |
Notes
- Empty files are skipped by default (they all hash alike and are rarely what you mean); pass `--min-size 0` to include them.
- Symlinks are skipped unless `--follow` is passed, so a symlinked tree won't be double-counted or loop forever.
- Each physical file is counted once. Repeated or overlapping roots and symlink aliases (even under `--follow`) are de-duplicated by real path, so they never inflate the results, while genuine hard links still surface.
- Same tool, two builds. The Python and Node builds hash with SHA-256 and produce identical results, so use whichever your environment already has.
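One way to get the "each physical file is counted once" behaviour is to key every candidate on its resolved real path. The sketch below is an assumption about how such a walk could look, not duphunt's implementation; `unique_files` and `_walk` are hypothetical names. Two hard links to the same data have different real paths, so both still surface, as the note above describes.

```python
import os


def _walk(root, follow, seen_dirs):
    """Yield file paths under root, pruning directories already visited."""
    for dirpath, dirs, names in os.walk(root, followlinks=follow):
        real_dir = os.path.realpath(dirpath)
        if real_dir in seen_dirs:
            dirs[:] = []  # already visited: prune to avoid symlink loops
            continue
        seen_dirs.add(real_dir)
        for name in names:
            yield os.path.join(dirpath, name)


def unique_files(roots, follow=False):
    """Yield each file once, identified by its real path (hypothetical helper)."""
    seen_files = set()
    seen_dirs = set()
    for root in roots:
        candidates = [root] if os.path.isfile(root) else _walk(root, follow, seen_dirs)
        for path in candidates:
            if os.path.islink(path) and not follow:
                continue  # symlinks are skipped unless following is requested
            if not os.path.isfile(path):
                continue
            real = os.path.realpath(path)
            if real in seen_files:
                continue  # same file reached through another root or alias
            seen_files.add(real)
            yield real
```

Passing the same root twice, or a root and one of its subdirectories, yields the same set of paths as passing it once.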
--json shape
{
"groups": [
{ "hash": "9f86d0…", "size": 4300000, "count": 2, "wasted": 4300000,
"paths": ["/a/invoice.pdf", "/b/invoice (1).pdf"] }
],
"summary": { "groups": 1, "files": 2, "wasted": 4300000 }
}
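A script can consume this shape with the standard `json` module. The sketch below is illustrative only: `summarize` is a hypothetical helper, and treating the first path in each group as the copy to keep is an arbitrary assumption, since duphunt itself does not say which copy to delete.

```python
import json


def summarize(report_text, min_wasted=0):
    """Return (keep, redundant_paths, wasted_bytes) per group (hypothetical helper)."""
    report = json.loads(report_text)
    rows = []
    for group in report["groups"]:
        if group["wasted"] < min_wasted:
            continue
        # Assumption: keep the first listed path, flag the rest as redundant.
        keep, *redundant = group["paths"]
        rows.append((keep, redundant, group["wasted"]))
    return rows
```

The input text would typically be the standard output of `duphunt . --json`, for example captured with `subprocess.run(["duphunt", ".", "--json"], capture_output=True, text=True).stdout`.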
Exit codes
| Code | Meaning |
|---|---|
| 0 | success (default, even when duplicates are found) |
| 1 | duplicates found and `--exit-code` was passed |
| 2 | error (bad option, missing path) |
By default duphunt is a viewer and exits 0; add --exit-code to gate a
pipeline on it.
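As a rough illustration of gating a pipeline, a CI step written in Python could run duphunt and branch on the status. `gate` and `MEANINGS` are hypothetical names, and combining `--exit-code` with `--quiet` is an assumption that the two flags work together; the `runner` parameter exists only so the function can be exercised without duphunt installed.

```python
import subprocess

# Exit statuses as documented in the table above, when --exit-code is passed.
MEANINGS = {
    0: "no duplicates found",
    1: "duplicates found",
    2: "error (bad option, missing path)",
}


def gate(roots, runner=subprocess.run):
    """Run duphunt with --exit-code and describe the result (hypothetical helper)."""
    cmd = ["duphunt", *roots, "--exit-code", "--quiet"]
    result = runner(cmd)
    code = result.returncode
    return code, MEANINGS.get(code, "unexpected exit status")
```

A caller could then fail the build with `sys.exit(code)` whenever `code` is non-zero.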
License
MIT