Find duplicate files by content (SHA-256) — zero install, cross-platform. pip install duphunt — no brew/apt. Zero dependencies.

duphunt

Find duplicate files by content, anywhere, with no native binary to install. The established duplicate finders (fdupes, jdupes, rdfind, fclones) are native binaries that you must first install with brew, apt, or cargo, which is not always possible on a locked-down box, a colleague's laptop, a CI runner, or a container. duphunt runs the moment you have Python or Node, via either pip install duphunt or npx duphunt . (where . is the directory to scan). Zero dependencies, no network.

pip install duphunt

$ duphunt ~/Downloads

2 duplicate group(s), 5 files, 8.1 MB reclaimable

  4.1 MB × 2   4.1 MB reclaimable
    /Users/me/Downloads/invoice.pdf
    /Users/me/Downloads/invoice (1).pdf

  2.0 MB × 3   4.0 MB reclaimable
    /Users/me/Downloads/clip.mp4
    /Users/me/Downloads/clip-copy.mp4
    /Users/me/Downloads/old/clip.mp4

Groups are sorted biggest-waste-first, so the files worth deleting are at the top.

This is the Python build. A result-equivalent Node build is on npm: npx duphunt (https://github.com/jjdoor/duphunt).

How it works

  1. Group by size. Two files of different sizes can't be identical, so files with a unique size are never even read.
  2. Hash the collisions. Within each size group, each file is SHA-256 hashed (streamed in 64 KB chunks, so multi-GB files don't blow up memory).
  3. Report identical content. Files with the same hash are true byte-for-byte duplicates, grouped and ranked by reclaimable space.

It reports — it never deletes. You decide what to remove.
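
The three steps can be sketched in a few lines of Python. This is an illustrative reconstruction from the description above, not duphunt's actual source: the names sha256_of and find_duplicates are hypothetical, and the "wasted" formula (size × (count − 1)) is inferred from the sample output and the --json example.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # 64 KB read size, as described in step 2


def sha256_of(path):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, min_size=1):
    """Return duplicate groups, largest reclaimable space first (hypothetical helper)."""
    # Step 1: group by size; a file with a unique size is never opened.
    by_size = defaultdict(list)
    for path in paths:
        size = os.path.getsize(path)
        if size >= min_size:
            by_size[size].append(path)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        # Step 2: hash only the files whose sizes collide.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        # Step 3: an identical hash means identical content.
        for digest, dupes in by_hash.items():
            if len(dupes) > 1:
                groups.append({
                    "hash": digest,
                    "size": size,
                    "count": len(dupes),
                    # Assumed formula: every copy but one is reclaimable.
                    "wasted": size * (len(dupes) - 1),
                    "paths": sorted(dupes),
                })
    groups.sort(key=lambda g: g["wasted"], reverse=True)
    return groups
```

Grouping by size first means most files cost one stat call and no reads; only files whose sizes collide are ever hashed.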

Usage

duphunt                      # scan the current directory
duphunt ~/Downloads ~/Desktop   # scan several roots at once
duphunt a.jpg b.jpg c.jpg    # or just compare specific files
duphunt . --json             # machine-readable
duphunt . --min-size 1048576 # ignore files under 1 MB
duphunt . --exit-code        # exit 1 if any duplicates exist (CI gate)

Options

Flag             Effect
--json           Emit { groups, summary } as JSON (raw byte sizes, full paths)
--quiet          Print only the one-line summary
--min-size <n>   Ignore files smaller than n bytes (default 1, which skips empty files)
--follow         Follow symlinks (default: skip them, to avoid loops and double-counting)
--exit-code      Exit 1 when duplicates are found (for CI gates)
-v, --version    Print version
-h, --help       Show help

Notes

  • Empty files are skipped by default (they all hash alike and are rarely what you mean); pass --min-size 0 to include them.
  • Symlinks are skipped unless --follow is passed, so a symlinked tree won't be double-counted or loop forever.
  • Each real path is counted once. Repeated or overlapping roots and symlink aliases (even under --follow) are de-duplicated by real path, so they never inflate the results. Hard links have distinct real paths, so they still surface as duplicates.
  • Same tool, two builds. The Python and Node builds hash with SHA-256 and produce identical results — use whichever your environment already has.
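
De-duplication by real path, as described in the notes above, might look like the sketch below. It is an assumption about the approach, not duphunt's code, and unique_by_realpath is a hypothetical name. os.path.realpath resolves symlinks and "." or ".." segments, so aliases collapse to one entry, while hard links keep separate entries because each is its own directory entry.

```python
import os


def unique_by_realpath(paths):
    """Keep one entry per real path, preserving input order (hypothetical helper)."""
    seen = set()
    unique = []
    for path in paths:
        real = os.path.realpath(path)  # resolves symlinks and "."/".." segments
        if real not in seen:
            seen.add(real)
            unique.append(real)
    return unique
```

Passing the same root twice, or a root together with one of its subdirectories, then yields each file once before any size grouping or hashing takes place.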

--json shape

{
  "groups": [
    { "hash": "9f86d0…", "size": 4300000, "count": 2, "wasted": 4300000,
      "paths": ["/a/invoice.pdf", "/b/invoice (1).pdf"] }
  ],
  "summary": { "groups": 1, "files": 2, "wasted": 4300000 }
}
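
A script can consume this output with the standard json module. The sketch below parses a sample that mirrors the shape above (the truncated hash is kept exactly as shown) and lists every path except the first in each group. Keeping the first path is an arbitrary rule chosen for illustration, and removal_candidates and summary_is_consistent are hypothetical helpers, not part of duphunt.

```python
import json

# Sample report mirroring the documented shape; the hash is truncated in the source.
SAMPLE = """
{
  "groups": [
    { "hash": "9f86d0…", "size": 4300000, "count": 2, "wasted": 4300000,
      "paths": ["/a/invoice.pdf", "/b/invoice (1).pdf"] }
  ],
  "summary": { "groups": 1, "files": 2, "wasted": 4300000 }
}
"""


def removal_candidates(report):
    """Return every path except the first in each group (an arbitrary keep rule)."""
    extras = []
    for group in report["groups"]:
        extras.extend(group["paths"][1:])
    return extras


def summary_is_consistent(report):
    """Check that the summary totals match the per-group numbers."""
    groups = report["groups"]
    summary = report["summary"]
    return (
        summary["groups"] == len(groups)
        and summary["files"] == sum(g["count"] for g in groups)
        and summary["wasted"] == sum(g["wasted"] for g in groups)
    )


report = json.loads(SAMPLE)
print(removal_candidates(report))
print(summary_is_consistent(report))
```

In practice the report would come from the output of duphunt . --json rather than from an inline string.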

Exit codes

Code   Meaning
0      success (default, even when duplicates are found)
1      duplicates found and --exit-code was passed
2      error (bad option, missing path)

By default duphunt is a viewer and exits 0; add --exit-code to gate a pipeline on it.
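
For a pipeline written in Python, the exit codes could be handled along these lines. This is a minimal sketch that assumes the duphunt command is on PATH; describe_exit and gate are hypothetical helper names, and the labels are copied from the exit-code table above.

```python
import subprocess

# Meanings taken from the exit-code table.
EXIT_MEANINGS = {
    0: "success",
    1: "duplicates found",
    2: "error (bad option, missing path)",
}


def describe_exit(code):
    """Translate a duphunt exit status into a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def gate(root="."):
    """Run `duphunt <root> --exit-code` and return (ok, label)."""
    # Assumes duphunt is installed; subprocess raises FileNotFoundError otherwise.
    result = subprocess.run(["duphunt", root, "--exit-code"])
    return result.returncode == 0, describe_exit(result.returncode)
```

A CI step could call gate("assets") and fail the job when ok is False, using the label to tell duplicates (1) apart from a usage error (2).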

License

MIT

Download files

Source Distribution

duphunt-0.1.0.tar.gz (8.4 kB view details)

Uploaded Source

Built Distribution

duphunt-0.1.0-py3-none-any.whl (8.3 kB view details)

Uploaded Python 3

File details

Details for the file duphunt-0.1.0.tar.gz.

File metadata

  • Download URL: duphunt-0.1.0.tar.gz
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for duphunt-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        2f8fc4b425ea41af29088c37d23892f0d4e7b8139cb2816a820988fc1013d21e
MD5           6deb5f8f53faca5f6c8009e3c459207d
BLAKE2b-256   7dd37e1d06b8379b504341ad33e746e7a7a71b9be6665a25c8bfd5cd9c9e5c80

File details

Details for the file duphunt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: duphunt-0.1.0-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for duphunt-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        41170e9d9602a17c939304c2dcc18c721f0582482f58de016691fcafde01a024
MD5           489107471e81ba168d0491278220bad9
BLAKE2b-256   659d099b2593e5b99804f70a76779b3682a2511c59c0b8e81e3248e56ab30945
