
Find identical files in subdirectories

Scan for identical files (duplicates) in subdirectories.


  • Python >= 3.6
  • MS Windows is not supported


To find files with identical content, the given directories are scanned, and SHA-256 fingerprints are calculated and compared for files of the same size. Two files with identical fingerprints are considered to have the same content. In principle, two files with different content could share a fingerprint (a hash collision), but the probability is negligible in practice.
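
The size-then-hash approach described above can be sketched roughly as follows. This is an illustration of the technique under stated assumptions, not the package's actual implementation; the function names `sha256_of` and `find_identical` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical(root):
    """Group regular files below root by content (hypothetical helper)."""
    # Step 1: bucket files by size, which needs only a stat() call per file.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only the files that share their size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Comparing sizes first means that files with a unique size are never read at all, which is why the fingerprint is only needed for candidates of equal size.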

Symbolic links and hidden entries are ignored by default. This behaviour can be changed with the CLI options --follow and --hidden, or with the constructor options ignore_symlinks and ignore_hidden.
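
As a rough illustration of what ignoring such entries can look like during a directory walk, here is a minimal sketch. It assumes that "hidden" means a name starting with a dot, and `iter_files` is a hypothetical helper; the parameter names only mirror the constructor options mentioned above, and the package's real behaviour may differ.

```python
import os
from pathlib import Path


def iter_files(root, ignore_hidden=True, ignore_symlinks=True):
    """Yield regular files below root, optionally skipping hidden entries and symlinks."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=not ignore_symlinks):
        if ignore_hidden:
            # Prune hidden directories in place so os.walk does not descend into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        for name in filenames:
            path = Path(dirpath) / name
            if ignore_symlinks and path.is_symlink():
                continue  # skip symbolic links to files
            if path.is_file():
                yield path
```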

CLI examples

Show a short overview of the command-line options:

$ duplicates --help

Scan directories dirA, dirB and dirC for duplicates and report all found identical files:

$ duplicates dirA dirB dirC


The oldest file is printed without indentation; each file identical to it is printed below it, indented by a tab character. The oldest file is presumed to be the original.
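
If the report needs to be processed by a script, the layout described above (original unindented, duplicates indented by a tab) can be parsed with a few lines of Python. This is a minimal sketch based only on that description; `parse_report` is a hypothetical helper, and file names that contain newlines or start with a tab would confuse it.

```python
def parse_report(text):
    """Parse report text into a list of (original, [duplicates]) tuples."""
    groups = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip empty lines
        if line.startswith("\t"):
            # An indented line is a duplicate of the most recent original.
            if groups:
                groups[-1][1].append(line[1:])
        else:
            # An unindented line starts a new group with its original.
            groups.append((line, []))
    return groups
```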

If you are willing to take risks, you can delete all duplicates at once. I wouldn't dare, but you get the picture:

$ duplicates --dups-only dirA dirB | while read dups ; do xargs -0 rm $dups ; done

With --dups-only, all duplicates of one original are output on a single line, separated by \0 (the NUL byte, ASCII code zero).
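
For a more cautious workflow than piping straight into rm, the --dups-only output can be read into Python and reviewed first. The sketch below assumes the format is exactly as described (one group per line, entries separated by NUL bytes); `parse_dups_only` is a hypothetical helper, not part of the package.

```python
import os


def parse_dups_only(data: bytes):
    """Split --dups-only output into one list of file names per line."""
    groups = []
    for line in data.split(b"\n"):
        # Entries on a line are NUL-separated; drop empty fragments.
        names = [os.fsdecode(name) for name in line.split(b"\0") if name]
        if names:
            groups.append(names)
    return groups
```

The bytes could be captured with, for example, `subprocess.run(["duplicates", "--dups-only", "dirA", "dirB"], capture_output=True, check=True).stdout`, and each group inspected before anything is deleted.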

For fish shell it looks almost identical:

$ duplicates --dups-only dirA dirB | while read -la dups ; xargs -0 rm $dups ; end

Python examples

import duplicates

df = duplicates.DupFinder(verbose=True)
uniq, dups = df.scan(".")

uniq is a list of file objects for files that have no duplicate. dups is a list of groups of identical files; each group is a list of file objects, ordered so that the first element is the oldest file and thus the presumed original.

A file object is a dict consisting of the following elements:

  • path: a pathlib.Path object
  • age: modification time in seconds (Unix time)
  • size: file size in bytes
  • hash: the SHA-256 fingerprint (not calculated for unique files)
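
To show how the structure described above might be consumed, the following sketch works on a hand-made sample in the documented shape rather than on a real scan. All paths, timestamps, sizes, and hash strings are invented placeholders, and `reclaimable_bytes` and `duplicate_paths` are hypothetical helpers.

```python
from pathlib import Path


def reclaimable_bytes(dups):
    """Sum the sizes of all files except the first (oldest) in each group."""
    return sum(f["size"] for group in dups for f in group[1:])


def duplicate_paths(dups):
    """Return the paths of every file except the presumed original of each group."""
    return [f["path"] for group in dups for f in group[1:]]


# Invented sample data; "hash" values are placeholders, not real SHA-256 digests.
sample_dups = [
    [
        {"path": Path("dirA/report.txt"), "age": 1500000000, "size": 120, "hash": "placeholder-1"},
        {"path": Path("dirB/report.txt"), "age": 1600000000, "size": 120, "hash": "placeholder-1"},
        {"path": Path("dirC/report-copy.txt"), "age": 1700000000, "size": 120, "hash": "placeholder-1"},
    ],
    [
        {"path": Path("dirA/photo.jpg"), "age": 1550000000, "size": 2048, "hash": "placeholder-2"},
        {"path": Path("dirB/photo.jpg"), "age": 1650000000, "size": 2048, "hash": "placeholder-2"},
    ],
]

print(reclaimable_bytes(sample_dups))
print(duplicate_paths(sample_dups))
```

Both helpers rely on the documented ordering, in which the first element of each group is the oldest file.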
