Find identical files in subdirectories
Scan for identical files (duplicates) in subdirectories.
- Python >= 3.6
- MS Windows is not supported
To find files with identical content the given directories will be scanned and for files of same size their SHA-256 fingerprints are calculated and compared. Two files with identical fingerprints are considered to have the same content. There is a tiny chance for two files with same fingerprint to have different content, but this chance is very remote.
Symbolic links and hidden entries are ignored by default, this behaviour can
be changed with CLI options
--hidden and constructor options
This one will give you a short command overview:
$ duplicates --help
dirC for duplicates and report all found
$ duplicates dirA dirB dirC dirA/file01 dirA/file01.bak dirB/file.bak dirA/file02 dirB/file02~
The oldest file is printed without indent, all identical files are printed indented by a tab character. The oldest file is supposed to be the original.
If you are willing to take risks, you can delete all duplicates at once. I wouldn't dare, but you get the picture:
$ duplicates --dups-only dirA dirB | while read dups ; do xargs -0 rm $dups ; done
--dups-only all duplicates for one original are output on one line,
\0 (ASCII code zero).
For fish shell it looks almost identical:
$ duplicates --dups-only dirA dirB | while read -la dups ; xargs -0 rm $dups ; end
import duplicates df = duplicates.DupFinder(verbose=True) uniq, dups = df.scan(".")
uniq is a list of unique file objects.
dups is a list of identical files,
which in turn are lists of file objects, the first being the oldest element
and thus the supposed original.
A file object is a dict consisting of the following elements:
path: a pathlib.Path object
age: modification time in seconds (Unix time)
size: file size in bytes
hash: the SHA-256 fingerprint (not calculated for unique files)
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Hashes for duplicates-0.1.0-py3-none-any.whl