Kill duplicate files, finding partial files as well
Python version support: CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy.
How it works
killdupes scans your filesystem to find duplicate files, partial files and empty files.
It compares every file against every other file (n:n) by hashing chunks with md5 and storing the hashes in dictionaries. Run it with a wildcard pattern, or with an input file that lists the file names to check.
- Scan all files, find the smallest.
- Read read_size bytes (the remaining size of the smallest file, capped at CHUNK) from every file into records.
- Hash each record and use the hash as a key into the offsets[current_offset] dict; files whose records hash alike land in the same bucket.
- Files in the same bucket are known to be equal up to this offset.
- Repeat at the next offset for as long as a bucket holds at least two files that are still equal at all offsets read so far.
- Equal files are either a duplicate case (if they are the same size), or one is partial relative to the other (if not the same size).
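The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual killdupes source: the function name compare_files, the CHUNK value and the work list of (offset, bucket) pairs are all assumptions made for the example. It also trusts the file sizes recorded at startup, which is a simplification.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # assumed cap on a single read; the real value may differ


def compare_files(paths, chunk=CHUNK):
    """Return (empty, duplicates, partials) for the given paths.

    empty:      list of zero-length files
    duplicates: list of sorted lists of paths with identical content
    partials:   list of (shorter_path, longer_path) pairs where the
                shorter file is a prefix of the longer one
    """
    sizes = {p: os.path.getsize(p) for p in paths}
    empty = [p for p in paths if sizes[p] == 0]
    duplicates, partials = [], []

    # Each work item is a bucket of files known to be equal up to `offset`.
    pending = [(0, [p for p in paths if sizes[p] > 0])]
    while pending:
        offset, bucket = pending.pop()
        if len(bucket) < 2:
            continue  # nothing left to compare this file against

        finished = [p for p in bucket if sizes[p] == offset]
        alive = [p for p in bucket if sizes[p] > offset]
        if finished:
            # Files ending here are equal to each other (same size) and
            # partial relative to every longer file in the bucket.
            if len(finished) > 1:
                duplicates.append(sorted(finished))
            partials.extend((p, q) for p in finished for q in alive)
            pending.append((offset, alive))
            continue

        # Read up to the end of the smallest file, but never more than chunk.
        read_size = min(min(sizes[p] for p in bucket) - offset, chunk)
        by_hash = defaultdict(list)
        for p in bucket:
            # Reopen per read so only one file handle is open at a time.
            with open(p, "rb") as fh:
                fh.seek(offset)
                record = fh.read(read_size)
            by_hash[hashlib.md5(record).hexdigest()].append(p)

        # Each hash value becomes a new, smaller bucket at the next offset.
        for group in by_hash.values():
            pending.append((offset + read_size, group))

    return empty, duplicates, partials
```

A call such as compare_files(glob.glob('tests/samples/*')) would return the three categories that the tool reports as empty files, duplicates and incompletes.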
Memory consumption should not exceed files_in_bucket * read_size.
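As a rough worked example of that bound, the helpers below compute the read size and the resulting upper limit. The function names are hypothetical, and the figures are taken from the sample dashboard line in this README (600084 files in the bucket, a read size of 1 byte).

```python
def read_size(smallest_size, offset, chunk):
    """Bytes read from each file at this offset: what is left of the
    smallest file, capped at chunk."""
    return min(smallest_size - offset, chunk)


def memory_bound(files_in_bucket, size_per_read):
    """Upper bound in bytes on the records held in memory at one offset."""
    return files_in_bucket * size_per_read


# 600084 files with 1-byte reads: at most 600084 bytes of records at once.
print(memory_bound(600084, 1))
```

Because the read size is capped at CHUNK, the bound grows with the number of files in a bucket rather than with the size of the files.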
The algorithm adapts to file changes: it reads every file until EOF, regardless of the file size recorded at startup.
$ pip install killdupes
$ killdupes.py 'tests/samples/*'
Empty files:
X 0.0 B 14.03.14 17:39:48 tests/samples/empty
Incompletes:
= 2.0 B 14.03.14 18:17:43 tests/samples/full
X 1.0 B 14.03.14 18:17:26 tests/samples/partial
Duplicates:
= 2.0 B 14.03.14 18:17:43 tests/samples/full
X 2.0 B 14.03.14 18:17:37 tests/samples/full2
Kill files? (all/empty/incompletes/duplicates) [a/e/i/d/N]
If there are many files to scan it will display a progress dashboard while working:
176.1 KB | Offs 0.0 B | Buck 1/1 | File 193868/600084 | Rs 1.0 B
The dashboard fields, from left to right:
- Total bytes read (176.1 KB)
- Offs: current offset of reading
- Buck: current number of buckets
- File: file/files in this bucket
- Rs: read size at this offset