Kill duplicate files, finding partial files as well
Python version support: CPython 2.6, 2.7, 3.2, 3.3 and PyPy.
How it works
killdupes scans your filesystem to find duplicate files, partial files and empty files.
It performs an n:n comparison of files using MD5 hashing and heavy use of hash tables. Run it with a wildcard, or with an input file containing the file names to check.
- Scan all files, find the smallest.
- Read read_size bytes (the remaining size of the smallest file, capped at CHUNK size) from all files into records.
- Hash all records, use hashes as keys into offsets[current_offset] dict.
- Files in the same bucket are known to be equal up to this offset.
- Continue for as long as at least two files remain that are still equal at all offsets read so far.
- Equal files are either a duplicate case (if they are the same size), or one is partial relative to the other (if not the same size).
Memory consumption should not exceed files_in_bucket * read_size.
The algorithm adapts to file changes: it reads every file until EOF, regardless of the file size recorded at startup.
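The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual killdupes source: the names find_dupes and read_record and the CHUNK value are assumptions made for the example. It also simplifies in two ways: it trusts the file sizes taken at startup instead of reading until EOF, and it reopens each file for every record rather than keeping handles open.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # assumed upper bound on the read size; the real value may differ


def read_record(path, offset, size):
    """Read `size` bytes from `path`, starting at `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(size)


def find_dupes(paths, chunk=CHUNK):
    """Return (duplicates, partials, empty) for the given paths.

    duplicates: lists of paths with identical content
    partials:   (shorter, longer) pairs where shorter is a prefix of longer
    empty:      paths of zero-length files
    """
    sizes = {p: os.path.getsize(p) for p in paths}
    duplicates, partials = [], []
    empty = sorted(p for p in paths if sizes[p] == 0)
    # Each pending entry is a bucket: files known to be equal up to `offset`.
    pending = [(0, [p for p in paths if sizes[p] > 0])]
    while pending:
        offset, bucket = pending.pop()
        if len(bucket) < 2:
            continue  # a lone file has nothing left to be compared with
        smallest = min(sizes[p] for p in bucket)
        if smallest == offset:
            # The smallest files end here, still equal to the rest of the bucket.
            ended = sorted(p for p in bucket if sizes[p] == offset)
            longer = sorted(p for p in bucket if sizes[p] > offset)
            if len(ended) > 1:
                duplicates.append(ended)
            partials.extend((short, long_) for short in ended for long_ in longer)
            pending.append((offset, longer))
            continue
        # Read up to the end of the smallest file, or at most one chunk.
        read_size = min(chunk, smallest - offset)
        groups = defaultdict(list)
        for p in bucket:
            digest = hashlib.md5(read_record(p, offset, read_size)).hexdigest()
            groups[digest].append(p)
        # Files whose records hash alike stay together in the next bucket.
        for group in groups.values():
            pending.append((offset + read_size, group))
    return duplicates, partials, empty
```

Calling find_dupes(list_of_paths) returns the three result lists. Because only the digests are kept while a bucket is split, this sketch holds one record in memory at a time, which stays within the files_in_bucket * read_size bound stated above.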
$ pip install killdupes
$ killdupes.py *
176.1 KB | Offs 0.0 B | Buck 1/1 | File 193868/600084 | Rs 1.0 B
The dashboard fields:
- Total bytes read
- Current offset of reading
- Current number of buckets
- Current file / number of files in this bucket
- Readsize at this offset
|Filename|Size|File type|Python version|Upload date|
|killdupes-0.1.0-py2.py3-none-any.whl|7.4 kB|Wheel|2.7|Mar 13, 2014|
|killdupes-0.1.0.tar.gz|4.3 kB|Source|None|Mar 13, 2014|