Kill duplicate files, finding partial files as well
Project description
Python version support: CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy.
How it works
killdupes scans your filesystem to find duplicate files, partial files and empty files.
It compares every file against every other file (n:n) using MD5 hashes of file chunks, stored in dictionaries. Run it with a wildcard pattern, or with an input file that lists the file names to check.
The method:
- Scan all files, find the smallest.
- Read read_size bytes (the remaining size of the smallest file, capped at CHUNK) from every file into records.
- Hash all records, use hashes as keys into offsets[current_offset] dict.
- Files in the same bucket are known to be equal up to this offset.
- Continue as long as at least two files remain that are still equal at all offsets.
- Equal files are either a duplicate case (if they are the same size), or one is partial relative to the other (if not the same size).
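The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the killdupes source. The names `find_equal_groups` and `split_group` and the `CHUNK` value of 4096 are assumptions made for the example. As a simplification, it trusts the file sizes recorded at startup and reopens each file per step rather than reading until EOF.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 4096  # assumed cap on bytes read per file per step


def find_equal_groups(paths, chunk=CHUNK):
    """Return groups of non-empty files that agree on every byte they share."""
    sizes = {p: os.path.getsize(p) for p in paths}
    groups = []
    # Work item: (offset, files still being read, files that already ended).
    # Every file in an item is known to equal the others up to `offset`.
    work = [(0, [p for p in paths if sizes[p] > 0], [])]
    while work:
        offset, live, ended = work.pop()
        ended = ended + [p for p in live if sizes[p] == offset]
        live = [p for p in live if sizes[p] > offset]
        if len(live) <= 1:
            # Nothing left to compare: report the group if it has 2+ members.
            group = ended + live
            if len(group) > 1:
                groups.append([(p, sizes[p]) for p in group])
            continue
        # Read up to the end of the smallest file, but never more than chunk.
        read_size = min(chunk, min(sizes[p] for p in live) - offset)
        buckets = defaultdict(list)
        for p in live:
            with open(p, "rb") as f:
                f.seek(offset)
                record = f.read(read_size)
            # Files whose records hash alike land in the same bucket.
            buckets[hashlib.md5(record).hexdigest()].append(p)
        for members in buckets.values():
            work.append((offset + read_size, members, ended))
    return groups


def split_group(group):
    """Split one group into (kept file, duplicates, incompletes)."""
    largest = max(size for _, size in group)
    full = [p for p, size in group if size == largest]
    partial = [p for p, size in group if size < largest]
    return full[0], full[1:], partial
```

A file that ends early is carried along as a prefix of every bucket that later splits off, so a partial file can appear in more than one group. Empty files are left out here because an empty file is trivially a prefix of everything.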
Memory consumption should not exceed files_in_bucket * read_size.
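As a quick worked example of that bound, the helper below multiplies the two quantities. The function name and the sample numbers are hypothetical and chosen only for illustration.

```python
def max_record_bytes(files_in_bucket, read_size):
    """Upper bound on bytes held in records for one bucket at one offset."""
    return files_in_bucket * read_size


# Hypothetical numbers: 1000 files in a bucket, 4096 bytes read from each.
bound = max_record_bytes(1000, 4096)
print(bound)  # 4096000 bytes, roughly 3.9 MiB
```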
The algorithm adapts to file changes: it reads every file until EOF, regardless of the file size recorded at startup.
Installation
$ pip install killdupes
Usage
$ killdupes.py 'tests/samples/*'
Empty files:
X 0.0 B 14.03.14 17:39:48 tests/samples/empty
Incompletes:
= 2.0 B 14.03.14 18:17:43 tests/samples/full
X 1.0 B 14.03.14 18:17:26 tests/samples/partial
Duplicates:
= 2.0 B 14.03.14 18:17:43 tests/samples/full
X 2.0 B 14.03.14 18:17:37 tests/samples/full2
Kill files? (all/empty/incompletes/duplicates) [a/e/i/d/N]
If there are many files to scan, it displays a progress dashboard while working:
176.1 KB | Offs 0.0 B | Buck 1/1 | File 193868/600084 | Rs 1.0 B
The dashboard fields:
- Total bytes read
- Offs: current read offset
- Buck: current number of buckets
- File: current file / number of files in this bucket
- Rs: read size at this offset