A Python package for finding and managing duplicate files using hardlinks
Dupln is a command-line application that scans a specified directory for duplicate files and replaces each duplicate with a hard link to a single copy of the file. This saves storage space while preserving the directory structure and keeping every path accessible.
Features
- Fast duplicate detection using file sizes, inodes, and MD5 hashes
- Space optimization by replacing duplicates with links
- Multiple operations:
  - Statistics (`stat`)
  - Linking (`link`)
  - Listing unique files (`uniques`)
  - Listing duplicates (`duplicates`)
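The size, inode, and MD5 stages named in the feature list can be pictured as a filtering pipeline: only files that share a size are candidates, paths that already share an inode need no hashing, and the hash settles the rest. The following is a minimal sketch of that idea, not dupln's actual implementation; `md5_of` and `find_duplicates` are hypothetical names introduced here for illustration.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root that have identical content.

    Hypothetical helper for illustration; not part of dupln's API.
    """
    # Stage 1: bucket by size, keeping one path per (device, inode) so that
    # files which are already hard linked together are hashed only once.
    by_size = defaultdict(dict)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            info = os.stat(path)
            by_size[info.st_size].setdefault((info.st_dev, info.st_ino), path)

    # Stage 2: hash only the sizes that are shared by more than one inode.
    groups = []
    for inodes in by_size.values():
        if len(inodes) < 2:
            continue
        by_hash = defaultdict(list)
        for path in inodes.values():
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
    return groups
```

Hashing is the expensive step, so skipping files with a unique size is where most of the speed comes from in a scheme like this.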
Install
> pip install dupln
Usage
Basic Commands
# Show statistics about duplicates
dupln stat /path/to/directory
# Link duplicates using hardlinks (default)
dupln link /path/to/directory
# List unique files
dupln uniques /path/to/directory
# List duplicate files
dupln duplicates /path/to/directory
Hard link files with same content
> dupln link '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
INFO: ++ '/tmp/dupln/as/ci/i_letters' [2]
INFO: - '/tmp/dupln/as/cii_letters' - '/tmp/dupln/as/tmp7uwq0r4l' [1]
INFO: - '/tmp/dupln/as/ci/i_/letters' - '/tmp/dupln/as/ci/i_/tmp0beeaxht' [0]
INFO: ++ '/tmp/dupln/as/ci/i_uppercase' [2]
INFO: - '/tmp/dupln/as/ci/i_/uppercase' - '/tmp/dupln/as/ci/i_/tmpcsykrlv5' [1]
INFO: - '/tmp/dupln/as/cii_uppercase' - '/tmp/dupln/as/tmp5knmbazf' [0]
INFO: ++ '/tmp/dupln/as/ci/i_/lowercase' [2]
INFO: - '/tmp/dupln/as/ci/i_lowercase' - '/tmp/dupln/as/ci/tmpxeegm9eu' [1]
INFO: - '/tmp/dupln/as/cii_lowercase' - '/tmp/dupln/as/tmp8ra1cf6z' [0]
INFO: ++ '/tmp/dupln/di/gits' [1]
INFO: - '/tmp/dupln/di/gi/ts' - '/tmp/dupln/di/gi/tmp80gznyej' [0]
INFO: ++ '/tmp/dupln/he/xd/ig/its' [2]
INFO: - '/tmp/dupln/he/xd/igits' - '/tmp/dupln/he/xd/tmpg3jm_ttb' [1]
INFO: - '/tmp/dupln/he/xdigits' - '/tmp/dupln/he/tmp2nqxy47g' [0]
INFO: ++ '/tmp/dupln/oc/td/igits' [2]
INFO: - '/tmp/dupln/oc/tdigits' - '/tmp/dupln/oc/tmpodvxqodo' [1]
INFO: - '/tmp/dupln/oc/td/ig/its' - '/tmp/dupln/oc/td/ig/tmp1um7nupk' [0]
INFO: ++ '/tmp/dupln/pr/intable' [2]
INFO: - '/tmp/dupln/pr/in/ta/ble' - '/tmp/dupln/pr/in/ta/tmploz2qhry' [1]
INFO: - '/tmp/dupln/pr/in/table' - '/tmp/dupln/pr/in/tmptf8egynt' [0]
INFO: ++ '/tmp/dupln/pu/nc/tu/ation' [2]
INFO: - '/tmp/dupln/pu/nctuation' - '/tmp/dupln/pu/tmp4yjomdni' [1]
INFO: - '/tmp/dupln/pu/nc/tuation' - '/tmp/dupln/pu/nc/tmpp0hsusw1' [0]
INFO: ++ '/tmp/dupln/wh/it/es/pace' [2]
INFO: - '/tmp/dupln/wh/it/espace' - '/tmp/dupln/wh/it/tmpd2plpkm7' [1]
INFO: - '/tmp/dupln/wh/itespace' - '/tmp/dupln/wh/tmpg7bw47b1' [0]
INFO: Total disk_size 564b; files 35; inodes 35; linked 17; same_hash 9; same_size 8; size 1.1k; uniq_hash 9;
List unique file content
> dupln uniques '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
/tmp/dupln/as/ci/i_/letters
/tmp/dupln/ascii_letters
/tmp/dupln/as/cii_uppercase
/tmp/dupln/as/cii_lowercase
/tmp/dupln/ascii_lowercase
/tmp/dupln/ascii_uppercase
/tmp/dupln/di/gi/ts
/tmp/dupln/digits
/tmp/dupln/he/xd/igits
/tmp/dupln/hexdigits
/tmp/dupln/oc/td/igits
/tmp/dupln/octdigits
/tmp/dupln/pr/in/table
/tmp/dupln/printable
/tmp/dupln/pu/nctuation
/tmp/dupln/punctuation
/tmp/dupln/wh/itespace
/tmp/dupln/whitespace
INFO: Total devices 1; disk_size 564b; files 35; inodes 18; same_ino 9; size 1.1k; unique_size 8;
Show stats about duplicate files
> dupln stat '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
INFO: Total disk_size 564b; files 35; inodes 18; same_ino 9; same_size 8; size 1.1k;
Stats meaning
- disk_size - total size excluding duplicate files
- size - total size including duplicate files
- files - total files found
- inodes - number of distinct inodes found
- same_ino - number of distinct inodes found at least twice (paths that are already linked to each other)
- same_size - number of distinct file sizes found at least twice
- same_hash - number of distinct file hashes found at least twice
- unique_size - number of distinct file sizes found
- unique_hash - number of distinct file hashes found
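To make the counters concrete, here is a rough sketch of how a few of them could be computed with the standard library. It reflects one reading of the definitions above (in particular, it assumes `disk_size` counts each inode once), and `scan_stats` is a hypothetical name, not a dupln function.

```python
import os


def scan_stats(root):
    """Count files, distinct inodes, and total vs. per-inode size under root.

    Hypothetical helper; assumes disk_size counts every inode exactly once.
    """
    seen = {}  # (device, inode) -> number of paths pointing at it
    stats = {"files": 0, "size": 0, "disk_size": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            info = os.stat(path)
            key = (info.st_dev, info.st_ino)
            stats["files"] += 1
            stats["size"] += info.st_size
            if key not in seen:
                # First time this inode is seen: its bytes occupy disk space.
                stats["disk_size"] += info.st_size
            seen[key] = seen.get(key, 0) + 1
    stats["inodes"] = len(seen)
    stats["same_ino"] = sum(1 for count in seen.values() if count > 1)
    return stats
```

Under this reading, the gap between `size` and `disk_size` is the space already saved by existing hard links.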
Advanced Options
# Use symbolic links instead of hardlinks
dupln link --linker os.symlink /path/to/dir
# Filter by size range (1MB to 10MB)
dupln duplicates --sizes 1M..10M /path/to/dir
# Continue on errors
dupln link --carry-on /path/to/dir
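A range such as `1M..10M` has to be turned into byte bounds before files can be filtered. The sketch below shows one way to do that; it assumes binary multiples (1M = 1024 * 1024 bytes) and requires both bounds, and dupln's own parsing rules may differ. `parse_size`, `parse_size_range`, and `in_range` are hypothetical names.

```python
import re

# Assumed unit table: binary multiples. dupln may use different units.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a string like '10M' or '512' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_size_range(spec):
    """Convert 'LOW..HIGH' into an inclusive (low, high) tuple of bytes."""
    low, separator, high = spec.partition("..")
    if not separator:
        raise ValueError(f"expected LOW..HIGH, got {spec!r}")
    bounds = (parse_size(low), parse_size(high))
    if bounds[0] > bounds[1]:
        raise ValueError(f"lower bound exceeds upper bound in {spec!r}")
    return bounds


def in_range(size, bounds):
    """Return True if size lies within the inclusive bounds."""
    return bounds[0] <= size <= bounds[1]
```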
Safety Features
- Dry-run mode: Use the `stat` command first to preview
- Atomic operations: Temporary files are used during linking
- Error recovery: `--carry-on` continues after errors
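One common way to use a temporary file for atomic linking is to create the new link under a temporary name in the same directory and then rename it over the duplicate, so the duplicate's path never goes missing. The sketch below illustrates that pattern on POSIX systems; it is an assumption about the general technique, not a description of dupln's internals, and `replace_with_hardlink` is a hypothetical name.

```python
import os
import uuid


def replace_with_hardlink(source, duplicate):
    """Replace duplicate with a hard link to source via a temporary name.

    Hypothetical helper; dupln's own sequence of operations may differ.
    """
    # Nothing to do if both paths already refer to the same inode. This also
    # avoids a POSIX quirk: renaming one hard link over another link to the
    # same file is a no-op that would leave the temporary name behind.
    if os.path.samefile(source, duplicate):
        return
    directory = os.path.dirname(os.path.abspath(duplicate))
    temp_path = os.path.join(directory, f"tmp{uuid.uuid4().hex}")
    try:
        os.link(source, temp_path)        # new link under a temporary name
        os.replace(temp_path, duplicate)  # atomic rename over the duplicate
    except OSError:
        # Clean up the temporary link if either step failed.
        if os.path.lexists(temp_path):
            os.remove(temp_path)
        raise
```

If the link step fails (for example, because the two paths are on different devices), the duplicate is left untouched.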
Linker Types
| Option | Description |
|---|---|
| `os.link` | Python hardlinks (default) |
| `os.symlink` | Python symbolic links |
| `ln` | System hardlinks (`ln` command) |
| `lns` | System symlinks (`ln -s`) |
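The four options can be thought of as interchangeable callables that take a source and a destination path. The mapping below is a hypothetical sketch of that idea, assuming a POSIX `ln` binary is on the `PATH`; the `LINKERS` name and the lookup scheme are illustrative and not taken from dupln's source.

```python
import os
import subprocess

# Hypothetical name-to-callable table mirroring the options listed above.
LINKERS = {
    "os.link": os.link,
    "os.symlink": os.symlink,
    "ln": lambda src, dst: subprocess.run(["ln", "--", src, dst], check=True),
    "lns": lambda src, dst: subprocess.run(["ln", "-s", "--", src, dst], check=True),
}


def make_link(linker_name, source, destination):
    """Create destination as a link to source using the named linker."""
    try:
        linker = LINKERS[linker_name]
    except KeyError:
        raise ValueError(f"unknown linker: {linker_name!r}") from None
    linker(source, destination)
```

Hard links must stay on one filesystem and share an inode with the original, while symbolic links store a path and break if the target moves, which is worth weighing before switching away from the default.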
Requirements
- Python 3.7+
- Optional: `PyYAML` for YAML output support
Note: Always back up important data before running file operations.