A Python package for finding and managing duplicate files using hardlinks

Dupln is a command-line application that scans a specified directory for duplicate files and replaces each duplicate with a hard link to a single copy of the file. This conserves storage space while preserving the directory structure and keeping every file path accessible.

☕ Support

If you find this project helpful, consider supporting me on Ko-fi.

Features

  • Fast duplicate detection using file sizes, inodes, and MD5 hashes
  • Space optimization by replacing duplicates with links
  • Multiple operations:
    • Statistics (stat)
    • Linking (link)
    • Listing unique files (uniques)
    • Listing duplicates (duplicates)
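
The detection strategy named in the first feature can be sketched in a few lines of Python. This is a minimal illustration, not dupln's actual code: the helper names `md5_of` and `find_duplicates` are hypothetical, and the sketch assumes that files sharing an inode are treated as one file, that sizes are compared first, and that MD5 is only computed for sizes seen more than once.

```python
import hashlib
import os
import stat
from collections import defaultdict


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted path groups whose content matches but whose inodes differ."""
    # size -> {(device, inode): first path seen}; paths that already share
    # an inode are collapsed into a single entry.
    by_size = defaultdict(dict)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            by_size[st.st_size].setdefault((st.st_dev, st.st_ino), path)

    groups = []
    for inodes in by_size.values():
        if len(inodes) < 2:
            continue  # a size seen once cannot have a duplicate
        by_hash = defaultdict(list)
        for path in inodes.values():
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
    return sorted(groups)
```

Comparing sizes before hashing keeps the scan cheap, because only files whose size collides with another file are ever read.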

Install

> pip install dupln

Usage

Basic Commands

# Show statistics about duplicates
dupln stat /path/to/directory

# Link duplicates using hardlinks (default)
dupln link /path/to/directory

# List unique files
dupln uniques /path/to/directory

# List duplicate files
dupln duplicates /path/to/directory

Hard-link files with the same content

> dupln link '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
INFO: ++ '/tmp/dupln/as/ci/i_letters' [2]
INFO:  - '/tmp/dupln/as/cii_letters' - '/tmp/dupln/as/tmp7uwq0r4l' [1]
INFO:  - '/tmp/dupln/as/ci/i_/letters' - '/tmp/dupln/as/ci/i_/tmp0beeaxht' [0]
INFO: ++ '/tmp/dupln/as/ci/i_uppercase' [2]
INFO:  - '/tmp/dupln/as/ci/i_/uppercase' - '/tmp/dupln/as/ci/i_/tmpcsykrlv5' [1]
INFO:  - '/tmp/dupln/as/cii_uppercase' - '/tmp/dupln/as/tmp5knmbazf' [0]
INFO: ++ '/tmp/dupln/as/ci/i_/lowercase' [2]
INFO:  - '/tmp/dupln/as/ci/i_lowercase' - '/tmp/dupln/as/ci/tmpxeegm9eu' [1]
INFO:  - '/tmp/dupln/as/cii_lowercase' - '/tmp/dupln/as/tmp8ra1cf6z' [0]
INFO: ++ '/tmp/dupln/di/gits' [1]
INFO:  - '/tmp/dupln/di/gi/ts' - '/tmp/dupln/di/gi/tmp80gznyej' [0]
INFO: ++ '/tmp/dupln/he/xd/ig/its' [2]
INFO:  - '/tmp/dupln/he/xd/igits' - '/tmp/dupln/he/xd/tmpg3jm_ttb' [1]
INFO:  - '/tmp/dupln/he/xdigits' - '/tmp/dupln/he/tmp2nqxy47g' [0]
INFO: ++ '/tmp/dupln/oc/td/igits' [2]
INFO:  - '/tmp/dupln/oc/tdigits' - '/tmp/dupln/oc/tmpodvxqodo' [1]
INFO:  - '/tmp/dupln/oc/td/ig/its' - '/tmp/dupln/oc/td/ig/tmp1um7nupk' [0]
INFO: ++ '/tmp/dupln/pr/intable' [2]
INFO:  - '/tmp/dupln/pr/in/ta/ble' - '/tmp/dupln/pr/in/ta/tmploz2qhry' [1]
INFO:  - '/tmp/dupln/pr/in/table' - '/tmp/dupln/pr/in/tmptf8egynt' [0]
INFO: ++ '/tmp/dupln/pu/nc/tu/ation' [2]
INFO:  - '/tmp/dupln/pu/nctuation' - '/tmp/dupln/pu/tmp4yjomdni' [1]
INFO:  - '/tmp/dupln/pu/nc/tuation' - '/tmp/dupln/pu/nc/tmpp0hsusw1' [0]
INFO: ++ '/tmp/dupln/wh/it/es/pace' [2]
INFO:  - '/tmp/dupln/wh/it/espace' - '/tmp/dupln/wh/it/tmpd2plpkm7' [1]
INFO:  - '/tmp/dupln/wh/itespace' - '/tmp/dupln/wh/tmpg7bw47b1' [0]
INFO: Total disk_size 564b; files 35; inodes 35; linked 17; same_hash 9; same_size 8; size 1.1k; uniq_hash 9;

List unique file content

> dupln uniques '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
/tmp/dupln/as/ci/i_/letters
/tmp/dupln/ascii_letters
/tmp/dupln/as/cii_uppercase
/tmp/dupln/as/cii_lowercase
/tmp/dupln/ascii_lowercase
/tmp/dupln/ascii_uppercase
/tmp/dupln/di/gi/ts
/tmp/dupln/digits
/tmp/dupln/he/xd/igits
/tmp/dupln/hexdigits
/tmp/dupln/oc/td/igits
/tmp/dupln/octdigits
/tmp/dupln/pr/in/table
/tmp/dupln/printable
/tmp/dupln/pu/nctuation
/tmp/dupln/punctuation
/tmp/dupln/wh/itespace
/tmp/dupln/whitespace
INFO: Total devices 1; disk_size 564b; files 35; inodes 18; same_ino 9; size 1.1k; unique_size 8;

Show stats about duplicate files

> dupln stat '/tmp/dupln'
INFO: Scanning: '/tmp/dupln'
INFO: Total disk_size 564b; files 35; inodes 18; same_ino 9; same_size 8; size 1.1k;

Stats meaning

  • disk_size - total size with duplicate files counted only once
  • size - total size with every file counted, duplicates included
  • files - number of files found
  • inodes - number of distinct inodes found
  • same_ino - number of distinct inodes found at least twice (paths that already share a hard link)
  • same_size - number of distinct file sizes found at least twice
  • same_hash - number of distinct file hashes found at least twice
  • unique_size - number of distinct file sizes found
  • unique_hash - number of distinct file hashes found
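
To make the counters above concrete, here is a small Python sketch that computes a few of them for a directory tree. It reflects one reading of the definitions (disk_size counts each inode once, size counts every path) and is not taken from dupln itself; the function name `scan_stats` is hypothetical.

```python
import os
import stat
from collections import Counter


def scan_stats(root):
    """Compute files, size, inodes, disk_size and same_ino for a tree."""
    paths_per_inode = Counter()  # (device, inode) -> number of paths
    size_per_inode = {}          # (device, inode) -> size in bytes
    stats = {"files": 0, "size": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files are counted
            key = (st.st_dev, st.st_ino)
            paths_per_inode[key] += 1
            size_per_inode[key] = st.st_size
            stats["files"] += 1
            stats["size"] += st.st_size  # every path counts here
    stats["inodes"] = len(paths_per_inode)
    stats["disk_size"] = sum(size_per_inode.values())  # each inode once
    stats["same_ino"] = sum(1 for n in paths_per_inode.values() if n > 1)
    return stats
```

Under this reading, the gap between size and disk_size is the space already saved by existing hard links.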

Advanced Options

# Use symbolic links instead of hardlinks
dupln link --linker os.symlink /path/to/dir

# Filter by size range (1MB to 10MB)
dupln duplicates --sizes 1M..10M /path/to/dir

# Continue on errors
dupln link --carry-on /path/to/dir
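
As an illustration of how a size range such as 1M..10M could be interpreted, the following Python sketch parses the two bounds and tests a file size against them. Everything here is an assumption: the helper names are hypothetical, the suffixes are treated as binary multiples (1M = 1024 * 1024 bytes), and both bounds are taken as inclusive. The real parser in dupln may differ on any of these points.

```python
import re

# Assumption: K, M and G are binary multiples.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '10M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if not match:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def parse_size_range(spec):
    """Split 'LOW..HIGH' into a (low, high) tuple of byte counts."""
    low, sep, high = spec.partition("..")
    if not sep or not low or not high:
        raise ValueError(f"invalid size range: {spec!r}")
    return parse_size(low), parse_size(high)


def in_range(size, bounds):
    """Return True if size lies within the inclusive bounds."""
    low, high = bounds
    return low <= size <= high
```

For example, `in_range(os.path.getsize(path), parse_size_range("1M..10M"))` would select files between 1 MiB and 10 MiB under these assumptions.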

Safety Features

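  • Dry run: run the stat command first to preview how many duplicates a link run would affect
  • Atomic operations: each duplicate is moved to a temporary file while its link is created
  • Error recovery: --carry-on continues with the remaining files after an error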
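
The temporary-file approach can be sketched as follows. This is an assumed reconstruction based on the tmp* names visible in the link output, not dupln's actual implementation; `replace_with_link` is a hypothetical name. The duplicate is first renamed to a temporary name in the same directory, the link is created under the original name, and the temporary file is removed only after the link succeeds.

```python
import os
import tempfile


def replace_with_link(source, duplicate, linker=os.link):
    """Replace `duplicate` with a link to `source`, restoring it on failure."""
    directory = os.path.dirname(os.path.abspath(duplicate))
    # Reserve a unique temporary name in the same directory, so the rename
    # below stays on one filesystem.
    fd, backup = tempfile.mkstemp(dir=directory)
    os.close(fd)
    os.replace(duplicate, backup)  # move the duplicate aside
    try:
        linker(source, duplicate)  # create the link under the old name
    except OSError:
        os.replace(backup, duplicate)  # put the original file back
        raise
    os.unlink(backup)  # the link exists, so the old copy can go
```

If the linker fails, the duplicate is moved back to its original name, so a failed run never loses file content.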

Linker Types

Option      Description
os.link     Python hardlinks (default)
os.symlink  Python symbolic links
ln          System hardlinks (ln command)
lns         System symlinks (ln -s)
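
One way to picture the four linker options is as a lookup table from option name to a callable that takes a source and a destination path. The sketch below is hypothetical (the names `LINKERS` and `get_linker` are not from dupln) and only shows how such a dispatch might look; the `ln` variants assume a POSIX `ln` command is available on the PATH.

```python
import os
import subprocess


def _ln(source, dest):
    """Create a hard link with the system ln command."""
    subprocess.run(["ln", source, dest], check=True)


def _lns(source, dest):
    """Create a symbolic link with the system ln -s command."""
    subprocess.run(["ln", "-s", source, dest], check=True)


# Option name -> callable(source, dest)
LINKERS = {
    "os.link": os.link,
    "os.symlink": os.symlink,
    "ln": _ln,
    "lns": _lns,
}


def get_linker(name="os.link"):
    """Return the linker callable for an option name."""
    try:
        return LINKERS[name]
    except KeyError:
        raise ValueError(f"unknown linker: {name!r}") from None
```

Note that symbolic links do not share an inode with their target, so deleting the original file leaves them dangling, whereas hard links keep the content alive.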

Requirements

  • Python 3.7+
  • Optional: PyYAML for YAML output support

Note: Always back up important data before running file operations.

