Analyse all files in one or more directories and manage duplicate files (the same file present with different names)

Project description

Introduction

This application helps you clean duplicate files out of your filesystem. Here, "duplicate" means two or more files that have the same content, even if they have different names.
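As a rough illustration of that definition, the sketch below first groups files by size (files of different sizes cannot have the same content) and then confirms matches byte by byte with the standard library's `filecmp.cmp(..., shallow=False)`. The 0.3.0 changelog says the tool relies on `filecmp`, but the function name `find_duplicates` and the size-bucketing step are assumptions for illustration, not the actual code of duplicatefinder.py.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(paths):
    """Return groups of paths with identical content (hypothetical helper)."""
    # Bucket by size first: only same-sized files need a byte comparison.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for candidates in by_size.values():
        # Each cluster is [first_seen, duplicate, duplicate, ...].
        clusters = []
        for path in candidates:
            for cluster in clusters:
                # shallow=False compares the actual bytes instead of
                # trusting matching os.stat() signatures.
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        # Keep only clusters that really contain duplicates.
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

Names play no role here: two files land in the same group only when their bytes match.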

You can use it as follows:

Usage: duplicatefinder.py [options] [directories]

Analyse all files in one or more directories and manage duplicate files (the
same file present with different names)

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -a ACTION, --action=ACTION
                        choose an action to do when a duplicate is found.
                        Valid options are print,rename,move,ask; print is the
                        default
  -r, --recursive       also check files in subdirectories recursively
  -p PREFIX, --prefix=PREFIX
                        prefix used for renaming duplicated files when the
                        'rename' action is chosen. Default is "DUPLICATED"
  -m PATH, --move-path=PATH
                        the directory where duplicates will be moved when the
                        'move' action is chosen
  -v, --verbose         more verbose output
  -q, --quiet           do not print any messages at all

  Filters:
    Use these options to limit and filter the directories and files to check.
    The options below that rely on file or directory names support wildcard
    characters and can also be used multiple times.

    -s MIN_SIZE, --min-size=MIN_SIZE
                        indicate the minimum size in bytes of a file for it to
                        be checked. Default is 128. Empty files are always ignored
    --include-dir=INCLUDE_DIR
                        only check directories with this name
    --exclude-dir=EXCLUDE_DIR
                        do not check directories with this name
    --include-file=INCLUDE_FILE
                        limit the search to files with this name
    --exclude-file=EXCLUDE_FILE
                        exclude files with this name from the search
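A minimal sketch of how the size and file-name filters above might combine, assuming a file passes when its size is at least MIN_SIZE and that the wildcard patterns follow shell-style matching as in Python's `fnmatch` module. The helper `should_check_file` is hypothetical; the real option handling in duplicatefinder.py may differ (for example, on whether a file of exactly MIN_SIZE bytes is checked).

```python
import fnmatch
import os

DEFAULT_MIN_SIZE = 128  # default documented for -s/--min-size


def should_check_file(path, min_size=DEFAULT_MIN_SIZE,
                      include_files=(), exclude_files=()):
    """Hypothetical filter: True if the file passes size and name checks."""
    size = os.path.getsize(path)
    # Empty files are always ignored, whatever --min-size says.
    # Assumption: a file of exactly min_size bytes is still checked.
    if size == 0 or size < min_size:
        return False
    name = os.path.basename(path)
    # --include-file may be repeated: the name must match at least one pattern.
    if include_files and not any(fnmatch.fnmatch(name, p) for p in include_files):
        return False
    # --exclude-file may be repeated: any matching pattern excludes the file.
    if any(fnmatch.fnmatch(name, p) for p in exclude_files):
        return False
    return True
```

For example, `should_check_file("photo.jpg", include_files=["*.jpg", "*.png"])` would keep a 200-byte JPEG, while `exclude_files=["photo.*"]` would drop it.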

Report bugs (and suggestions) to <luca@keul.it>.

TODO

  • More test coverage (maybe some tests can be merged together).

  • Control the maximum recursion depth.

  • Internationalization (at least Italian).

  • A “move to trash” action (a dependency on trash-cli could be a great idea).

  • Release this as a Debian/Ubuntu/Kubuntu package (I’d really love this).
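The "maximum recursion depth" item in the list above is not implemented yet. One possible approach, sketched here purely as an assumption, is to stop `os.walk` from descending once a directory is `max_depth` levels below the starting point. The function `walk_limited` and its `max_depth` parameter are hypothetical and do not correspond to any existing option.

```python
import os


def walk_limited(root, max_depth):
    """Hypothetical: yield files at most max_depth directory levels below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # root itself is depth 0, "sub" is depth 1, "sub/deeper" is depth 2.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        for name in filenames:
            yield os.path.join(dirpath, name)
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk from going any deeper.
            dirnames[:] = []
```

With `max_depth=0` this behaves like a non-recursive scan; larger values allow progressively deeper walks.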

Credits

  • Thanks to Lord Epzylon for sending me some code and modifications.

Subversion and other

The SVN repository is hosted at Keul’s Python Libraries.

Changelog

0.3.0

  • The runnable script name has been changed to duplicatefinder.py.

  • You can now pass multiple target directories as parameters.

  • Added an --action=ask option to choose, for each duplicate, which action to perform (interactive mode).

  • Added the --include-dir option to limit the search to specific directories.

  • Added the --exclude-dir option to skip some directories during the search.

  • Added the --include-file option to match only some files in the search.

  • Added the --exclude-file option to skip files from the search, based on file name.

  • Fixed: a wrong directory name was not handled and only caused an abnormal termination.

  • Friendlier handling of the user’s break action (CTRL+C).

  • Added the --verbose option to print more informational messages.

  • Added the --quiet option to output nothing at all.

  • Removed the _same_file function. Python already has a filecmp module (hopefully this is faster)!

  • Added an environment for automated tests, and the tests themselves (use --action=tests).

  • Some fixes to the command line help.
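The print, rename, move and ask actions described in the help and changelog above can be sketched as one dispatch function. This is a hypothetical helper, not the tool's code: the `PREFIX_name` renaming scheme, the prompt text and the `ask` callback parameter are all assumptions, since the documentation does not say exactly how renamed files are named.

```python
import os
import shutil


def handle_duplicate(original, duplicate, action="print",
                     prefix="DUPLICATED", move_path=None, ask=input):
    """Hypothetical dispatcher; returns the duplicate's resulting path."""
    if action == "ask":
        # Interactive mode: let the user choose the action for this duplicate.
        answer = ask("%s duplicates %s. Action [print/rename/move]? "
                     % (duplicate, original))
        action = answer.strip() or "print"
    if action == "print":
        print("%s is a duplicate of %s" % (duplicate, original))
        return duplicate
    if action == "rename":
        # Assumed naming scheme: PREFIX_name, kept in the same directory.
        folder, name = os.path.split(duplicate)
        target = os.path.join(folder, "%s_%s" % (prefix, name))
        os.rename(duplicate, target)
        return target
    if action == "move":
        if move_path is None:
            raise ValueError("the 'move' action needs a move path")
        target = os.path.join(move_path, os.path.basename(duplicate))
        shutil.move(duplicate, target)
        return target
    raise ValueError("unknown action: %r" % action)
```

Passing `ask` as a parameter keeps the interactive mode testable: a test can supply a function that returns a fixed answer instead of reading from the keyboard.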

0.2.0

  • Added the move action.

  • Added the --recursive option, to walk an entire tree of folders (thanks to Lord Epzylon).

  • Added the --min-size option, to specify a minimum size for the files to be checked.
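A minimal sketch of how --recursive and --exclude-dir could work together, assuming the standard `os.walk` idiom of pruning the `dirnames` list in place so excluded directories are never entered. The function `iter_files` is hypothetical, and --include-dir is left out because its exact semantics (whether non-matching directories are still traversed) are not documented here.

```python
import fnmatch
import os


def iter_files(root, recursive=False, exclude_dirs=()):
    """Hypothetical: yield file paths under root, honouring exclude patterns."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
        if not recursive:
            # Without --recursive only the top-level directory is scanned.
            break
        # Prune excluded directory names in place so os.walk never enters them.
        dirnames[:] = sorted(
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in exclude_dirs)
        )
```

Pruning in place works because `os.walk` (top-down by default) reads the same `dirnames` list after each directory is yielded.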

0.1.2

  • Fixed a bad bug in setup.py: the code was OK, but the 0.1.1 egg was not installable. Thanks to the ever-present A. Jung.

0.1.1

  • Fix to the setup.py script.

  • Added documentation.

  • First egg official release.

0.1.0 - Unreleased

  • First (un)release

Download files

Source Distribution

PyDirDuplicateFinder-0.3.0.tar.gz (10.9 kB)

Built Distribution

PyDirDuplicateFinder-0.3.0-py2.5.egg (32.0 kB)

File details

Details for the file PyDirDuplicateFinder-0.3.0.tar.gz.

File hashes

Hashes for PyDirDuplicateFinder-0.3.0.tar.gz:

  Algorithm     Hash digest
  SHA256        8c4cfbeaf247266ba37a0315f7f0a3f94a265767c88e5fc666d0ae8fa34f9618
  MD5           f9953ec624f6bc06d749d11937d95e86
  BLAKE2b-256   a2c6638b2a5d5baddae5e71500015c1f4565ee8cd31800450b414a19001a2ed9

File details

Details for the file PyDirDuplicateFinder-0.3.0-py2.5.egg.

File hashes

Hashes for PyDirDuplicateFinder-0.3.0-py2.5.egg:

  Algorithm     Hash digest
  SHA256        09d7059a99a6025385035bb0dbdfba8ec08ddc82088e1fba174d8415b467de11
  MD5           9985f235dd81b4aa67d899db60ac9577
  BLAKE2b-256   e2aca681a3e02c1f50381c5cdd4a71faa8eb6dc5a3d76e6c0b48e0ef4886f9a3
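To check a downloaded file against the SHA256 digests listed above, a short script using the standard library's `hashlib` is enough. The helper names `sha256_of` and `matches_published_digest` are illustrative, and the example assumes the archive has been saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=64 * 1024):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """True if the file's SHA256 equals the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the tarball was downloaded to the current directory):
# matches_published_digest(
#     "PyDirDuplicateFinder-0.3.0.tar.gz",
#     "8c4cfbeaf247266ba37a0315f7f0a3f94a265767c88e5fc666d0ae8fa34f9618",
# )
```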
