Analyse all files in one or more directories and manage duplicate files (the same file present with different names)

Project Description

Introduction

This application helps you clean duplicate files out of your filesystem. Here, "duplicate" means that two or more files have the same content but may have different names.
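The definition above can be illustrated with a minimal sketch in modern Python. It is not the project's own code: the function name `find_duplicates` is hypothetical, and grouping by size first is an assumption made here to avoid needless comparisons. The changelog only confirms that the standard `filecmp` module is used for comparing files.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    # Files of different sizes cannot be identical, so bucket by size first.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for candidates in by_size.values():
        # Each cluster holds files already proven equal to its first member.
        clusters = []
        for path in candidates:
            for cluster in clusters:
                # shallow=False forces a comparison of the actual contents.
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        # Only clusters with two or more members are duplicates.
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

Two files named `a.txt` and `b.txt` with the same bytes would end up in one group, whatever their names are.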

You can use it as follows:

Usage: duplicatefinder.py [options] [directories]

Analyse all files in one or more directories and manage duplicate files (the
same file present with different names)

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -a ACTION, --action=ACTION
                        choose an action to do when a duplicate is found.
                        Valid options are print,rename,move,ask; print is the
                        default
  -r, --recursive       also check files in subdirectories recursively
  -p PREFIX, --prefix=PREFIX
                        prefix used for renaming duplicated files when the
                        'rename' action is chosen. Default is "DUPLICATED"
  -m PATH, --move-path=PATH
                        the directory where duplicate will be moved when the
                        'move' action is chosen
  -v, --verbose         more verbose output
  -q, --quiet           do not print any messages at all

  Filters:
    Use these options to limit and filter the directories and files to check.
    The options below that take a file or directory name support wildcard
    characters and can also be used multiple times

    -s MIN_SIZE, --min-size=MIN_SIZE
                        the minimum size in bytes a file must have to be
                        checked. Default is 128. Empty files are always ignored
    --include-dir=INCLUDE_DIR
                        only check directories with this name
    --exclude-dir=EXCLUDE_DIR
                        do not check directories with this name
    --include-file=INCLUDE_FILE
                        limit the search to files with this name
    --exclude-file=EXCLUDE_FILE
                        exclude files with this name from the search

Report bugs (and suggestions) to <luca@keul.it>.
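
The file filters described in the help text can be sketched roughly as follows. This is an illustration under assumptions, not the tool's implementation: the function `should_check` and its parameters are hypothetical, the wildcard ("jolly") characters are assumed to be shell-style patterns as handled by the standard `fnmatch` module, and a file of exactly the minimum size is assumed to be checked.

```python
import fnmatch
import os


def should_check(path, min_size=128, include=(), exclude=()):
    """Decide whether a file passes the (hypothetical) filter rules."""
    name = os.path.basename(path)
    size = os.path.getsize(path)
    # Empty files are always ignored; files below the minimum are skipped.
    if size == 0 or size < min_size:
        return False
    # With include patterns given, the name must match at least one of them.
    if include and not any(fnmatch.fnmatch(name, p) for p in include):
        return False
    # Any matching exclude pattern rejects the file.
    return not any(fnmatch.fnmatch(name, p) for p in exclude)
```

Passing several patterns in `include` or `exclude` mirrors the help text's note that the filter options can be used multiple times.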

TODO

  • More test coverage (maybe some tests can be merged together).
  • Control the maximum recursion depth.
  • Internationalization (at least Italian).
  • A “move to trash” action (a dependency on trash-cli could be a good idea).
  • Release this as a Debian/Ubuntu/Kubuntu package (I’d really love this).

Credits

  • Thanks to Lord Epzylon for sending me some code and modifications.

Subversion and other

The SVN repository is hosted at Keul’s Python Libraries.

Changelog

0.3.0

  • The runnable script name has been changed to duplicatefinder.py.
  • You can now pass multiple target directories as parameters.
  • Added an --action=ask option for choosing which action to perform at every duplicate (interactive mode).
  • Added the --include-dir option to limit the search to specific directories.
  • Added the --exclude-dir option to skip some directories during the search.
  • Added the --include-file option to match only some files in the search.
  • Added the --exclude-file option to skip files in the search, based on file name.
  • Fixed: a wrong directory name was not handled and only produced an abnormal termination.
  • More graceful handling of the user’s break (CTRL+C) action.
  • Added the --verbose option to print more messages.
  • Added the --quiet option to output nothing at all.
  • Removed the _same_file function. Python already has a filecmp module (hopefully this is faster)!
  • Added an environment for automated tests, and tests too (use --action=tests).
  • Some fixes to the command line help.

0.2.0

  • Added the move action.
  • Added the --recursive option to walk an entire tree of folders (thanks to Lord Epzylon).
  • Added the --min-size option to specify a minimum size for the files to be checked.

0.1.2

  • Fixed a bad bug in setup.py. The code was OK, but the 0.1.1 egg was not installable. Thanks to the ever-present A. Jung.

0.1.1

  • Fixed the setup.py script.
  • Added documentation.
  • First official egg release.

0.1.0 - Unreleased

  • First (un)release

Release History

  • 0.3.0 (this version)
  • 0.2.0
  • 0.1.2
  • 0.1.0

Download Files

  • PyDirDuplicateFinder-0.3.0-py2.5.egg (32.0 kB): Egg for Python 2.5, uploaded Aug 15, 2009
  • PyDirDuplicateFinder-0.3.0.tar.gz (10.9 kB): source distribution, uploaded Aug 15, 2009
