Package to find duplicate files in and across folders

About Duplicates Finder

Duplicates Finder is a simple Python package that identifies duplicate files in and across folders. There are three ways to search for identical files:

  1. List all duplicate files in a folder of interest.
  2. Pick a file and find all of its duplicates in a folder.
  3. Directly compare two folders against each other.

The results are returned as a pandas DataFrame and can optionally be exported as .csv files. More information about the underlying concept can also be found in this short article.


Installation

You can either install the package with pip or clone the repository from GitHub. Run the following command(s) in your terminal:

Pip Installation:

pip install duplicate-finder

Alternatively, you can clone the Git repository:

git clone https://github.com/akcarsten/duplicates.git

Then go to the folder to which you cloned the repository and run:

python setup.py install

Now you can run Python and import the duplicates package.


Examples of how to use the package

Example 1: List all duplicate files in a folder of interest.

import duplicates as dup


folder_of_interest = 'C:/manyDuplicatesHere/'
dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', fastscan=True)

Here the fastscan parameter is set to True (the default is False). With fastscan enabled, potential duplicates are pre-selected based on file size. If only a specific file type is of interest, it can be specified with the 'ext' parameter. For example:

df = dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', ext='.jpg')
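The idea behind a size-based pre-selection can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the function names `sha256_of` and `size_prefiltered_duplicates` are hypothetical, and the use of SHA-256 is an assumption. Files with a unique size cannot have a duplicate, so only files that share a size are hashed.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def size_prefiltered_duplicates(folder, ext=None):
    """Group identical files under `folder`, hashing only files that share a size.

    Hypothetical helper for illustration; not part of the duplicates package.
    """
    # Step 1: bucket files by size, optionally keeping one extension only.
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if ext is not None and not name.lower().endswith(ext.lower()):
                continue
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files whose size occurs more than once.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_hash[sha256_of(path)].append(path)

    # Step 3: keep only groups that contain at least two identical files.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Skipping the hash for files with a unique size is what makes this kind of scan faster on folders with many large files.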

Example 2: Pick a file and find all of its duplicates in a folder.

import duplicates as dup


file_of_interest = 'C:/manyDuplicatesHere/thisFileExistsManyTimes.jpg'
folder_of_interest = 'C:/manyDuplicatesHere/'
df = dup.find_duplicates(file_of_interest, folder_of_interest)
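For readers curious how such a single-file search could work, the following is a rough sketch using only the standard library. It is not the package's code: `file_digest` and `copies_of` are hypothetical names, SHA-256 is an assumed hash, and excluding the reference file itself from the result is a design choice of this sketch.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content (read fully into memory)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copies_of(file_path, folder):
    """Return paths under `folder` whose content matches `file_path`.

    Hypothetical helper for illustration; not part of the duplicates package.
    """
    reference = Path(file_path).resolve()
    target_size = reference.stat().st_size
    target_digest = file_digest(reference)
    matches = []
    for candidate in sorted(Path(folder).rglob("*")):
        # Skip directories and the reference file itself.
        if not candidate.is_file() or candidate.resolve() == reference:
            continue
        # Cheap check first: a different size rules out a duplicate.
        if candidate.stat().st_size != target_size:
            continue
        if file_digest(candidate) == target_digest:
            matches.append(candidate)
    return matches
```

Reading each file fully into memory keeps the sketch short; for very large files a chunked read would be the safer choice.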

Example 3: Directly compare two folders against each other.

import duplicates as dup


folder_of_interest_1 = 'C:/noDuplicatesHere/'
folder_of_interest_2 = 'C:/noDuplicatesHereAsWell/'
df = dup.compare_folders(folder_of_interest_1, folder_of_interest_2)

As in Example 1 above, a specific file type can be selected and the results can be written to a .csv file.
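A folder-to-folder comparison can be pictured as building a content-hash index for each folder and intersecting the two. The sketch below illustrates that idea under stated assumptions: `hashes_in` and `shared_between` are hypothetical names, SHA-256 is an assumed hash, and the package itself may work differently.

```python
import hashlib
import os
from collections import defaultdict


def hashes_in(folder):
    """Map each content hash to the list of files under `folder` with that content."""
    index = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                index[hashlib.sha256(handle.read()).hexdigest()].append(path)
    return index


def shared_between(folder_1, folder_2):
    """Return sorted (path_in_folder_1, path_in_folder_2) pairs with identical content.

    Hypothetical helper for illustration; not part of the duplicates package.
    """
    index_1 = hashes_in(folder_1)
    index_2 = hashes_in(folder_2)
    pairs = []
    # Only hashes present in both folders can yield cross-folder duplicates.
    for digest in index_1.keys() & index_2.keys():
        for left in index_1[digest]:
            for right in index_2[digest]:
                pairs.append((left, right))
    return sorted(pairs)
```

Duplicates that exist only within one of the two folders are deliberately ignored here, since the goal is to find files the folders have in common.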
