Package to find duplicate files in and across folders

About Duplicates Finder

Duplicates Finder is a simple Python package that identifies duplicate files in and across folders. There are three ways to search for identical files:

  1. List all duplicate files in a folder of interest.
  2. Pick a file and find all of its duplicates in a folder.
  3. Directly compare two folders against each other.

The results are returned as a pandas DataFrame and can also be exported as .csv files.
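
This description does not say how the package decides that two files are identical. A common approach, sketched here with only the standard library, is to hash each file's contents and group the paths that share a digest. The names `file_digest` and `group_duplicates_by_hash` and the way `ext` is handled are illustrative assumptions, not the package's API or its actual implementation.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            sha.update(chunk)
    return sha.hexdigest()


def group_duplicates_by_hash(folder, ext=None):
    """Map each digest to the files sharing it; keep only groups of two or more.

    Hypothetical helper: `ext` is an optional suffix filter such as '.jpg'.
    """
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if ext is not None and not name.lower().endswith(ext.lower()):
                continue
            path = os.path.join(root, name)
            groups[file_digest(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}
```

Calling `group_duplicates_by_hash('C:/manyDuplicatesHere/')` would return a dictionary with one entry per set of identical files.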


Installation

You can either install the package with pip or clone the repository from GitHub and install it from source. Run the following command(s) in your terminal:

Pip Installation:

pip install duplicate-finder

Alternatively, you can clone the Git repository:

git clone https://github.com/akcarsten/duplicates.git

Then go to the folder to which you cloned the repository and run:

python setup.py install

Now you can run Python and import the package.


Examples of how to use the package

Example 1: List all duplicate files in a folder of interest.

import duplicates as dup


folder_of_interest = 'C:/manyDuplicatesHere/'
dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/')

If only a specific file type is of interest, restrict the search with the 'ext' parameter. For example:

df = dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', ext='.jpg')

Example 2: Pick a file and find all of its duplicates in a folder.

import duplicates as dup


file_of_interest = 'C:/manyDuplicatesHere/thisFileExistsManyTimes.jpg'
folder_of_interest = 'C:/manyDuplicatesHere/'
df = dup.find_duplicates(file_of_interest, folder_of_interest)
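
One way to picture this search is a content comparison of the reference file against every file in the folder. The sketch below uses the standard library's `filecmp.cmp` with `shallow=False`. `find_copies` is a hypothetical helper written for illustration; it is not the package's `find_duplicates`, whose internals are not described here.

```python
import filecmp
import os


def find_copies(reference, folder):
    """Return sorted paths under `folder` whose contents equal `reference`."""
    reference = os.path.abspath(reference)
    ref_size = os.path.getsize(reference)
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.abspath(os.path.join(root, name))
            if path == reference:
                continue  # skip the reference file itself
            # Cheap size check first, then a full content comparison.
            if os.path.getsize(path) == ref_size and filecmp.cmp(reference, path, shallow=False):
                matches.append(path)
    return sorted(matches)
```

Checking sizes before contents avoids reading files that cannot possibly match.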

Example 3: Directly compare two folders against each other.

import duplicates as dup


folder_of_interest_1 = 'C:/noDuplicatesHere/'
folder_of_interest_2 = 'C:/noDuplicatesHereAsWell/'
df = dup.compare_folders(folder_of_interest_1, folder_of_interest_2)

As in Example 1 above, a specific file type can be selected and the results can be written to a .csv file.
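
To illustrate what a folder-to-folder comparison with a file type filter and .csv export might involve, here is a minimal standard-library sketch. It indexes both folders by content hash, pairs up files whose digests match, and optionally writes the pairs to a .csv file. The names `hash_folder`, `shared_files`, and `csv_file`, as well as the column headers, are assumptions made for this example and do not reflect the package's own signature or output format.

```python
import csv
import hashlib
import os


def hash_folder(folder, ext=None):
    """Map content digest -> list of file paths under `folder`."""
    index = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if ext and not name.lower().endswith(ext.lower()):
                continue
            path = os.path.join(root, name)
            with open(path, 'rb') as fh:
                # Reads the whole file into memory: acceptable for a sketch.
                digest = hashlib.sha256(fh.read()).hexdigest()
            index.setdefault(digest, []).append(path)
    return index


def shared_files(folder_1, folder_2, ext=None, csv_file=None):
    """Return sorted (path_in_1, path_in_2) pairs with identical contents."""
    index_1 = hash_folder(folder_1, ext)
    index_2 = hash_folder(folder_2, ext)
    pairs = sorted(
        (p1, p2)
        for digest in index_1.keys() & index_2.keys()
        for p1 in index_1[digest]
        for p2 in index_2[digest]
    )
    if csv_file is not None:
        with open(csv_file, 'w', newline='') as fh:
            writer = csv.writer(fh)
            writer.writerow(['file_in_folder_1', 'file_in_folder_2'])
            writer.writerows(pairs)
    return pairs
```

An empty result means the two folders have no file contents in common for the selected file type.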
