Package to find duplicate files in and across folders
About Duplicates Finder
Duplicates Finder is a simple Python package that identifies duplicate files in and across folders. There are three ways to search for identical files:
- List all duplicate files in a folder of interest.
- Pick a file and find all of its duplicates in a folder.
- Directly compare two folders against each other.
The results are returned as a pandas DataFrame and can optionally be exported as .csv files. More information about the underlying concept can be found in this short article.
Installation
You can either install the package with pip or clone the repository from GitHub and install it from source:
Pip Installation:
pip install duplicate-finder
Alternatively you can clone the Git repository:
git clone https://github.com/akcarsten/duplicates.git
Then go to the folder to which you cloned the repository and run:
python setup.py install
Now you can run Python and import the duplicates package.
Examples of how to use the package
Example 1: List all duplicate files in a folder of interest.
import duplicates as dup
folder_of_interest = 'C:/manyDuplicatesHere/'
dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', fastscan=True)
Here the fastscan parameter is set to True (the default is False). With fastscan enabled, potential duplicates are pre-selected based on file size, so files whose size is unique are skipped early. To restrict the search to a specific file type, set the 'ext' parameter. For example:
df = dup.list_all_duplicates(folder_of_interest, to_csv=True, csv_path='C:/csvWithAllDuplicates/', ext='.jpg')
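The idea behind a size-based pre-selection can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the helper names file_sha256 and duplicates_with_size_prefilter are hypothetical, and the use of SHA-256 for the content comparison is an assumption.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_with_size_prefilter(folder, ext=None):
    """Group identical files under `folder`, hashing only same-size files.

    Hypothetical helper for illustration; not part of the duplicates package.
    """
    # Step 1: bucket all files by size (cheap, no file content is read).
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if ext is not None and not name.lower().endswith(ext.lower()):
                continue
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[file_sha256(path)].append(path)

    # Keep only groups that contain at least two identical files.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Because two files of different size can never be identical, the size check discards most candidates without reading their content, which is where the speed-up would come from in a folder with many large, distinct files.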
Example 2: Pick a file and find all of its duplicates in a folder.
import duplicates as dup
file_of_interest = 'C:/manyDuplicatesHere/thisFileExistsManyTimes.jpg'
folder_of_interest = 'C:/manyDuplicatesHere/'
df = dup.find_duplicates(file_of_interest, folder_of_interest)
Example 3: Directly compare two folders against each other.
import duplicates as dup
folder_of_interest_1 = 'C:/noDuplicatesHere/'
folder_of_interest_2 = 'C:/noDuplicatesHereAsWell/'
df = dup.compare_folders(folder_of_interest_1, folder_of_interest_2)
As in Example 1 above, a specific file type can be selected and the results can be written to a .csv file.
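For readers curious how a folder-to-folder comparison can work in principle, the following is a rough standard-library sketch. It is not the package's code: hash_folder and shared_files are hypothetical names, and hashing every file with SHA-256 is an assumed strategy chosen for simplicity.

```python
import hashlib
import os


def hash_folder(folder, ext=None):
    """Map SHA-256 digest -> list of file paths found under `folder`."""
    hashes = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Optional file-type filter, comparable in spirit to 'ext' above.
            if ext is not None and not name.lower().endswith(ext.lower()):
                continue
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            hashes.setdefault(digest.hexdigest(), []).append(path)
    return hashes


def shared_files(folder_1, folder_2, ext=None):
    """Return (paths_in_folder_1, paths_in_folder_2) pairs with identical content."""
    hashes_1 = hash_folder(folder_1, ext)
    hashes_2 = hash_folder(folder_2, ext)
    # A digest present in both mappings marks content found in both folders.
    common = hashes_1.keys() & hashes_2.keys()
    return [(sorted(hashes_1[h]), sorted(hashes_2[h])) for h in sorted(common)]
```

Each returned pair lists the matching paths on both sides, so a file that appears several times in either folder is reported once per distinct content rather than once per copy.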