A CLI tool to find and delete duplicate files in a directory.

TwinTrim

TwinTrim is a command-line tool that finds and manages duplicate files across directories. It scans files, identifies duplicates by content, and removes them automatically or with user guidance, helping you save storage space and keep your file system organized.

Features

  • Duplicate Detection: Scans directories to detect duplicate files based on file content rather than just filenames.
  • Automatic or Manual Removal: Choose to handle duplicates automatically using the --all flag or manually select which files to delete.
  • Customizable Filters: Set filters for minimum and maximum file sizes, file types, and specific filenames to exclude from the scan.
  • Multi-Threaded Processing: Utilizes multi-threading to quickly scan and process large numbers of files concurrently.
  • Deadlock Prevention: Uses locks to coordinate access to shared data during multi-threaded operations, keeping execution safe and consistent.
  • User-Friendly Interface: Offers clear prompts and feedback via the command line, making the process straightforward and interactive.

How It Works

Core Components

  1. File Metadata Management:

    • Uses AllFileMetadata and FileMetadata classes to manage file information, such as modification time and file paths.
    • Maintains metadata in two dictionaries (store and normalStore) for handling different levels of duplicate management.
  2. File Hashing:

    • Computes an MD5 hash of each file's contents; files with the same hash are treated as duplicates.
  3. File Filtering:

    • The FileFilter class provides functionality to filter files based on size, type, and exclusions.
  4. Duplicate Handling:

    • Duplicate files are identified by comparing their hashes.
    • Based on file modification time, the latest file is retained, and older duplicates are removed.
  5. Deadlock Prevention:

    • Uses locks within multi-threaded processes so that shared resources are accessed by one thread at a time, avoiding race conditions that could corrupt results or stall execution.
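
The hashing and duplicate-handling steps above can be illustrated with a minimal sketch. This is not TwinTrim's actual implementation: the function names (md5_of_file, group_by_content, split_keep_latest, plan_removals) are hypothetical, and the sketch only builds a removal plan without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root):
    """Map each content hash to the list of file paths under root that share it."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[md5_of_file(path)].append(path)
    return groups


def split_keep_latest(paths):
    """Return (path_to_keep, paths_to_remove), keeping the most recently modified file."""
    ordered = sorted(paths, key=os.path.getmtime, reverse=True)
    return ordered[0], ordered[1:]


def plan_removals(root):
    """List (kept, removed) pairs for every hash group with more than one file.

    Hypothetical helper: it only reports what would be removed.
    """
    plan = []
    for paths in group_by_content(root).values():
        if len(paths) > 1:
            plan.append(split_keep_latest(paths))
    return plan
```

Reading files in fixed-size chunks keeps memory use flat even for very large files, and separating the plan from the deletion makes it easy to show the result to the user before anything is removed.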

Key Functions

  • add_or_update_file: Adds new files to the metadata store or updates existing entries if a duplicate is detected.
  • add_or_update_normal_file: Similar to add_or_update_file but manages duplicates in a separate store.
  • handleAllFlag: Handles duplicate removal automatically without user intervention.
  • find_duplicates: Finds duplicate files in the specified directory and prepares them for user review or automatic handling.
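
As a rough illustration of how a shared metadata store can be updated safely from several threads, here is a small sketch built on concurrent.futures and threading.Lock. The DuplicateStore class and the scan function are hypothetical names chosen for this example; they are not the project's add_or_update_file or find_duplicates, whose signatures are not documented here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DuplicateStore:
    """Thread-safe map from content hash to the paths seen with that hash."""

    def __init__(self):
        self._lock = threading.Lock()
        self._paths_by_hash = {}

    def record(self, file_hash, path):
        """Record a path; return True if its hash was already present (a duplicate)."""
        with self._lock:  # guards the check-then-insert against concurrent updates
            seen = file_hash in self._paths_by_hash
            self._paths_by_hash.setdefault(file_hash, []).append(path)
            return seen

    def duplicates(self):
        """Return only the hash groups that contain more than one path."""
        with self._lock:
            return {h: sorted(p) for h, p in self._paths_by_hash.items() if len(p) > 1}


def scan(paths, hash_func, max_workers=4):
    """Hash every path in a thread pool and collect the results in one store."""
    store = DuplicateStore()

    def work(path):
        store.record(hash_func(path), path)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(work, paths))  # consume the iterator so worker errors surface
    return store
```

Holding a single lock only for the short dictionary update, and never while hashing, keeps the expensive work parallel and leaves no opportunity for two locks to be acquired in conflicting order.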

Usage

Command Line Interface

Run the script using the following command:

python -m twinTrim.main <directory> [OPTIONS]

Options

  • --all: Automatically delete duplicates without asking for confirmation.
  • --min-size: Specify the minimum file size to include in the scan (e.g., 10kb).
  • --max-size: Specify the maximum file size to include in the scan (e.g., 1gb).
  • --file-type: Specify the file type to include (e.g., .txt, .jpg).
  • --exclude: Exclude specific files by name.
  • --label-color: Set the font color of the output label of the progress bar.
  • --bar-color: Set the color of the progress bar.
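
To make the size and type filters concrete, the sketch below shows one way such options could be interpreted. It is an assumption-laden example, not TwinTrim's FileFilter: parse_size and passes_filter are hypothetical helpers, and the sketch assumes binary units (1 kb = 1024 bytes), which may differ from what the tool actually does.

```python
import os
import re

# Assumption: binary units (1 kb = 1024 bytes); TwinTrim's own interpretation may differ.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '50kb' or '1 GB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    unit = unit.lower() or "b"  # a bare number is taken as bytes
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(float(number) * _UNITS[unit])


def passes_filter(name, size, min_size=0, max_size=None, file_type=None, exclude=()):
    """Return True if a file with this name and size should be scanned."""
    if name in exclude:
        return False
    if size < min_size or (max_size is not None and size > max_size):
        return False
    if file_type is not None:
        wanted = "." + file_type.lower().lstrip(".")  # accept 'txt' and '.txt'
        if os.path.splitext(name)[1].lower() != wanted:
            return False
    return True
```

Under the 1024-byte assumption, parse_size("50kb") evaluates to 51200, so a 60000-byte file named notes.txt would pass a filter with min_size=parse_size("50kb") and file_type="txt".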

Examples

  1. Automatic Duplicate Removal:

    python -m twinTrim.main /path/to/directory --all
    
  2. Manual Review and Removal:

    python -m twinTrim.main /path/to/directory
    
  3. Filtered Scan by File Size and Type:

    python -m twinTrim.main /path/to/directory --min-size "50kb" --max-size "500mb" --file-type "txt"
    

Dependencies

  • Python 3.6+
  • click for command-line interaction
  • tqdm for progress bars
  • concurrent.futures (part of the Python standard library) for multi-threaded processing
  • beaupy for interactive selection

Installation

From PyPI

Install the latest release from PyPI using pip:

pip install twinTrim

You can find the project on PyPI.

Setup for Development

Clone the repository and install the required dependencies using Poetry:

git clone https://github.com/Kota-Karthik/twinTrim.git
cd twinTrim
poetry install
poetry shell

If you haven't installed Poetry yet, you can do so by following the instructions on the Poetry website.

Contributing

Contributions are welcome! Whether you have ideas for improving the internal workings of TwinTrim, such as optimizing performance or refining algorithms, or you want to enhance the user interface of the CLI tool for a better user experience, your input is valuable. Please fork the repository and submit a pull request with your improvements or new features.

See CONTRIBUTION_GUIDELINES.md for details on how to contribute.

Code of Conduct

We value and prioritize creating a positive, welcoming, and inclusive environment for everyone involved in the TwinTrim project. We encourage all participants to be respectful, collaborative, and supportive of each other.

Please take a moment to review our Code of Conduct to understand the expected behavior when contributing to the project.

By participating in TwinTrim, you agree to abide by these guidelines and help us maintain a healthy, harassment-free community.

License

This project is licensed under the MIT License - see the LICENSE file for details.
