file-toolkit is a complete suite of utilities for manipulating, organizing, and monitoring files and directories in Python. It offers functions for copying, moving, synchronizing, compressing, hashing, advanced searching, temporary file management, and much more, with a focus on productivity, security, and auditing.

file-toolkit - File Operations & Utilities for Python

A robust utility library for managing files, directories, synchronization, compression, monitoring, hashing, and temporary operations, ideal for data pipelines, ETL jobs, and audit-ready file processing workflows.


Features

  • ๐Ÿ—‚๏ธ File and directory operations (copy, move, delete, backup, etc.)
  • ๐Ÿ“ฆ ZIP compression and extraction with progress tracking
  • ๐Ÿ”„ Directory synchronization with conflict resolution
  • ๐Ÿ” Content and metadata search utilities
  • ๐Ÿงพ JSON, text, and binary file read/write with logging
  • ๐Ÿ” File hashing and duplicate detection
  • ๐Ÿ“ˆ Disk usage, file size stats, and empty directory checks
  • ๐Ÿ“ก File monitoring with callback support
  • ๐Ÿงช Temporary file and directory creation
  • ๐Ÿงพ Built-in progress percentage logger for large files

Installation

Install via pip:

pip install file-toolkit 

For development:

git clone https://github.com/ThaissaTeodoro/file-toolkit.git
cd file-toolkit
pip install -e ".[dev]"

Main Modules Overview

| Module                  | Description                                                                          |
|-------------------------|--------------------------------------------------------------------------------------|
| `file_ops`              | Safe operations: copy, move, delete, rename, backup, read/write.                     |
| `zip_ops`               | Compress and extract files/directories (with validation & progress).                 |
| `hash_ops`              | Generate file hashes and find duplicates.                                            |
| `search_ops`            | List, filter and search files by name, content, time or prefix.                      |
| `stats_ops`             | Disk usage, directory size, and file statistics.                                     |
| `sync_ops`              | Sync directories with copy/update/delete logic and ignore rules.                     |
| `monitor_ops`           | Watch file changes and trigger callbacks.                                            |
| `temp_file_utils`       | Create temporary files and directories.                                              |
| `progress`              | Log download/upload progress for large files.                                        |

Quick Start

import logging

from file_toolkit import ProgressPercentage, copy_file, move_file, unzip_file

# Logger used by the progress tracker below
logger = logging.getLogger(__name__)

# Move a file to a destination
move_file("source.csv", "/tmp/")

# Extract a zip file
unzip_file("data.zip", "extracted/")

# Custom progress tracking for a large file copy (size given as 500 MiB)
progress = ProgressPercentage("bigfile.zip", 1024 * 1024 * 500, logger)
copy_file("bigfile.zip", "/dest/", progress_callback=progress)
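The Quick Start passes a ProgressPercentage object as progress_callback, but this README does not show how such a callback behaves. The sketch below is one plausible shape, assuming the callback is invoked with the number of bytes handled per chunk. The names `PercentLogger` and `copy_with_progress` are hypothetical and are not part of file-toolkit; only the standard library is used.

```python
import logging


class PercentLogger:
    """Hypothetical callback: accumulate bytes seen and log a percentage."""

    def __init__(self, name, total_bytes, logger):
        self.name = name
        self.total_bytes = max(total_bytes, 1)  # guard against division by zero
        self.logger = logger
        self.seen = 0

    def __call__(self, chunk_bytes):
        self.seen += chunk_bytes
        pct = min(100.0, self.seen / self.total_bytes * 100)
        self.logger.info("%s: %.1f%%", self.name, pct)


def copy_with_progress(src, dst, callback, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, reporting each chunk's size to callback."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            callback(len(chunk))
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which is why a per-chunk callback is a natural place to report progress.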

Key Utilities by Category

  1. File Operations (file_ops):

    from file_toolkit import write_text_file, read_json_file, backup_file
    
    write_text_file("config.txt", "content here")
    data = read_json_file("settings.json")
    backup = backup_file("data.csv")
    
  2. Compression (zip_ops):

    from file_toolkit import zip_file, unzip_file
    
    # Create zip
    zip_file("folder/", "backup.zip")
    
    # Extract zip
    unzip_file("backup.zip", "output/")
    
  3. Search & Metadata (search_ops):

    from file_toolkit import list_dir_contents, search_file_content
    
    files = list_dir_contents("./data", recursive=True)
    matches = search_file_content("./logs", "error", file_pattern="*.log")
    
  4. Monitoring (monitor_ops):

    from file_toolkit import watch_file
    
    def on_change(path):
        print(f"{path} changed!")
    
    stop_flag = watch_file("input.csv", on_change, interval=2.0)
    
  5. Sync (sync_ops):

    from file_toolkit import sync_directories
    
    sync_directories("source/", "target/", delete=True)
    
  6. Hashing (hash_ops):

    from file_toolkit import get_file_hash, find_duplicates
    
    print(get_file_hash("file.csv"))
    duplicates = find_duplicates("my-folder/")
    
  7. Stats (stats_ops):

    from file_toolkit import check_disk_space, get_largest_files
    
    total, used, free = check_disk_space()
    largest = get_largest_files("/mnt/data")
    
  8. Temporary Files (temp_file_utils):

    from file_toolkit import create_temp_file
    
    tmp = create_temp_file(content="temp", suffix=".txt")
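To make the hashing and duplicate-detection idea from item 6 concrete, here is a minimal standard-library sketch. It is not file-toolkit's implementation: the helper names `sha256_of` and `group_duplicates`, the choice of SHA-256, and the chunk size are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Return {hash: [paths]} for every hash shared by two or more files."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Grouping by content hash means files with different names but identical bytes end up in the same list, which is the usual definition of a duplicate.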
    

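Item 4 passes an interval to watch_file and keeps the return value in stop_flag, which suggests a polling loop that can be stopped from outside. The README does not confirm this, so the following is only a sketch of that general technique using the standard library. The function name `poll_file` and the use of threading.Event as the stop handle are assumptions, not file-toolkit's API.

```python
import os
import threading


def poll_file(path, callback, interval=2.0):
    """Call callback(path) whenever the file's modification time changes.

    Returns a threading.Event; set it to stop the background thread.
    """
    stop = threading.Event()

    def _mtime():
        try:
            return os.stat(path).st_mtime_ns
        except FileNotFoundError:
            return None  # a missing file is treated as its own state

    def _loop(last):
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            current = _mtime()
            if current != last:
                last = current
                callback(path)

    # Record the starting mtime before the thread runs to avoid a startup race.
    threading.Thread(target=_loop, args=(_mtime(),), daemon=True).start()
    return stop
```

Calling `stop.set()` on the returned event ends the loop after at most one more interval. Polling is simple and portable, at the cost of missing changes that happen and revert between two checks.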
๐Ÿ† Best Practices

  • Use a logger for all file operations for better auditability.
  • Use ProgressPercentage when copying/moving large files.
  • Use backup_file() before overwriting critical files.
  • Always wrap your I/O logic with the error-handling utilities provided.
  • Use ignore_patterns in sync_directories() to prevent syncing sensitive files.
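The last practice above relies on ignore patterns to keep sensitive files out of a sync. As a rough illustration of how pattern-based exclusion can work, here is a standard-library sketch of a one-way copy. It is not the implementation of sync_directories(): the function name `one_way_sync`, the glob-style matching via fnmatch, and the size-and-mtime freshness check are all assumptions.

```python
import fnmatch
import shutil
from pathlib import Path


def one_way_sync(source, target, ignore_patterns=()):
    """Copy new or changed files from source to target, skipping ignored names.

    Returns the relative paths (POSIX style) that were copied.
    """
    source, target = Path(source), Path(target)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        # Skip the file if any component of its relative path matches a pattern.
        if any(fnmatch.fnmatch(part, pat)
               for part in rel.parts for pat in ignore_patterns):
            continue
        dst = target / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            if d.st_size == s.st_size and d.st_mtime >= s.st_mtime:
                continue  # target copy already looks up to date
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

For example, `one_way_sync("source/", "target/", ignore_patterns=(".env", "*.key"))` would leave credential files behind while copying everything else. Matching against every path component also excludes whole directories whose name matches a pattern.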

Tests

The library has a complete test suite to ensure quality and reliability.

Running the tests:

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
make test

# Tests with coverage
make test-cov

# Specific tests
pytest test/test_file_toolkit.py -v

# Tests with different verbosity levels
pytest test/ -v                     # Verbose
pytest test/ -s                     # No output capture
pytest test/ --tb=short             # Short traceback

Test Structure

test/
├── test_file_ops
│   ├── conftest.py                  # Shared pytest fixtures and test configurations
│   ├── Makefile                     # Automation commands for testing, linting, and build tasks
│   ├── pytest.ini                   # Global pytest configuration settings
│   ├── run_tests.py                 # Script to run all tests automatically
│   ├── test-requirements.txt        # Development and test dependencies
│   ├── TEST_GUIDE.md                # Quick guide: how to run and interpret tests
│   └── test_file_ops.py             # Automated tests for the file_ops library
├── test_hash_ops
│   └── ...
├── test_monitor_ops
│   └── ...
├── test_progress
│   └── ...
├── test_search_ops
│   └── ...
├── test_stats_ops
│   └── ...
├── test_sync_ops
│   └── ...
├── test_temp_file_ops
│   └── ...
└── test_zip_ops
    └── ...

Current coverage

# Coverage report
Name                              Stmts   Miss  Cover
-----------------------------------------------------
src/logging_metrics/__init__.py      12      0   100%
src/logging_metrics/console.py       45      2    96%
src/logging_metrics/file.py          78      3    96%
src/logging_metrics/spark.py         32      1    97%
src/logging_metrics/timer.py         56      2    96%
src/logging_metrics/metrics.py       89      4    96%
-----------------------------------------------------
TOTAL                               312     12    96%

Running tests in different environments

# Test in multiple Python versions with tox
pip install tox

tox

# Specific configurations
tox -e py38                # Python 3.8
tox -e py39                # Python 3.9  
tox -e py310               # Python 3.10
tox -e py311               # Python 3.11
tox -e py312               # Python 3.12
tox -e lint                # Only linting
tox -e coverage            # Only coverage

Running tests in CI/CD

Tests are run automatically in the project's CI/CD pipeline.


Requirements

Python: >= 3.8

Dependencies:

  • logging-metrics
  • pyspark

๐Ÿ“ Changelog

v0.1.0 โ€“ Initial release

  • Initial stable version
  • File management core modules
  • Sync, search, hashing, compression
  • Logging and progress tracking
  • Modular and testable design

๐Ÿค Contributing

Contributions are welcome!

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/file-toolkit)
  3. Commit your changes (git commit -m 'Add file-toolkit')
  4. Push to the branch (git push origin feature/file-toolkit)
  5. Open a Pull Request

License

MIT License. See LICENSE for details.
