file-toolkit is a complete suite of utilities for manipulating, organizing, and monitoring files and directories in Python. It offers functions for copying, moving, synchronizing, compressing, hashing, advanced searching, temporary file management, and much more, with a focus on productivity, security, and auditing.

file-toolkit - File Operations & Utilities for Python

A robust utility library for managing files, directories, synchronization, compression, monitoring, hashing, and temporary operations. It is suited to data pipelines, ETL jobs, and audit-ready file processing workflows.


Features

  • File and directory operations (copy, move, delete, backup, etc.)
  • ZIP compression and extraction with progress tracking
  • Directory synchronization with conflict resolution
  • Content and metadata search utilities
  • JSON, text, and binary file read/write with logging
  • File hashing and duplicate detection
  • Disk usage, file size stats, and empty directory checks
  • File monitoring with callback support
  • Temporary file and directory creation
  • Built-in progress percentage logger for large files

Installation

Install via pip:

pip install file-toolkit 

For development:

git clone https://github.com/ThaissaTeodoro/file-toolkit.git
cd file-toolkit
pip install -e ".[dev]"

Main Modules Overview

| Module                  | Description                                                                          |
|-------------------------|--------------------------------------------------------------------------------------|
| `file_ops`              | Safe operations: copy, move, delete, rename, backup, read/write.                     |
| `zip_ops`               | Compress and extract files/directories (with validation & progress).                 |
| `hash_ops`              | Generate file hashes and find duplicates.                                            |
| `search_ops`            | List, filter and search files by name, content, time or prefix.                      |
| `stats_ops`             | Disk usage, directory size, and file statistics.                                     |
| `sync_ops`              | Sync directories with copy/update/delete logic and ignore rules.                     |
| `monitor_ops`           | Watch file changes and trigger callbacks.                                            |
| `temp_file_utils`       | Create temporary files and directories.                                              |
| `progress`              | Log download/upload progress for large files.                                        |

Quick Start

import logging

from file_toolkit import copy_file, move_file, unzip_file, ProgressPercentage

# Logger passed to ProgressPercentage below
logger = logging.getLogger(__name__)

# Move a file to a destination
move_file("source.csv", "/tmp/")

# Extract a zip file
unzip_file("data.zip", "extracted/")

# Custom progress tracking for a large file copy (1024 * 1024 * 500 = 500 MiB)
progress = ProgressPercentage("bigfile.zip", 1024 * 1024 * 500, logger)
copy_file("bigfile.zip", "/dest/", progress_callback=progress)
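
The interface that copy_file expects from its progress_callback is not documented here, so the following is only a sketch of the general technique using the standard library: a callable that accumulates bytes and logs a percentage, driven by a chunked copy loop. The names PercentLogger and copy_with_progress are hypothetical and are not part of file-toolkit.

```python
import logging

logger = logging.getLogger("copy-progress")


class PercentLogger:
    """Hypothetical callback: accumulates bytes seen and logs a percentage."""

    def __init__(self, name, total_bytes, log):
        self.name = name
        self.total = total_bytes
        self.log = log
        self.seen = 0

    def __call__(self, n_bytes):
        self.seen += n_bytes
        # Guard against division by zero for empty files.
        pct = 100.0 if self.total == 0 else self.seen * 100.0 / self.total
        self.log.info("%s: %d / %d bytes (%.1f%%)",
                      self.name, self.seen, self.total, pct)


def copy_with_progress(src, dst, callback, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, reporting each chunk's size to callback."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            callback(len(chunk))
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, and it gives the callback a natural point at which to report.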

Key Utilities by Category

  1. File Operations (file_ops):

    from file_toolkit import write_text_file, read_json_file, backup_file
    
    write_text_file("config.txt", "content here")
    data = read_json_file("settings.json")
    backup = backup_file("data.csv")
    
  2. Compression (zip_ops):

    from file_toolkit import zip_file, unzip_file
    
    # Create zip
    zip_file("folder/", "backup.zip")
    
    # Extract zip
    unzip_file("backup.zip", "output/")
    
  3. Search & Metadata (search_ops):

    from file_toolkit import list_dir_contents, search_file_content
    
    files = list_dir_contents("./data", recursive=True)
    matches = search_file_content("./logs", "error", file_pattern="*.log")
    
  4. Monitoring (monitor_ops):

    from file_toolkit import watch_file
    
    def on_change(path):
        print(f"{path} changed!")
    
    stop_flag = watch_file("input.csv", on_change, interval=2.0)
    
  5. Sync (sync_ops):

    from file_toolkit import sync_directories
    
    sync_directories("source/", "target/", delete=True)
    
  6. Hashing (hash_ops):

    from file_toolkit import get_file_hash, find_duplicates
    
    print(get_file_hash("file.csv"))
    duplicates = find_duplicates("my-folder/")
    
  7. Stats (stats_ops):

    from file_toolkit import check_disk_space, get_largest_files
    
    total, used, free = check_disk_space()
    largest = get_largest_files("/mnt/data")
    
  8. Temporary Files (temp_file_utils):

    from file_toolkit import create_temp_file
    
    tmp = create_temp_file(content="temp", suffix=".txt")
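
As an illustration of the idea behind hashing and duplicate detection, here is a minimal standard-library sketch. It is not the implementation used by get_file_hash or find_duplicates: the choice of SHA-256, the chunk size, and the helper names sha256_of and group_duplicates are all assumptions made for this example.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Return {hash: [paths]} for every hash shared by two or more files."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            by_hash[sha256_of(full)].append(full)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Files with identical content produce identical digests, so grouping paths by digest and keeping only groups of two or more yields the duplicate sets.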
    

Best Practices

  • Use a logger for all file operations to improve auditability.
  • Use ProgressPercentage when copying or moving large files.
  • Call backup_file() before overwriting critical files.
  • Wrap your I/O logic with the error-handling utilities the library provides.
  • Use ignore_patterns in sync_directories() to keep sensitive files out of a sync.
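
To make the first four practices concrete, here is a hedged standard-library sketch of a logged backup-then-overwrite. It does not use or describe backup_file() itself: the helper name backup_then_write and the ".bak" suffix are assumptions chosen for the example.

```python
import logging
import os
import shutil

logger = logging.getLogger("audit")


def backup_then_write(path, new_text, log=logger):
    """Copy an existing file to path + '.bak', then overwrite it, logging each step.

    Returns the backup path, or None if there was nothing to back up.
    """
    backup_path = None
    if os.path.exists(path):
        backup_path = path + ".bak"  # assumed naming scheme, for illustration only
        shutil.copy2(path, backup_path)  # copy2 also preserves timestamps
        log.info("backed up %s to %s", path, backup_path)
    try:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(new_text)
    except OSError:
        # Record the failure for the audit trail, then let the caller decide.
        log.exception("failed to write %s", path)
        raise
    log.info("wrote %d characters to %s", len(new_text), path)
    return backup_path
```

Taking the backup before opening the target for writing matters because opening a file in "w" mode truncates it immediately.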

Tests

The library ships with a pytest suite, organized into one test directory per module.

Running the tests:

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
make test

# Tests with coverage
make test-cov

# Specific tests
pytest test/test_file_toolkit.py -v

# Tests with different verbosity levels
pytest test/ -v                     # Verbose
pytest test/ -s                     # No output capture
pytest test/ --tb=short             # Short traceback

Test Structure

test/
├── test_file_ops
│   ├── conftest.py                  # Shared pytest fixtures and test configurations
│   ├── Makefile                     # Automation commands for testing, linting, and build tasks
│   ├── pytest.ini                   # Global pytest configuration settings
│   ├── run_tests.py                 # Script to run all tests automatically
│   ├── test-requirements.txt        # Development and test dependencies
│   ├── TEST_GUIDE.md                # Quick guide: how to run and interpret tests
│   └── test_file_ops.py             # Automated tests for the file_ops library
├── test_hash_ops
│   └── ...
├── test_monitor_ops
│   └── ...
├── test_progress
│   └── ...
├── test_search_ops
│   └── ...
├── test_stats_ops
│   └── ...
├── test_sync_ops
│   └── ...
├── test_temp_file_ops
│   └── ...
└── test_zip_ops
    └── ...

Current coverage

# Coverage report
Name                               Stmts   Miss  Cover
------------------------------------------------------
src/logging_metrics/__init__.py       12      0   100%
src/logging_metrics/console.py        45      2    96%
src/logging_metrics/file.py           78      3    96%
src/logging_metrics/spark.py          32      1    97%
src/logging_metrics/timer.py          56      2    96%
src/logging_metrics/metrics.py        89      4    96%
------------------------------------------------------
TOTAL                                312     12    96%

Running tests in different environments

# Test in multiple Python versions with tox
pip install tox

tox

# Specific configurations
tox -e py38                # Python 3.8
tox -e py39                # Python 3.9  
tox -e py310               # Python 3.10
tox -e py311               # Python 3.11
tox -e py312               # Python 3.12
tox -e lint                # Only linting
tox -e coverage            # Only coverage

Running tests in CI/CD

Tests are run automatically in:


Requirements

Python: >= 3.8

Dependencies:

  • logging-metrics
  • pyspark

Changelog

v0.1.0 - Initial release

  • Initial stable version
  • File management core modules
  • Sync, search, hashing, compression
  • Logging and progress tracking
  • Modular and testable design

Contributing

Contributions are welcome!

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/file-toolkit)
  3. Commit your changes (git commit -m 'Add file-toolkit')
  4. Push to the branch (git push origin feature/file-toolkit)
  5. Open a Pull Request

License

MIT License. See LICENSE for details.
