file-toolkit is a complete suite of utilities for manipulating, organizing, and monitoring files and directories in Python. It offers functions for copying, moving, synchronizing, compressing, hashing, advanced searching, temporary file management, and much more, with a focus on productivity, security, and auditing.

file-toolkit - File Operations & Utilities for Python

A robust utility library for managing files, directories, synchronization, compression, monitoring, hashing, and temporary operations, ideal for data pipelines, ETL jobs, and audit-ready file processing workflows.


Features

  • 🗂️ File and directory operations (copy, move, delete, backup, etc.)
  • 📦 ZIP compression and extraction with progress tracking
  • 🔄 Directory synchronization with conflict resolution
  • 🔍 Content and metadata search utilities
  • 🧾 JSON, text, and binary file read/write with logging
  • 🔐 File hashing and duplicate detection
  • 📈 Disk usage, file size stats, and empty directory checks
  • 📡 File monitoring with callback support
  • 🧪 Temporary file and directory creation
  • 🧾 Built-in progress percentage logger for large files
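How file-toolkit detects duplicates is not spelled out here. As an illustration of the usual technique (an assumption, not the library's actual code), this standard-library sketch buckets files by size first and then hashes only same-size candidates with SHA-256. The names `sha256_of` and `find_duplicate_groups` are hypothetical.

```python
# Hypothetical sketch of hash-based duplicate detection; not file-toolkit's implementation.
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: str) -> list[list[Path]]:
    """Return groups of files under `root` whose contents are byte-identical."""
    # Pass 1: bucket by size; files of different sizes can never be duplicates.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Pass 2: hash only files that share their size with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for candidates in by_size.values():
        if len(candidates) > 1:
            for p in candidates:
                by_hash[sha256_of(p)].append(p)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicate_groups("my-folder/")` returns a list of groups, each a sorted list of paths with identical content. The size pre-check avoids hashing files that cannot possibly match.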

📦 Installation

Install via pip:

pip install file-toolkit 

For development:

git clone https://github.com/ThaissaTeodoro/file-toolkit.git
cd file-toolkit
pip install -e ".[dev]"

📋 Main Modules Overview

| Module                  | Description                                                                          |
|-------------------------|--------------------------------------------------------------------------------------|
| `file_ops`              | Safe operations: copy, move, delete, rename, backup, read/write.                     |
| `zip_ops`               | Compress and extract files/directories (with validation & progress).                 |
| `hash_ops`              | Generate file hashes and find duplicates.                                            |
| `search_ops`            | List, filter and search files by name, content, time or prefix.                      |
| `stats_ops`             | Disk usage, directory size, and file statistics.                                     |
| `sync_ops`              | Sync directories with copy/update/delete logic and ignore rules.                     |
| `monitor_ops`           | Watch file changes and trigger callbacks.                                            |
| `temp_file_utils`       | Create temporary files and directories.                                              |
| `progress`              | Log download/upload progress for large files.                                        |
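The `zip_ops` row mentions validation, but the exact checks `unzip_file()` performs are not documented here. The following standard-library sketch (an assumption, not the library's code) shows what such validation typically covers: a CRC integrity check with `ZipFile.testzip()` and a guard against "zip slip" entries whose paths escape the destination. `safe_extract` is a hypothetical name.

```python
# Hypothetical sketch of validated ZIP extraction; not file-toolkit's implementation.
from __future__ import annotations

import logging
import zipfile
from pathlib import Path

logger = logging.getLogger(__name__)


def safe_extract(zip_path: str, dest_dir: str) -> list[Path]:
    """Extract `zip_path` into `dest_dir` after integrity and path checks."""
    dest = Path(dest_dir).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads every member and returns the first name whose CRC fails.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")

        members = zf.infolist()
        # Refuse entries (e.g. "../x" or absolute paths) that resolve outside dest.
        for info in members:
            target = (dest / info.filename).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")

        dest.mkdir(parents=True, exist_ok=True)
        extracted: list[Path] = []
        for i, info in enumerate(members, start=1):
            extracted.append(Path(zf.extract(info, dest)))
            logger.info("extracted %d/%d: %s", i, len(members), info.filename)
    return extracted
```

All members are validated before anything is written, so a rejected archive leaves the destination untouched.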

🚀 Quick Start

import logging

from file_toolkit import ProgressPercentage, copy_file, move_file, unzip_file

logger = logging.getLogger(__name__)

# Move a file to a destination directory
move_file("source.csv", "/tmp/")

# Extract a zip archive into a directory
unzip_file("data.zip", "extracted/")

# Track progress while copying a large file (total size in bytes: 500 MiB here)
progress = ProgressPercentage("bigfile.zip", 1024 * 1024 * 500, logger)
copy_file("bigfile.zip", "/dest/", progress_callback=progress)
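The Quick Start does not show how `ProgressPercentage` is invoked during a copy. The sketch below assumes the callback receives the byte count of each chunk, which is the convention boto3 uses for its S3 transfer callbacks. That assumption, and the names `ProgressSketch` and `copy_with_progress`, are hypothetical and not part of file-toolkit's documented API.

```python
# Hypothetical sketch of a chunk-based progress callback; not file-toolkit's implementation.
import logging
import shutil
import threading
from typing import Callable, Optional


class ProgressSketch:
    """Callable that logs cumulative progress; invoked with each chunk's byte count."""

    def __init__(self, filename: str, total_bytes: int, logger: logging.Logger) -> None:
        self._filename = filename
        self._total = total_bytes
        self._logger = logger
        self._seen = 0
        self._lock = threading.Lock()  # transfers may report from several threads

    def __call__(self, bytes_amount: int) -> None:
        with self._lock:
            self._seen += bytes_amount
            pct = 100.0 * self._seen / self._total if self._total else 100.0
            self._logger.info("%s  %d / %d  (%.2f%%)", self._filename, self._seen, self._total, pct)

    @property
    def seen(self) -> int:
        return self._seen


def copy_with_progress(
    src: str,
    dst: str,
    callback: Optional[Callable[[int], None]] = None,
    chunk_size: int = 1 << 20,
) -> None:
    """Copy `src` to the file path `dst` in chunks, reporting each chunk to `callback`."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            if callback is not None:
                callback(len(chunk))
    shutil.copystat(src, dst)  # carry over timestamps and permission bits, as shutil.copy2 does
```

For example, `copy_with_progress("bigfile.zip", "/dest/bigfile.zip", ProgressSketch("bigfile.zip", os.path.getsize("bigfile.zip"), logger))` logs one line per 1 MiB chunk. Reading the real size with `os.path.getsize` avoids a hard-coded total that can drift from the actual file.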

🧩 Key Utilities by Category

  1. File Operations (file_ops):

    from file_toolkit import write_text_file, read_json_file, backup_file
    
    write_text_file("config.txt", "content here")
    data = read_json_file("settings.json")
    backup = backup_file("data.csv")
    
  2. Compression (zip_ops):

    from file_toolkit import zip_file, unzip_file
    
    # Create zip
    zip_file("folder/", "backup.zip")
    
    # Extract zip
    unzip_file("backup.zip", "output/")
    
  3. Search & Metadata (search_ops):

    from file_toolkit import list_dir_contents, search_file_content
    
    files = list_dir_contents("./data", recursive=True)
    matches = search_file_content("./logs", "error", file_pattern="*.log")
    
  4. Monitoring (monitor_ops):

    from file_toolkit import watch_file
    
    def on_change(path):
        print(f"{path} changed!")
    
    stop_flag = watch_file("input.csv", on_change, interval=2.0)
    
  5. Sync (sync_ops):

    from file_toolkit import sync_directories
    
    sync_directories("source/", "target/", delete=True)
    
  6. Hashing (hash_ops):

    from file_toolkit import get_file_hash, find_duplicates
    
    print(get_file_hash("file.csv"))
    duplicates = find_duplicates("my-folder/")
    
  7. Stats (stats_ops):

    from file_toolkit import check_disk_space, get_largest_files
    
    total, used, free = check_disk_space()
    largest = get_largest_files("/mnt/data")
    
  8. Temporary Files (temp_file_utils):

    from file_toolkit import create_temp_file
    
    tmp = create_temp_file(content="temp", suffix=".txt")
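The Monitoring example returns a `stop_flag`, but how `watch_file()` detects changes is not documented here. A common approach is polling, sketched below under two assumptions: a change means a different modification time or size, and the stop handle behaves like a `threading.Event`. `poll_file` is a hypothetical name, not the library's function.

```python
# Hypothetical polling watcher; not file-toolkit's watch_file implementation.
import os
import threading
from typing import Callable, Optional, Tuple


def poll_file(path: str, on_change: Callable[[str], None], interval: float = 2.0) -> threading.Event:
    """Call on_change(path) whenever the file's mtime or size changes.

    Polling runs in a daemon thread; set the returned Event to stop it.
    """
    stop = threading.Event()

    def snapshot() -> Optional[Tuple[int, int]]:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            return None  # a missing file is treated as its own state
        return (st.st_mtime_ns, st.st_size)

    def loop(last: Optional[Tuple[int, int]]) -> None:
        # Event.wait() is an interruptible sleep: it returns True as soon as stop is set.
        while not stop.wait(interval):
            current = snapshot()
            if current != last:
                last = current
                on_change(path)

    # Take the baseline snapshot before starting the thread so no early change is missed.
    threading.Thread(target=loop, args=(snapshot(),), daemon=True).start()
    return stop
```

Using `Event.wait()` instead of `time.sleep()` means `stop.set()` ends the loop immediately rather than after up to one full interval.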
    

🏆 Best Practices

  • Use a logger for all file operations to improve auditability.
  • Use ProgressPercentage when copying or moving large files.
  • Use backup_file() before overwriting critical files.
  • Always wrap your I/O logic with the provided error-handling utilities.
  • Use ignore_patterns in sync_directories() to prevent syncing sensitive files.
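The pattern semantics of `sync_directories()` (glob or regex, and whether ignored files in the target survive `delete=True`) are not documented here. The sketch below is a hypothetical one-way sync that assumes `fnmatch` glob patterns and treats ignored paths as neither copied nor deleted; `sync_tree`, `is_ignored`, and `needs_copy` are illustrative names, not the library's API.

```python
# Hypothetical one-way sync with glob ignore patterns; not file-toolkit's sync_directories.
import fnmatch
import shutil
from pathlib import Path
from typing import Iterable, List


def is_ignored(rel: Path, patterns: List[str]) -> bool:
    """True if the relative path, or any of its components, matches a glob pattern."""
    candidates = [rel.as_posix(), *rel.parts]
    return any(fnmatch.fnmatch(c, pat) for c in candidates for pat in patterns)


def needs_copy(src: Path, dst: Path) -> bool:
    """Copy when the target is missing, differs in size, or is older than the source."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def sync_tree(src: str, dst: str, ignore_patterns: Iterable[str] = (), delete: bool = False) -> None:
    """One-way sync of src into dst; ignored paths are neither copied nor deleted."""
    src_root, dst_root = Path(src), Path(dst)
    patterns = list(ignore_patterns)
    wanted = set()

    for s in src_root.rglob("*"):
        rel = s.relative_to(src_root)
        if is_ignored(rel, patterns):
            continue
        wanted.add(rel)
        d = dst_root / rel
        if s.is_dir():
            d.mkdir(parents=True, exist_ok=True)
        elif needs_copy(s, d):
            d.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(s, d)  # copy2 keeps mtime, so unchanged files are skipped next run

    if delete and dst_root.exists():
        # Deepest paths first so directories are emptied before we try to remove them.
        for d in sorted(dst_root.rglob("*"), key=lambda p: len(p.parts), reverse=True):
            rel = d.relative_to(dst_root)
            if rel in wanted or is_ignored(rel, patterns):
                continue
            if d.is_dir():
                if not any(d.iterdir()):  # keep dirs that still hold ignored files
                    d.rmdir()
            else:
                d.unlink()
```

Protecting ignored target files from deletion mirrors how rsync's `--exclude` interacts with `--delete`, which is usually what you want for secrets such as `*.env` that exist only on the target side.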

🧪 Tests

The library has a complete test suite to ensure quality and reliability.

Running the tests:

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
make test

# Tests with coverage
make test-cov

# Specific tests
pytest test/test_file_ops/test_file_ops.py -v

# Tests with different verbosity levels
pytest test/ -v                     # Verbose
pytest test/ -s                     # No output capture
pytest test/ --tb=short             # Short traceback

Test Structure

test/
├── test_file_ops/
│   ├── conftest.py                  # Shared pytest fixtures and test configurations
│   ├── Makefile                     # Automation commands for testing, linting, and build tasks
│   ├── pytest.ini                   # Global pytest configuration settings
│   ├── run_tests.py                 # Script to run all tests automatically
│   ├── test-requirements.txt        # Development and test dependencies
│   ├── TEST_GUIDE.md                # Quick guide: how to run and interpret tests
│   └── test_file_ops.py             # Automated tests for the file_ops library
├── test_hash_ops/
│   └── ...
├── test_monitor_ops/
│   └── ...
├── test_progress/
│   └── ...
├── test_search_ops/
│   └── ...
├── test_stats_ops/
│   └── ...
├── test_sync_ops/
│   └── ...
├── test_temp_file_ops/
│   └── ...
└── test_zip_ops/
    └── ...

Current coverage

# Coverage report
Name                               Stmts   Miss  Cover
------------------------------------------------------
src/logging_metrics/__init__.py       12      0   100%
src/logging_metrics/console.py        45      2    96%
src/logging_metrics/file.py           78      3    96%
src/logging_metrics/spark.py          32      1    97%
src/logging_metrics/timer.py          56      2    96%
src/logging_metrics/metrics.py        89      4    96%
------------------------------------------------------
TOTAL                                312     12    96%

Running tests in different environments

# Test in multiple Python versions with tox
pip install tox

tox

# Specific configurations
tox -e py38                # Python 3.8
tox -e py39                # Python 3.9  
tox -e py310               # Python 3.10
tox -e py311               # Python 3.11
tox -e py312               # Python 3.12
tox -e lint                # Only linting
tox -e coverage            # Only coverage

Running tests in CI/CD

Tests are run automatically in:


🔧 Requirements

Python: >= 3.8

Dependencies:

  • logging-metrics
  • pyspark

📝 Changelog

v0.1.0 – Initial release

  • Initial stable version
  • File management core modules
  • Sync, search, hashing, compression
  • Logging and progress tracking
  • Modular and testable design

🤝 Contributing

Contributions are welcome!

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/file-toolkit)
  3. Commit your changes (git commit -m 'Add file-toolkit')
  4. Push to the branch (git push origin feature/file-toolkit)
  5. Open a Pull Request

License

MIT License. See LICENSE for details.

Download files

Source Distribution

file_toolkit-0.1.2.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

file_toolkit-0.1.2-py3-none-any.whl (23.1 kB)

Uploaded Python 3

File details

Details for the file file_toolkit-0.1.2.tar.gz.

File metadata

  • Download URL: file_toolkit-0.1.2.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for file_toolkit-0.1.2.tar.gz

| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | e13eae2bfc25e49dc2918444b662586a2df559f508e8a37bff2c69db4976b038 |
| MD5         | 4afc007ceb897aa73e73a4295be9a555                                 |
| BLAKE2b-256 | 3cacc196017069b0e188fb2f1eec783f6ab4d4390470aa40b91c266dd982a4ee |

File details

Details for the file file_toolkit-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: file_toolkit-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for file_toolkit-0.1.2-py3-none-any.whl

| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | c36f43d9cd6aa78c78ee11903a082c406e580f90312f3f70d4dd4da547e7471e |
| MD5         | 58eb9ba9823b5d9a3507773534a269f5                                 |
| BLAKE2b-256 | 907234714fc3ccfa107cc566196a78c00edfb5fed65fab46d92f961c8f24e7d7 |
