file-toolkit is a complete suite of utilities for manipulating, organizing, and monitoring files and directories in Python. It offers functions for copying, moving, synchronizing, compressing, hashing, advanced searching, temporary file management, and much more, with a focus on productivity, security, and auditing.
file-toolkit - File Operations & Utilities for Python
A robust utility library for managing files, directories, synchronization, compression, monitoring, hashing, and temporary operations, ideal for data pipelines, ETL jobs, and audit-ready file processing workflows.
Features
- File and directory operations (copy, move, delete, backup, etc.)
- ZIP compression and extraction with progress tracking
- Directory synchronization with conflict resolution
- Content and metadata search utilities
- JSON, text, and binary file read/write with logging
- File hashing and duplicate detection
- Disk usage, file size stats, and empty directory checks
- File monitoring with callback support
- Temporary file and directory creation
- Built-in progress percentage logger for large files
Installation
Install via pip:
pip install file-toolkit
For development:
git clone https://github.com/ThaissaTeodoro/file-toolkit.git
cd file-toolkit
pip install -e ".[dev]"
Main Modules Overview
| Module | Description |
|-------------------------|--------------------------------------------------------------------------------------|
| `file_ops` | Safe operations: copy, move, delete, rename, backup, read/write. |
| `zip_ops` | Compress and extract files/directories (with validation & progress). |
| `hash_ops` | Generate file hashes and find duplicates. |
| `search_ops` | List, filter and search files by name, content, time or prefix. |
| `stats_ops` | Disk usage, directory size, and file statistics. |
| `sync_ops` | Sync directories with copy/update/delete logic and ignore rules. |
| `monitor_ops` | Watch file changes and trigger callbacks. |
| `temp_file_utils` | Create temporary files and directories. |
| `progress` | Log download/upload progress for large files. |
Quick Start
from file_toolkit import move_file, unzip_file, ProgressPercentage
# Move a file to a destination
move_file("source.csv", "/tmp/")
# Extract a zip file
unzip_file("data.zip", "extracted/")
# Custom progress tracking for large file copy
import logging
from file_toolkit import copy_file
logger = logging.getLogger(__name__)
progress = ProgressPercentage("bigfile.zip", 1024*1024*500, logger)
copy_file("bigfile.zip", "/dest/", progress_callback=progress)
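To illustrate the callback pattern above, a percentage-style progress logger can be approximated with the standard library alone. The sketch below is a hypothetical stand-in, not file-toolkit's implementation; `PercentLogger` and `copy_with_progress` are illustrative names.

```python
class PercentLogger:
    """Minimal progress callback: logs percent complete as bytes are seen."""

    def __init__(self, name, total_bytes, logger):
        self._name = name
        self._total = total_bytes
        self._seen = 0
        self._logger = logger

    def __call__(self, num_bytes):
        # Accumulate bytes and log the running percentage.
        self._seen += num_bytes
        pct = 100.0 * self._seen / self._total if self._total else 100.0
        self._logger.info("%s: %.1f%% (%d/%d bytes)",
                          self._name, pct, self._seen, self._total)


def copy_with_progress(src, dst, callback, chunk=64 * 1024):
    """Copy src to dst in chunks, invoking callback with each chunk's size."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        while True:
            buf = fsrc.read(chunk)
            if not buf:
                break
            fdst.write(buf)
            callback(len(buf))
```

Any callable that accepts a byte count works as the callback, which is why the same object can be reused across copy, move, and upload helpers.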
Key Utilities by Category
- File Operations (file_ops):

from file_toolkit import write_text_file, read_json_file, backup_file
write_text_file("config.txt", "content here")
data = read_json_file("settings.json")
backup = backup_file("data.csv")
- Compression (zip_ops):

from file_toolkit import zip_file, unzip_file
# Create zip
zip_file("folder/", "backup.zip")
# Extract zip
unzip_file("backup.zip", "output/")
- Search & Metadata (search_ops):

from file_toolkit import list_dir_contents, search_file_content
files = list_dir_contents("./data", recursive=True)
matches = search_file_content("./logs", "error", file_pattern="*.log")
- Monitoring (monitor_ops):

from file_toolkit import watch_file

def on_change(path):
    print(f"{path} changed!")

stop_flag = watch_file("input.csv", on_change, interval=2.0)
- Sync (sync_ops):

from file_toolkit import sync_directories
sync_directories("source/", "target/", delete=True)
- Hashing (hash_ops):

from file_toolkit import get_file_hash, find_duplicates
print(get_file_hash("file.csv"))
duplicates = find_duplicates("my-folder/")
- Stats (stats_ops):

from file_toolkit import check_disk_space, get_largest_files
total, used, free = check_disk_space()
largest = get_largest_files("/mnt/data")
- Temporary Files (temp_file_utils):

from file_toolkit import create_temp_file
tmp = create_temp_file(content="temp", suffix=".txt")
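As a rough idea of how hash-based duplicate detection works, content-hash grouping can be sketched with `hashlib` and `pathlib` alone. The helpers below (`sha256_of`, `duplicates_in`) are illustrative names, not file-toolkit's API, and are only a sketch of what `find_duplicates` presumably does.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk=64 * 1024):
    """Stream a file through SHA-256 so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def duplicates_in(folder):
    """Group files by content hash; return only groups with more than one member."""
    groups = defaultdict(list)
    for p in Path(folder).rglob("*"):
        if p.is_file():
            groups[sha256_of(p)].append(str(p))
    return [paths for paths in groups.values() if len(paths) > 1]
```

Hashing content rather than comparing names or sizes is what lets duplicate detection catch identical files stored under different paths.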
Best Practices
- Pass a logger to file operations for better auditability.
- Use ProgressPercentage when copying or moving large files.
- Call backup_file() before overwriting critical files.
- Wrap your I/O logic with the error-handling utilities the library provides.
- Use ignore_patterns in sync_directories() to keep sensitive files out of syncs.
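The backup-before-overwrite practice can be sketched with the standard library. `backup_then_write` below is a hypothetical helper written for illustration, not part of file-toolkit's API:

```python
import shutil
import time
from pathlib import Path


def backup_then_write(path, new_text):
    """If path exists, copy it to a timestamped .bak sibling before overwriting.

    Returns the backup's Path, or None when there was nothing to back up.
    """
    p = Path(path)
    backup = None
    if p.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = p.with_name(f"{p.name}.{stamp}.bak")
        shutil.copy2(p, backup)  # copy2 preserves timestamps and permissions
    p.write_text(new_text)
    return backup
```

Keeping the old copy next to the original makes a bad write recoverable with a single rename, which is the point of the best practice above.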
Tests
The library has a complete test suite to ensure quality and reliability.
Running the tests:
# Install development dependencies
pip install -e ".[dev]"
# Run all tests
make test
# Tests with coverage
make test-cov
# Specific tests
pytest test/test_file_toolkit.py -v
# Tests with different verbosity levels
pytest test/ -v # Verbose
pytest test/ -s # No output capture
pytest test/ --tb=short # Short traceback
Test Structure
test/
├── test_file_ops/
│   ├── conftest.py            # Shared pytest fixtures and test configurations
│   ├── Makefile               # Automation commands for testing, linting, and build tasks
│   ├── pytest.ini             # Global pytest configuration settings
│   ├── run_tests.py           # Script to run all tests automatically
│   ├── test-requirements.txt  # Development and test dependencies
│   ├── TEST_GUIDE.md          # Quick guide: how to run and interpret tests
│   └── test_file_ops.py       # Automated tests for the file_ops library
├── test_hash_ops/
│   └── ...
├── test_monitor_ops/
│   └── ...
├── test_progress/
│   └── ...
├── test_search_ops/
│   └── ...
├── test_stats_ops/
│   └── ...
├── test_sync_ops/
│   └── ...
├── test_temp_file_ops/
│   └── ...
└── test_zip_ops/
    └── ...
Current coverage
# Coverage report
Name Stmts Miss Cover
-----------------------------------------------
src/logging_metrics/__init__.py 12 0 100%
src/logging_metrics/console.py 45 2 96%
src/logging_metrics/file.py 78 3 96%
src/logging_metrics/spark.py 32 1 97%
src/logging_metrics/timer.py 56 2 96%
src/logging_metrics/metrics.py 89 4 96%
-----------------------------------------------
TOTAL 312 12 96%
Running tests in different environments
# Test in multiple Python versions with tox
pip install tox
tox
# Specific configurations
tox -e py38 # Python 3.8
tox -e py39 # Python 3.9
tox -e py310 # Python 3.10
tox -e py311 # Python 3.11
tox -e py312 # Python 3.12
tox -e lint # Only linting
tox -e coverage # Only coverage
Running tests in CI/CD
Tests also run automatically in the project's CI/CD pipeline.
Requirements
Python: >= 3.8
Dependencies:
- logging-metrics
- pyspark
Changelog
v0.1.0 - Initial release
- Initial stable version
- File management core modules
- Sync, search, hashing, compression
- Logging and progress tracking
- Modular and testable design
Contributing
Contributions are welcome!
- Fork the project
- Create your feature branch (git checkout -b feature/file-toolkit)
- Commit your changes (git commit -m 'Add file-toolkit')
- Push to the branch (git push origin feature/file-toolkit)
- Open a Pull Request
License
MIT License. See LICENSE for details.