Scan directories, apply ignore rules, and chunk file contents.
FolderScanner
FolderScanner is a Python package that enables efficient scanning of directory structures, applying ignore rules similar to .gitignore, and chunking file contents for processing. It's designed to handle large datasets and is ideal for pre-processing tasks in data analysis or machine learning pipelines.
Features
- Recursively scans specified directories.
- Applies ignore patterns to skip specified files and directories.
- Chunks file contents and yields them with their paths for efficient processing.
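The three features above can be sketched roughly as follows. This is a minimal illustration and not FolderScanner's actual implementation: the names `is_ignored` and `iter_file_chunks`, the `chunk_size` parameter, the use of `fnmatch`-style matching, and the `(path, chunk)` tuple shape are all assumptions made for the example.

```python
import fnmatch
import os
from typing import Iterator, List, Tuple


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Hypothetical helper: True if the relative path, or any single
    component of it, matches one of the glob patterns."""
    rel_path = rel_path.replace(os.sep, "/")
    parts = rel_path.split("/")
    for pattern in patterns:
        # Match the whole relative path (covers patterns like 'tmp/*').
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        # Match individual components (covers '.git' or '*.log').
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def iter_file_chunks(
    root: str, patterns: List[str], chunk_size: int = 1024
) -> Iterator[Tuple[str, str]]:
    """Hypothetical generator: walk `root` recursively, skip ignored
    entries, and yield (file_path, text_chunk) pairs."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if not is_ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        ]
        for name in sorted(filenames):
            rel_file = os.path.normpath(os.path.join(rel_dir, name))
            if is_ignored(rel_file, patterns):
                continue
            full_path = os.path.join(dirpath, name)
            # Read lazily in fixed-size pieces so large files are never
            # loaded into memory all at once.
            with open(full_path, "r", encoding="utf-8", errors="replace") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    yield full_path, chunk
```

Because the function is a generator, a caller can iterate over it directly and process each chunk as it arrives, which is what keeps memory use flat on large directory trees.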
Installation
To install FolderScanner, simply use pip:
pip install git+https://github.com/chigwell/FolderScanner.git
Usage
Import and use FolderScanner in your Python projects as follows:
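from folder_scanner import scan_directory

# Root directory to scan recursively
core_folder = '/path/to/your/projects'
# Files and directories matching these patterns are skipped
ignore_patterns = ['.git', '.dockerignore', '*.log', 'tmp/*']

# Each yielded item is a chunk of file content together with its path
for file_chunk in scan_directory(core_folder, ignore_patterns):
    print(file_chunk)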
Contributing
Contributions are welcome! Please feel free to submit pull requests, report bugs, or suggest features on the GitHub issues page.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Hashes for FolderScanner-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 83bc5c8446f8a95bd786d4b37a9d33da7e7e72bfc6c214186f4115fb5e429740
MD5 | 29b08dc925d18b8b5ad2ec6920d071e1
BLAKE2b-256 | e80222e474be7f5d93bca63b5a6fe3e5928e029ed1b8cea69811571f0edb9c18