Scan directories, apply ignore rules, and chunk file contents.
FolderScanner
FolderScanner is a Python package that recursively scans directory structures, applies ignore rules similar to .gitignore, and chunks file contents for processing. It is designed to handle large datasets and suits pre-processing tasks in data analysis or machine learning pipelines.
Features
- Recursively scans specified directories.
- Applies ignore patterns to skip specified files and directories.
- Chunks file contents and yields them with their paths for efficient processing.
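The three features above can be sketched with the standard library alone. This is a minimal illustration, not FolderScanner's actual implementation: the names `is_ignored` and `sketch_scan`, the `chunk_size` parameter, the `(relative_path, chunk)` tuple shape, and the use of `fnmatch` for pattern matching are all assumptions made for the example.

```python
import fnmatch
import os
from typing import Iterator, List, Tuple


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Hypothetical helper: True if the path, or any component of it, matches a pattern."""
    rel_path = rel_path.replace(os.sep, "/")
    parts = rel_path.split("/")
    for pattern in patterns:
        # Match the whole relative path (covers patterns such as 'tmp/*').
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        # Match individual components (covers patterns such as '.git' or '*.log').
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def sketch_scan(
    root: str, patterns: List[str], chunk_size: int = 1024
) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for a scanner: yields (relative_path, chunk) pairs."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            d for d in dirnames
            if not is_ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        )
        for name in sorted(filenames):
            rel_file = os.path.normpath(os.path.join(rel_dir, name))
            if is_ignored(rel_file, patterns):
                continue
            full_path = os.path.join(dirpath, name)
            # Read fixed-size pieces so a large file is never held in memory whole.
            with open(full_path, "r", encoding="utf-8", errors="replace") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    yield rel_file, chunk
```

Because the function is a generator, callers can process one chunk at a time. In this sketch an empty file yields nothing, and a pattern like `tmp/*` skips the files under `tmp` without pruning the `tmp` directory itself.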
Installation
To install FolderScanner, simply use pip:
pip install FolderScanner
Usage
Import and use FolderScanner in your Python projects as follows:
from folder_scanner import scan_directory
# Root directory to scan
core_folder = '/path/to/your/projects'
# Names and glob patterns to skip
ignore_patterns = ['.git', '.dockerignore', '*.log', 'tmp/*']
for file_chunk in scan_directory(core_folder, ignore_patterns):
    print(file_chunk)
Contributing
Contributions are welcome! Please feel free to submit pull requests, report bugs, or suggest features on the GitHub issues page.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file folderscanner-2025.4.231612.tar.gz.
File metadata
- Download URL: folderscanner-2025.4.231612.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2bfcde2639eebe018f8b64911f6d581f41f4ec1dcf0e97e1b52c24f741f6eec |
| MD5 | 10c7c94416b56c0b2ff89b70f284c1cd |
| BLAKE2b-256 | fdbe7bffbebb84be5290cf5b6845deafae5f0a9c1fea054557c0d23fb23aa797 |
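A downloaded file can be checked against a published SHA256 digest with Python's `hashlib`. The sketch below is illustrative only: the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes at end of file.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("folderscanner-2025.4.231612.tar.gz", "c2bfcde2639eebe018f8b64911f6d581f41f4ec1dcf0e97e1b52c24f741f6eec")` should return `True` for an intact copy of the source distribution.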
File details
Details for the file folderscanner-2025.4.231612-py3-none-any.whl.
File metadata
- Download URL: folderscanner-2025.4.231612-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14c059e695a4fbf950e617b4d299ac0bb14dfa776b1f406020af4149de5ce752 |
| MD5 | 4605fe68de43e153d2f2976b01a1a34e |
| BLAKE2b-256 | baadc0d258b8d8a3709aaa024fd7c600e0afbb3468209975e73293acf6470ef1 |