Scan directories, apply ignore rules, and chunk file contents.

FolderScanner

FolderScanner is a Python package that recursively scans directory structures, applies ignore rules similar to .gitignore, and chunks file contents for downstream processing. It is designed to handle large datasets and suits pre-processing tasks in data analysis or machine learning pipelines.

Features

  • Recursively scans specified directories.
  • Applies ignore patterns to skip specified files and directories.
  • Chunks file contents and yields them with their paths for efficient processing.
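To make the three features concrete, here is a minimal sketch of how a scan, ignore, and chunk pipeline could be written with the standard library alone. It is not FolderScanner's actual implementation: the names sketch_scan and is_ignored, the chunk_size parameter, the fnmatch-style pattern matching, and the (relative path, chunk) output shape are all assumptions made for illustration.

```python
import fnmatch
import os
from typing import Iterator, List, Tuple


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Hypothetical helper: True if the path or any path component matches a pattern."""
    rel_path = rel_path.replace(os.sep, "/")
    parts = rel_path.split("/")
    for pattern in patterns:
        # Match the whole relative path (e.g. 'tmp/*') ...
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        # ... or any single component (e.g. '.git', '*.log').
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def sketch_scan(
    root: str, patterns: List[str], chunk_size: int = 1024
) -> Iterator[Tuple[str, str]]:
    """Hypothetical generator: yield (relative_path, text_chunk) pairs under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not is_ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        ]
        for name in sorted(filenames):
            rel_file = os.path.normpath(os.path.join(rel_dir, name))
            if is_ignored(rel_file, patterns):
                continue
            full_path = os.path.join(dirpath, name)
            # Read fixed-size chunks so large files are never loaded whole.
            with open(full_path, "r", encoding="utf-8", errors="replace") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    yield rel_file, chunk
```

Because the sketch is a generator, chunks are produced one at a time while the directory tree is walked, which keeps memory use bounded by chunk_size rather than by file size.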

Installation

Install FolderScanner directly from its GitHub repository with pip:

pip install git+https://github.com/chigwell/FolderScanner.git

Usage

Import and use FolderScanner in your Python projects as follows:

from folder_scanner import scan_directory

core_folder = '/path/to/your/projects'
ignore_patterns = ['.git', '.dockerignore', '*.log', 'tmp/*']

for file_chunk in scan_directory(core_folder, ignore_patterns):
    print(file_chunk)
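The loop above prints each chunk as it arrives. As a sketch of a slightly more useful consumer, the helper below totals the characters seen per file. It assumes each yielded item is a (path, chunk) pair, which the feature list suggests but this page does not confirm; chars_per_file is a hypothetical name and not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def chars_per_file(chunks: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Hypothetical helper: sum chunk lengths per path.

    Assumes each item is a (path, chunk) pair; adjust the unpacking
    if scan_directory yields a different shape.
    """
    totals: Dict[str, int] = defaultdict(int)
    for path, chunk in chunks:
        totals[path] += len(chunk)
    return dict(totals)
```

If scan_directory does yield such pairs, chars_per_file(scan_directory(core_folder, ignore_patterns)) would return one total per scanned file without holding any file's full contents in memory.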

Contributing

Contributions are welcome. Open a pull request, or report bugs and suggest features on the GitHub issues page.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

FolderScanner-0.1.0.tar.gz (3.2 kB)

Built Distribution

FolderScanner-0.1.0-py3-none-any.whl (3.7 kB, Python 3)