Scan directories, apply ignore rules, and chunk file contents.

FolderScanner

FolderScanner is a Python package that recursively scans directory trees, applies ignore rules similar to .gitignore, and yields file contents in chunks. Because files are read chunk by chunk, it is intended for large datasets and suits pre-processing steps in data analysis or machine learning pipelines.

Features

  • Recursively scans specified directories.
  • Applies ignore patterns to skip specified files and directories.
  • Chunks file contents and yields them with their paths for efficient processing.
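
The three features above can be illustrated with a minimal sketch. This is not FolderScanner's implementation: the names sketch_scan, is_ignored, and chunk_size are hypothetical, and the matching uses the standard library's fnmatch, which only approximates .gitignore rules.

```python
import fnmatch
import os
from typing import Iterator, List, Tuple


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Return True if the relative path, or any single component of it,
    matches one of the patterns (fnmatch semantics, an assumption)."""
    rel_path = rel_path.replace(os.sep, "/")
    parts = rel_path.split("/")
    for pattern in patterns:
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def sketch_scan(
    root: str, patterns: List[str], chunk_size: int = 4096
) -> Iterator[Tuple[str, str]]:
    """Walk root recursively and yield (relative_path, chunk) pairs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not is_ignored(os.path.relpath(os.path.join(dirpath, d), root), patterns)
        ]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if is_ignored(rel, patterns):
                continue
            # Read the file piece by piece instead of loading it whole.
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    yield rel, chunk
```

Yielding from a generator keeps memory use bounded by chunk_size per file, which is the usual reason to chunk contents when scanning large trees.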

Installation

Install FolderScanner from its GitHub repository with pip:

pip install git+https://github.com/chigwell/FolderScanner.git

Usage

Import and use FolderScanner in your Python projects as follows:

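from folder_scanner import scan_directory

# Root directory to scan recursively
core_folder = '/path/to/your/projects'
# Files and directories matching these patterns are skipped
ignore_patterns = ['.git', '.dockerignore', '*.log', 'tmp/*']

# Each yielded item carries a chunk of file content together with its path
for file_chunk in scan_directory(core_folder, ignore_patterns):
    print(file_chunk)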
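
When whole files are needed downstream, the chunks can be regrouped by path. The sketch below assumes each yielded item is a (path, chunk) pair of strings; the source does not confirm the exact shape of what scan_directory yields, and collect_chunks is a hypothetical helper, not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def collect_chunks(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Join chunks that share a path, preserving the order they arrived in.

    Assumes (path, chunk) string pairs; adapt if the real items differ.
    """
    parts: Dict[str, List[str]] = defaultdict(list)
    for path, chunk in pairs:
        parts[path].append(chunk)
    return {path: "".join(chunks) for path, chunks in parts.items()}
```

Under that assumption, collect_chunks(scan_directory(core_folder, ignore_patterns)) would map each scanned path to its full contents, at the cost of holding everything in memory.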

Contributing

Contributions are welcome! Please feel free to submit pull requests, report bugs, or suggest features on the GitHub issues page.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

FolderScanner-0.1.0.tar.gz (3.2 kB)

Uploaded Source

Built Distribution

FolderScanner-0.1.0-py3-none-any.whl (3.7 kB)

Uploaded Python 3

File details

Details for the file FolderScanner-0.1.0.tar.gz.

File metadata

  • Download URL: FolderScanner-0.1.0.tar.gz
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for FolderScanner-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       fae914eaebfbd4978e282334f5dde8bd6cec99503581a3a6d802a63d31350b64
MD5          ad0785135795ff8c6673077dacd0ce83
BLAKE2b-256  93909562237c291e1c7d91251c6b189426aabd1705b36ec871e64a4d405ddd87

File details

Details for the file FolderScanner-0.1.0-py3-none-any.whl.

File hashes

Hashes for FolderScanner-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       83bc5c8446f8a95bd786d4b37a9d33da7e7e72bfc6c214186f4115fb5e429740
MD5          29b08dc925d18b8b5ad2ec6920d071e1
BLAKE2b-256  e80222e474be7f5d93bca63b5a6fe3e5928e029ed1b8cea69811571f0edb9c18
