Scan directories, apply ignore rules, and chunk file contents.

FolderScanner

FolderScanner is a Python package that recursively scans directory trees, skips files and directories matching ignore patterns similar to those in .gitignore, and yields file contents in chunks together with their paths. It is designed for large datasets and suits pre-processing steps in data analysis or machine learning pipelines.

Features

  • Recursively scans specified directories.
  • Applies ignore patterns to skip specified files and directories.
  • Chunks file contents and yields them with their paths for efficient processing.
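
To make the three features above concrete, here is a minimal sketch of how a scan, ignore, and chunk pipeline could be written with the standard library alone. It is not FolderScanner's actual implementation: the names `is_ignored` and `sketch_scan`, the `chunk_size` parameter, the `(relative_path, chunk)` output shape, and the use of `fnmatch` for pattern matching are all assumptions made for illustration.

```python
import fnmatch
import os
from typing import Iterator, List, Tuple


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Hypothetical matcher: True if the path or any path component matches a pattern."""
    rel_path = rel_path.replace(os.sep, "/")
    parts = rel_path.split("/")
    for pattern in patterns:
        # Match the whole relative path (covers patterns such as 'tmp/*').
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        # Match individual components (covers patterns such as '.git' or '*.log').
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def sketch_scan(
    root: str, patterns: List[str], chunk_size: int = 1024
) -> Iterator[Tuple[str, str]]:
    """Hypothetical scanner: yield (relative_path, chunk) for each non-ignored file."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            d for d in dirnames
            if not is_ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        )
        for name in sorted(filenames):
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if is_ignored(rel_path, patterns):
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, "r", encoding="utf-8", errors="replace") as fh:
                # Read fixed-size pieces so a large file is never held in memory whole.
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    yield rel_path.replace(os.sep, "/"), chunk
```

Writing the scanner as a generator means a caller only ever holds one chunk at a time, which is what makes this approach workable on large directory trees.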

Installation

Install FolderScanner with pip:

pip install FolderScanner

Usage

Import and use FolderScanner in your Python projects as follows:

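from folder_scanner import scan_directory

# Root directory to scan recursively.
core_folder = '/path/to/your/projects'
# Files and directories matching these patterns are skipped.
ignore_patterns = ['.git', '.dockerignore', '*.log', 'tmp/*']

# Each yielded item carries a chunk of file content along with its path.
for file_chunk in scan_directory(core_folder, ignore_patterns):
    print(file_chunk)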
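
Because chunks are yielded one at a time, a large tree can be sampled without scanning all of it. The sketch below shows one way to do that with `itertools.islice`. The helper `take` and the stand-in generator `fake_scan` are hypothetical names introduced here so the example runs on its own; in real use the stand-in would be replaced by the `scan_directory(core_folder, ignore_patterns)` call shown above, on the assumption that it returns a lazy iterator.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(chunks: Iterable[T], limit: int) -> List[T]:
    """Return at most `limit` items, leaving the rest of the iterable unconsumed."""
    return list(islice(chunks, limit))


def fake_scan() -> Iterator[str]:
    """Stand-in for a scanner generator (assumption: not part of FolderScanner)."""
    for i in range(1000):
        yield f"chunk-{i}"


# Only the first three items are ever produced by the generator.
first_three = take(fake_scan(), 3)
print(first_three)
```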

Contributing

Contributions are welcome! Please feel free to submit pull requests, report bugs, or suggest features on the GitHub issues page.

License

This project is licensed under the MIT License - see the LICENSE file for details.
