iterfiles
Find and process files in a Pythonic way, without boilerplate code. Implements for_each_file and helpers for other common scenarios.
>>> from iterfiles import for_each_file
>>> for_each_file('example', print, pattern='*/*.txt')
This prints the paths of all *.txt files in all first-level subdirectories of example.
Let’s say we have the following directory structure:
example/
    shapes.txt
    aa/
        colors.dat    # not a txt!
        numbers.txt
        pets.txt
    bb/
        names.txt
        cc/
            cars.txt
The output will be:
example/aa/numbers.txt
example/aa/pets.txt
example/bb/names.txt
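If you are curious how such a loop can be built on the standard library, here is a minimal sketch under stated assumptions. `for_each_file_sketch` and `make_example` are hypothetical names, not part of iterfiles. The sketch sorts the matches for a stable order, which the real function may or may not do. It recreates the example tree above in a temporary directory (with placeholder file contents) and applies the `*/*.txt` pattern.

```python
import tempfile
from pathlib import Path


def make_example(root: Path) -> None:
    """Hypothetical helper: recreate the example tree (placeholder contents)."""
    for rel in ["shapes.txt", "aa/colors.dat", "aa/numbers.txt", "aa/pets.txt",
                "bb/names.txt", "bb/cc/cars.txt"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder\n", encoding="utf-8")


def for_each_file_sketch(root, func, pattern):
    """Hypothetical stand-in for for_each_file: call func on every matching file."""
    # Collect all matches first, keep regular files only, sort for a stable order.
    paths = sorted(p for p in Path(root).glob(pattern) if p.is_file())
    for path in paths:
        func(path)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "example"
    make_example(root)
    seen = []
    # Record paths relative to the root instead of printing absolute temp paths.
    for_each_file_sketch(root, lambda p: seen.append(p.relative_to(root).as_posix()),
                         pattern="*/*.txt")

print(seen)  # ['aa/numbers.txt', 'aa/pets.txt', 'bb/names.txt']
```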
Filter directories and files via glob()
The full pattern syntax of pathlib.Path.glob is supported.
Print all *.txt files in all subdirectories:
>>> for_each_file('example', print, pattern='**/*.txt')
example/shapes.txt
example/aa/numbers.txt
example/aa/pets.txt
example/bb/names.txt
example/bb/cc/cars.txt
Print all *.txt files only in the top-level directory:
>>> for_each_file('example', print, pattern='*.txt')
example/shapes.txt
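To check exactly which files each pattern selects, you can run the same patterns through pathlib.Path.glob directly. The sketch below is plain pathlib (the `matches` helper is hypothetical) and rebuilds the example tree with empty files. Note that `**` matches zero or more directories, which is why `**/*.txt` also picks up the top-level shapes.txt.

```python
import tempfile
from pathlib import Path

FILES = ["shapes.txt", "aa/colors.dat", "aa/numbers.txt", "aa/pets.txt",
         "bb/names.txt", "bb/cc/cars.txt"]


def matches(root: Path, pattern: str) -> list:
    """Hypothetical helper: sorted relative paths of regular files matching pattern."""
    return sorted(p.relative_to(root).as_posix()
                  for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in FILES:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()

    top_level = matches(root, "*.txt")       # only the top directory
    first_level = matches(root, "*/*.txt")   # exactly one directory deep
    recursive = matches(root, "**/*.txt")    # any depth, including the top

for name, result in [("*.txt", top_level), ("*/*.txt", first_level),
                     ("**/*.txt", recursive)]:
    print(f"{name:10} {result}")
```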
Files as an iterable
Iterate over pathlib.Path objects:
>>> from iterfiles import iter_files
>>> [x.name for x in iter_files('example', '**/*.txt')]
['shapes.txt', 'numbers.txt', 'pets.txt', 'names.txt', 'cars.txt']
…or over text file contents directly, for example to combine the first word of each file:
>>> from iterfiles import iter_texts
>>> ', '.join(x.split(' ')[0] for x in iter_texts('example', pattern='**/*.txt'))
'Square, One, Cat, Alice, Toyota'
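Both helpers can be consumed lazily, one item at a time. The following is a hedged sketch of how similar generators could be written with pathlib alone. `iter_files_sketch` and `iter_texts_sketch` are hypothetical. They sort their matches, skip directories whose names happen to match the pattern, and assume UTF-8 text. The file contents are invented for the demo; only their first words echo the output shown above.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_files_sketch(root, pattern: str) -> Iterator[Path]:
    """Hypothetical: yield each regular file under root that matches pattern."""
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file():  # a directory named e.g. "notes.txt" is skipped
            yield path


def iter_texts_sketch(root, pattern: str) -> Iterator[str]:
    """Hypothetical: yield the full text of each matching file, one at a time."""
    for path in iter_files_sketch(root, pattern):
        yield path.read_text(encoding="utf-8")  # assumes UTF-8 files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bb" / "notes.txt").mkdir(parents=True)  # a directory, not a file
    (root / "shapes.txt").write_text("Square Circle\n", encoding="utf-8")
    (root / "bb" / "names.txt").write_text("Alice Bob\n", encoding="utf-8")

    names = [p.name for p in iter_files_sketch(root, "**/*.txt")]
    first_words = ", ".join(t.split(" ")[0] for t in iter_texts_sketch(root, "**/*.txt"))

print(names)        # ['names.txt', 'shapes.txt']
print(first_words)  # Alice, Square
```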
Pasting all files together into a corpus
Use for_each_text to work with plain text contents directly.
>>> from iterfiles import for_each_text
>>> with open('corpus.txt', 'w') as corpus:
...     for_each_text('example', corpus.write, pattern='**/*.txt')
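The example appears to pass each file's text to corpus.write unchanged. In that case, a file that does not end with a newline would run straight into the next file's first line. The following is a hypothetical variant that guarantees a newline between files, using only pathlib and the built-in open(). `build_corpus` is a made-up name, not an iterfiles function, and UTF-8 is assumed. It also writes the corpus outside the searched tree, so that the corpus file can never match its own pattern.

```python
import tempfile
from pathlib import Path


def build_corpus(root, pattern: str, corpus_path) -> int:
    """Hypothetical helper: concatenate matching text files, ensuring each
    file's text ends with a newline. Returns the number of files written."""
    count = 0
    with open(corpus_path, "w", encoding="utf-8") as corpus:
        for path in sorted(Path(root).glob(pattern)):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8")
            corpus.write(text if text.endswith("\n") else text + "\n")
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "example"
    root.mkdir()
    (root / "a.txt").write_text("no trailing newline", encoding="utf-8")
    (root / "b.txt").write_text("second file\n", encoding="utf-8")
    # The corpus lives next to, not inside, the searched directory.
    corpus_file = Path(tmp) / "corpus.txt"
    n = build_corpus(root, "**/*.txt", corpus_file)
    corpus_text = corpus_file.read_text(encoding="utf-8")

print(n, repr(corpus_text))  # 2 'no trailing newline\nsecond file\n'
```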
Convert files from one directory to another directory
Let’s say you want to extract OCR text from a large collection of *.pdf files into *.txt files.
You have a wonderful function pdftotext(pdf_filename, txt_filename) from another package that does the job well, but how do you apply it to a nested directory tree?
>>> from iterfiles import convert_files
>>> convert_files('input_pdfs', 'output_txts', pdftotext, pattern='**/*.pdf', rename=lambda p: p.with_suffix('.txt'))
That’s all. The output will mirror the input directory structure and file names, with a .txt suffix instead of .pdf.
Of course, convert_files can be used for any kind of conversion.
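A converter like this has to walk the matches, compute each file's path relative to the input root, apply rename, create missing parent directories, and call your function. The following is a hypothetical sketch of those steps. `convert_files_sketch` and `fake_pdftotext` are made-up names, and the real convert_files may order files or pass arguments differently (for example as strings rather than Path objects). It is also an assumption here that rename receives the relative path.

```python
import shutil
import tempfile
from pathlib import Path


def convert_files_sketch(src_root, dst_root, func, pattern, rename=None):
    """Hypothetical stand-in for convert_files: mirror matching files from
    src_root into dst_root, calling func(src_path, dst_path) for each one."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    for src in sorted(p for p in src_root.glob(pattern) if p.is_file()):
        rel = src.relative_to(src_root)          # e.g. 2020/q1/report.pdf
        if rename is not None:
            rel = rename(rel)                    # e.g. 2020/q1/report.txt
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)  # create output dirs as needed
        func(src, dst)


def fake_pdftotext(pdf_path, txt_path):
    """Hypothetical placeholder converter: copies bytes instead of doing OCR."""
    shutil.copyfile(pdf_path, txt_path)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input_pdfs"
    (src / "2020" / "q1").mkdir(parents=True)
    (src / "2020" / "q1" / "report.pdf").write_bytes(b"%PDF-stub")
    (src / "cover.pdf").write_bytes(b"%PDF-stub")
    dst = Path(tmp) / "output_txts"
    convert_files_sketch(src, dst, fake_pdftotext, pattern="**/*.pdf",
                         rename=lambda p: p.with_suffix(".txt"))
    outputs = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())

print(outputs)  # ['2020/q1/report.txt', 'cover.txt']
```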
Convert text files
If both input and output are plain text, use convert_texts and forget about reading and writing files. For example, here’s a snippet that converts the contents of all files to uppercase:
>>> from iterfiles import convert_texts
>>> convert_texts('example', 'output', str.upper, pattern='**/*.txt')
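You can think of convert_texts as convert_files with the reading and writing done for you. The following is a hypothetical pathlib-only sketch (`convert_texts_sketch` is a made-up name). It assumes UTF-8 files and no renaming, and the file contents are invented for the demo.

```python
import tempfile
from pathlib import Path


def convert_texts_sketch(src_root, dst_root, func, pattern):
    """Hypothetical: write func(text) of every matching file to the mirrored path."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    for src in sorted(p for p in src_root.glob(pattern) if p.is_file()):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Read, transform, and write back; UTF-8 is an assumption of this sketch.
        dst.write_text(func(src.read_text(encoding="utf-8")), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example"
    (src / "bb" / "cc").mkdir(parents=True)
    (src / "bb" / "cc" / "cars.txt").write_text("Toyota Ford\n", encoding="utf-8")
    out = Path(tmp) / "output"
    convert_texts_sketch(src, out, str.upper, pattern="**/*.txt")
    result = (out / "bb" / "cc" / "cars.txt").read_text(encoding="utf-8")

print(result)  # TOYOTA FORD
```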
Gotchas and Limitations
Any unhandled exception raised from your function will break the loop, so make sure to suppress the exceptions you can tolerate. Error handling (such as logging) is out of the scope of this package.
The list of files matching the glob pattern is collected (almost) instantly, before any processing takes place. If you add files to the directory during a long run, they will not be detected on the fly. If you remove files during processing, before they have had a chance to be processed, you will see an error.
Only files are considered. Directories are traversed in search of files, and during conversion, output directories are created when necessary, but that’s it: you can’t do anything custom with directories.
The package has not been tested with symlinks, and its behavior with symlinks is undefined.
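The first two gotchas can be handled on the caller's side. The sketch below shows one possible approach and is not part of iterfiles. A hypothetical `tolerant` decorator logs and skips the exception types you choose. A loop over a match list collected up front then shows two things: a file added mid-run is never seen, and a file removed mid-run raises FileNotFoundError, which the decorator swallows here. Whether iterfiles always calls your function with the file path as its first argument is an assumption.

```python
import logging
import tempfile
from functools import wraps
from pathlib import Path

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("iterfiles-demo")


def tolerant(*exc_types):
    """Hypothetical decorator: log and swallow the listed exception types so one
    bad file does not stop the loop; any other exception still propagates."""
    def decorate(func):
        @wraps(func)
        def wrapper(path, *args, **kwargs):
            try:
                return func(path, *args, **kwargs)
            except exc_types as exc:
                log.warning("skipped %s: %s", path, exc)
                return None
        return wrapper
    return decorate


@tolerant(FileNotFoundError, UnicodeDecodeError)
def read_first_line(path):
    return Path(path).read_text(encoding="utf-8").splitlines()[0]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("alpha\n", encoding="utf-8")
    (root / "b.txt").write_bytes(b"\xff not utf-8\n")  # invalid UTF-8
    (root / "c.txt").write_text("gamma\n", encoding="utf-8")

    # The match list is a snapshot taken before any file is processed.
    snapshot = sorted(root.glob("*.txt"))
    results = {}
    for path in snapshot:
        if path.name == "a.txt":
            (root / "d.txt").write_text("delta\n", encoding="utf-8")  # added mid-run
            (root / "c.txt").unlink()                                 # removed mid-run
        results[path.name] = read_first_line(path)

print(results)  # {'a.txt': 'alpha', 'b.txt': None, 'c.txt': None}
```

If for_each_file calls your function with the path as its first argument, you should be able to pass a function decorated this way in place of the undecorated one.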
Requirements
Python 3.6+
No dependencies
History
0.1.0 (2021-02-02)
First release on PyPI.
File details
Details for the file iterfiles-0.1.0.tar.gz
File metadata
- Download URL: iterfiles-0.1.0.tar.gz
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | aed4f3ff614f13887ce623168c58ebd7e7d664d1b73bf8817fb3944d307b5665
MD5 | ffe15b220e3df6a79a0b590d6a0efb80
BLAKE2b-256 | bf31327174fc95bead497a4ddd29bc9b77989d4c9041e14972a3cc3d0ed2556e
File details
Details for the file iterfiles-0.1.0-py2.py3-none-any.whl
File metadata
- Download URL: iterfiles-0.1.0-py2.py3-none-any.whl
- Size: 5.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | f4ff293e75982b7063cb5a97f1bec7d79477297bb3a3b222ca2919ed03b2b76d
MD5 | 7d2891239ee8a83a2efc8dae72979369
BLAKE2b-256 | cac0318d3248847dcc61768066b1d93509ee9af7acad498f01dfcc558afb134a