Recursively walk into directories and archives
Recursively Walk Into Directories and Archives
This module primarily provides the function unzipwalk(), which recursively walks
into directories and compressed files and returns all files, directories, etc. found,
together with binary file handles (file objects) for reading the files.
Currently supported are ZIP, tar, tgz, and gz compressed files.
File types are detected based on their extensions.
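To illustrate what extension-based detection can look like, here is a minimal sketch. The helper name guess_kind and the exact set of suffixes it checks are assumptions made for this example; they are not taken from the module's source and may differ from what unzipwalk() actually does.

```python
from pathlib import PurePath


def guess_kind(path):
    """Classify a filename by its suffix(es) only; the file is never opened.

    Hypothetical helper: the suffix list is an assumption, not the library's.
    """
    suffixes = [s.lower() for s in PurePath(path).suffixes]
    # A gzip-compressed tarball is spelled either ".tar.gz" or ".tgz".
    if suffixes[-2:] == ['.tar', '.gz'] or suffixes[-1:] == ['.tgz']:
        return 'tgz'
    if suffixes[-1:] == ['.tar']:
        return 'tar'
    if suffixes[-1:] == ['.zip']:
        return 'zip'
    if suffixes[-1:] == ['.gz']:
        return 'gz'
    # Anything else is treated as a plain file.
    return 'file'
```

Because only the name is inspected, a ZIP file saved under a different extension would be treated as a plain file in this sketch.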
The yielded file handles can be wrapped in io.TextIOWrapper to read them as text files.
For example, to read all CSV files in the current directory and below, including within compressed files:
>>> from unzipwalk import unzipwalk, FileType
>>> from io import TextIOWrapper
>>> import csv
>>> for result in unzipwalk('.'):
...     if result.typ == FileType.FILE and result.names[-1].suffix.lower() == '.csv':
...         print(repr(result.names))
...         with TextIOWrapper(result.hnd, encoding='UTF-8', newline='') as handle:
...             csv_rd = csv.reader(handle, strict=True)
...             for row in csv_rd:
...                 print(repr(row))
(...)
[...]
Members
unzipwalk.unzipwalk(paths: str | PathLike | bytes | Iterable[str | PathLike | bytes])
This generator recursively walks into directories and compressed files and yields named tuples of type UnzipWalkResult.
- Parameters: paths – A filename or iterable of filenames.
class unzipwalk.UnzipWalkResult(names: tuple[PurePath, ...], typ: FileType, hnd: ReadOnlyBinary | None = None)
Return type for unzipwalk().
names : tuple[PurePath, ...]
A tuple of the filename(s) as pathlib objects. The first element is always the physical file in the file system.
If the tuple has more than one element, then the yielded file is contained in a compressed file, possibly nested in
other compressed file(s), and the last element of the tuple will contain the file’s actual name.
typ : FileType
A FileType value representing the type of the current file.
hnd : ReadOnlyBinary | None
When typ is FileType.FILE, this is a ReadOnlyBinary file handle (file object)
for reading the file contents in binary mode. Otherwise, this is None.
validate()
Validate whether the object’s fields are set properly and throw errors if not.
Intended for internal use, mainly when type checkers are not being used.
unzipwalk() validates all the results it returns.
- Returns: The object itself, for method chaining.
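The layout of the names tuple described above can be sketched with plain pathlib objects. The tuple below is hand-built for illustration and the helper describe is hypothetical; real tuples come from unzipwalk().

```python
from pathlib import PurePosixPath

# Hand-built stand-in for result.names: a log file inside a tar archive
# that is itself stored inside a ZIP file on disk.
names = (
    PurePosixPath('data/backup.zip'),  # first element: the physical file
    PurePosixPath('inner/logs.tar'),   # an archive nested inside the ZIP
    PurePosixPath('app.log'),          # last element: the file's actual name
)


def describe(names):
    """Join the elements of a names tuple into one display string."""
    return ' -> '.join(str(n) for n in names)


physical = names[0]        # the file that exists in the file system
actual = names[-1]         # the name of the yielded file itself
is_nested = len(names) > 1  # True when the file lives inside an archive
print(describe(names))
```

Checks such as filtering by extension are usually done on the last element, as in the CSV example near the top of this document.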
class unzipwalk.ReadOnlyBinary(*args, **kwargs)
Interface for the file handle (file object) used in UnzipWalkResult.
The interface is the intersection of typing.BinaryIO, gzip.GzipFile, and zipfile.ZipExtFile.
Because gzip.GzipFile doesn’t implement .tell(), that method isn’t available here.
Whether the handle supports seeking depends on the underlying library.
Note unzipwalk() automatically closes files.
property name : str
close()
property closed : bool
readable()
read(n: int = -1)
readline(limit: int = -1)
seekable()
seek(offset: int, whence: int = 0)
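Since the interface above offers no .tell() and does not guarantee seeking, code that consumes a handle is safest when it relies on read() alone. The following sketch hashes a handle in chunks under that constraint. The function name digest_of is hypothetical, and the handle is a plain gzip.GzipFile over in-memory data standing in for one yielded by unzipwalk().

```python
import gzip
import hashlib
import io


def digest_of(hnd, algo='sha256', chunk_size=65536):
    """Hash a binary handle using only read(); tell() and seek() are never called."""
    h = hashlib.new(algo)
    while True:
        chunk = hnd.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        h.update(chunk)
    return h.hexdigest()


# Stand-in for a yielded handle: gzip-compressed bytes held in memory.
payload = b'hello, archive\n' * 1000
compressed = gzip.compress(payload)
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode='rb') as gz:
    digest = digest_of(gz)
```

Reading in fixed-size chunks keeps memory use bounded even for large archive members.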
class unzipwalk.FileType(value)
Used in UnzipWalkResult to indicate the type of the file.
FILE = 0
A regular file.
ARCHIVE = 1
An archive file, which will be descended into.
DIR = 2
A directory.
SYMLINK = 3
A symbolic link.
OTHER = 4
Some other file type (e.g. FIFO).
Command-Line Interface
usage: unzipwalk [-h] [-a] [-d | -c ALGO] [PATH ...]
Recursively walk into directories and archives
positional arguments:
  PATH                  paths to process (default is current directory)
optional arguments:
  -h, --help            show this help message and exit
  -a, --all-files       also list dirs, symlinks, etc.
  -d, --dump            also dump file contents
  -c ALGO, --checksum ALGO
                        generate a checksum for each file
Possible values for ALGO: blake2b, blake2s, md5, md5-sha1, sha1, sha224,
sha256, sha384, sha3_224, sha3_256, sha3_384, sha3_512, sha512, sha512_224,
sha512_256, shake_128, shake_256, sm3
The available checksum algorithms may vary depending on your system and Python version.
Run the command with --help to see the list of currently available algorithms.
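The algorithm names listed above match those used by Python's hashlib module. Assuming the command-line tool builds on hashlib (an assumption, not something stated here), the set of names can also be inspected from Python. The helper is_supported is hypothetical.

```python
import hashlib

# Names of all hash algorithms available in this interpreter;
# the set depends on the Python build and the linked OpenSSL version.
available = sorted(hashlib.algorithms_available)
print(', '.join(available))


def is_supported(algo):
    """Return True if hashlib in this interpreter knows the given algorithm name."""
    return algo in hashlib.algorithms_available
```

hashlib.algorithms_guaranteed is the subset that is present on every platform, which makes it the safer choice for scripts that must run on more than one system.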
Author, Copyright, and License
Copyright (c) 2022-2024 Hauke Dämpfling (haukex@zero-g.net) at the Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany, https://www.igb-berlin.de/
This library is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/