Python Wrapper for Hadoop Java API
Hadoop FileSystem Java Class Wrapper
Typed Python wrappers for Hadoop FileSystem class family.
Installation
You can install this package from PyPI on any Hadoop or Spark runtime:
pip install hadoop-fs-wrapper
Select a version that matches the Hadoop version you are using:
| Hadoop Version | Compatible hadoop-fs-wrapper version |
|---|---|
| 3.2.x | 0.4.x |
| 3.3.x | 0.4.x |
Usage
A common use case is accessing the Hadoop FileSystem from a Spark session object:
from hadoop_fs_wrapper.wrappers.file_system import FileSystem
file_system = FileSystem.from_spark_session(spark=spark_session)
Then, for example, one can check whether any files exist under a specified path:
def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob path refers to a valid set of paths

    :param file_system: hadoop-fs-wrapper FileSystem
    :param path: path e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: True if the path resolves to existing paths, otherwise False
    """
    return len(file_system.glob_status(path)) > 0
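Because the helper above depends only on `glob_status`, its logic can be exercised without a live Hadoop cluster or Spark session by substituting a stub object. The `StubFileSystem` class below is a hypothetical test double written for this sketch, not part of hadoop-fs-wrapper:

```python
# Minimal sketch: testing is_valid_source_path without Spark or Hadoop.
# StubFileSystem is a hypothetical stand-in, not part of hadoop-fs-wrapper.

class StubFileSystem:
    """Mimics the single method the helper relies on: glob_status."""

    def __init__(self, matches):
        self._matches = matches

    def glob_status(self, path):
        # The real glob_status returns a list of file-status wrappers;
        # for this check, only the length of the list matters.
        return self._matches


def is_valid_source_path(file_system, path: str) -> bool:
    """True if the glob path resolves to at least one existing path."""
    return len(file_system.glob_status(path)) > 0


print(is_valid_source_path(StubFileSystem(["part-0000.csv"]), "file:///data/part*.csv"))  # True
print(is_valid_source_path(StubFileSystem([]), "file:///data/part*.csv"))                 # False
```

The same pattern applies to any code built on the wrapper: accept the `FileSystem` as a parameter so it can be swapped for a stub in unit tests.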
Contribution
Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you require is not yet wrapped, please open an issue or create a PR.
All changes are tested against Spark 3.2 running in local mode.