Python Wrapper for Hadoop Java API
Hadoop FileSystem Java Class Wrapper
Typed Python wrappers for the Hadoop FileSystem class family.
Installation
You can install this package from PyPI on any Hadoop or Spark runtime:
```
pip install hadoop-fs-wrapper
```
Select the hadoop-fs-wrapper version that matches the Hadoop and Spark versions you are using:
Hadoop version / Spark version | Compatible hadoop-fs-wrapper version |
---|---|
3.2.x / 3.2.x | 0.4.x |
3.3.x / 3.3.x | 0.4.x, 0.5.x |
3.3.x / 3.4.x | 0.6.x |
3.3.x / 3.5.x | 0.7.x |
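If you are unsure which release to pin, the Spark column of the table above can be turned into a small lookup. The helper below is a hypothetical sketch, not part of hadoop-fs-wrapper: it keys only on the Spark minor version and simply transcribes the table, so it must be kept in sync with it by hand.

```python
import re

# Hypothetical lookup, transcribed from the Spark column of the compatibility table:
# Spark (major, minor) -> compatible hadoop-fs-wrapper release series.
COMPATIBLE_WRAPPER_SERIES: dict[tuple[int, int], list[str]] = {
    (3, 2): ["0.4.x"],
    (3, 3): ["0.4.x", "0.5.x"],
    (3, 4): ["0.6.x"],
    (3, 5): ["0.7.x"],
}


def compatible_wrapper_versions(spark_version: str) -> list[str]:
    """Return the wrapper series listed for a Spark version string such as "3.4.1"."""
    match = re.match(r"^(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"Unrecognised Spark version: {spark_version!r}")
    key = (int(match.group(1)), int(match.group(2)))
    if key not in COMPATIBLE_WRAPPER_SERIES:
        raise ValueError(f"No hadoop-fs-wrapper series listed for Spark {spark_version}")
    # Return a copy so callers cannot mutate the shared table.
    return list(COMPATIBLE_WRAPPER_SERIES[key])
```

In a running session, `spark.version` (a standard `SparkSession` property) gives the version string, so `compatible_wrapper_versions(spark.version)` indicates which series to pin, for example `pip install "hadoop-fs-wrapper==0.7.*"` on Spark 3.5.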
Usage
A common use case is accessing the Hadoop FileSystem from a Spark session object:
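```python
from pyspark.sql import SparkSession
from hadoop_fs_wrapper.wrappers.file_system import FileSystem

spark_session = SparkSession.builder.getOrCreate()  # or reuse an existing session
file_system = FileSystem.from_spark_session(spark=spark_session)
```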
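For context, `from_spark_session` spares you from reaching into Spark's py4j gateway directly. The sketch below shows roughly what such raw access looks like; it is an illustration under assumptions, not the library's implementation. It relies on PySpark's private `_jvm` and `_jsc` attributes, and `raw_hadoop_file_system` is a hypothetical name.

```python
from typing import Any, Optional


def raw_hadoop_file_system(spark: Any, uri: Optional[str] = None) -> Any:
    """Return an untyped py4j handle to org.apache.hadoop.fs.FileSystem.

    Hypothetical helper. Uses the session's Hadoop configuration; if `uri` is
    given, resolves the FileSystem for that URI's scheme (e.g. s3a://, abfss://)
    instead of the cluster's default filesystem.
    """
    jvm = spark._jvm  # py4j view of the driver JVM (private PySpark attribute)
    hadoop_conf = spark._jsc.hadoopConfiguration()  # org.apache.hadoop.conf.Configuration
    if uri is None:
        return jvm.org.apache.hadoop.fs.FileSystem.get(hadoop_conf)
    return jvm.org.apache.hadoop.fs.Path(uri).getFileSystem(hadoop_conf)
```

Every value returned this way is a bare py4j `JavaObject` with no Python type information, which is the gap a typed wrapper is meant to close.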
Then, for example, one can check whether any files match a given path pattern:
```python
from hadoop_fs_wrapper.wrappers.file_system import FileSystem


def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob path pattern refers to a non-empty set of existing paths.
    :param file_system: hadoop-fs-wrapper FileSystem
    :param path: path pattern, e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: True if the pattern resolves to at least one existing path, otherwise False
    """
    return len(file_system.glob_status(path)) > 0
```
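Because `is_valid_source_path` only calls `glob_status`, it can be unit-tested without a Spark session by passing a stand-in object. `InMemoryFileSystem` below is a hypothetical test double, not part of the library, and it approximates Hadoop globbing with Python's `fnmatch`. The two differ (for example, `fnmatch`'s `*` also matches `/`, and `{a,b}` alternation is not supported), so keep test patterns simple.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class InMemoryFileSystem:
    """Hypothetical test double exposing only glob_status(path)."""

    paths: list[str] = field(default_factory=list)

    def glob_status(self, path: str) -> list[str]:
        # The real wrapper returns file-status objects; plain strings are enough here
        # because is_valid_source_path only looks at the number of matches.
        return [p for p in self.paths if fnmatch.fnmatchcase(p, path)]


def is_valid_source_path(file_system, path: str) -> bool:
    # Same check as the example above, repeated so this snippet runs on its own.
    return len(file_system.glob_status(path)) > 0
```

A test can then populate `InMemoryFileSystem(paths=[...])` with a few fake object-store paths and assert on `is_valid_source_path` directly.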
Contribution
Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you need is not wrapped yet, please open an issue or submit a PR.
All changes are tested against Spark 3.4 running in local mode.
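Iterative listing in the underlying Hadoop Java API is exposed through methods that return a `RemoteIterator`, such as `FileSystem.listFiles(Path, boolean recursive)`. Below is a minimal sketch of adapting such an iterator, as seen through py4j, into a Python generator; `iterate_remote` is a hypothetical helper, not the wrapper's API. Fetching entries lazily avoids materializing a very large recursive listing in memory.

```python
from typing import Any, Iterator


def iterate_remote(remote_iterator: Any) -> Iterator[Any]:
    """Adapt a Hadoop RemoteIterator (hasNext()/next()) into a Python generator.

    Hypothetical helper. Each element is requested only when the consumer asks
    for it, so large listings are never held in memory all at once.
    """
    while remote_iterator.hasNext():
        yield remote_iterator.next()
```

With a raw py4j FileSystem handle `jfs` and JVM view `jvm` (both assumed here), one could write `for status in iterate_remote(jfs.listFiles(jvm.org.apache.hadoop.fs.Path("s3a://bucket/data"), True)): print(status.getPath().toString())`.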
File details
Details for the file hadoop_fs_wrapper-0.7.1.tar.gz.
File metadata
- Download URL: hadoop_fs_wrapper-0.7.1.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.11.10 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d71f26974e9c81de0550003ac2a99fca3cc670bb52ddf48ecf432e43f434c89d |
MD5 | 057ff80192d0b39dca46ababc7532723 |
BLAKE2b-256 | d04f8790e7eafb2595df66ee7b6b926c71bc0dbd7a4b49cf93dc3b187edb1c4e |
File details
Details for the file hadoop_fs_wrapper-0.7.1-py3-none-any.whl.
File metadata
- Download URL: hadoop_fs_wrapper-0.7.1-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.11.10 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | a3b66a02aa0d6af9375471b728313e26ed376371fe0d0f2af93fd7a8fe8752d3 |
MD5 | 5b7d918200c95a9faf30a3f38125091d |
BLAKE2b-256 | 39a99c388bc872598a029d2f72f2056b6007220657a2beadace732c910a2780b |