Python Wrapper for Hadoop Java API

Hadoop FileSystem Java Class Wrapper

Typed Python wrappers for Hadoop FileSystem class family.

Installation

You can install this package from PyPI on any Hadoop or Spark runtime:

pip install hadoop-fs-wrapper

Select a version that matches the Hadoop version you are using:

Hadoop version    Compatible hadoop-fs-wrapper version
3.2.x             0.4.x
3.3.x             0.4.x
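
The compatibility table can also be expressed in code, for instance to build a pinned requirement string in a setup script. The following is only a sketch: `COMPATIBLE_SERIES` and `wrapper_requirement` are hypothetical names that are not part of hadoop-fs-wrapper, and the mapping simply transcribes the two rows listed above.

```python
# Hypothetical helper, not part of hadoop-fs-wrapper.
# The mapping transcribes the compatibility table: Hadoop major.minor -> wrapper series.
COMPATIBLE_SERIES = {
    "3.2": "0.4",
    "3.3": "0.4",
}


def wrapper_requirement(hadoop_version: str) -> str:
    """Return a pip requirement string for the given Hadoop version."""
    # Keep only the major.minor part, e.g. "3.3.4" -> "3.3".
    major_minor = ".".join(hadoop_version.split(".")[:2])
    if major_minor not in COMPATIBLE_SERIES:
        raise ValueError(f"No known compatible release for Hadoop {hadoop_version}")
    # "==0.4.*" is the standard pip syntax for "any 0.4.x release".
    return f"hadoop-fs-wrapper=={COMPATIBLE_SERIES[major_minor]}.*"


requirement = wrapper_requirement("3.3.4")
```

The resulting string can be passed to pip as is; Hadoop versions missing from the table raise a `ValueError` instead of guessing a release.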

Usage

A common use case is accessing the Hadoop FileSystem from a Spark session object:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

file_system = FileSystem.from_spark_session(spark=spark_session)

Then, for example, you can check whether any files exist under a specified path:

def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob pattern resolves to at least one existing path.

    :param file_system: hadoop-fs-wrapper FileSystem instance
    :param path: glob pattern, e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: True if the pattern resolves to existing paths, otherwise False
    """
    # An empty result from glob_status means no path matched the pattern.
    return len(file_system.glob_status(path)) > 0
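
Because the check above only relies on `glob_status`, it can be exercised without a Spark session or a Hadoop cluster. The sketch below assumes a hypothetical in-memory stand-in, `FakeFileSystem`, which is not part of hadoop-fs-wrapper. It returns plain strings rather than whatever status objects the real wrapper returns, and it matches with `fnmatch`, which only approximates Hadoop glob semantics. The function is repeated with a structural type in place of `FileSystem` so that the snippet runs without the package installed.

```python
from fnmatch import fnmatchcase
from typing import List, Protocol


class SupportsGlobStatus(Protocol):
    """Any object exposing glob_status, such as the wrapper's FileSystem."""

    def glob_status(self, path: str) -> List[str]: ...


class FakeFileSystem:
    """Hypothetical in-memory stand-in for tests; not part of hadoop-fs-wrapper."""

    def __init__(self, known_paths: List[str]) -> None:
        self._known_paths = known_paths

    def glob_status(self, path: str) -> List[str]:
        # Assumption: fnmatch is close enough to Hadoop globbing for simple patterns.
        return [p for p in self._known_paths if fnmatchcase(p, path)]


def is_valid_source_path(file_system: SupportsGlobStatus, path: str) -> bool:
    # Same logic as in the usage example: valid if at least one path matches.
    return len(file_system.glob_status(path)) > 0


fake_fs = FakeFileSystem([
    "file:///data/in/part-0001.csv",
    "file:///data/in/part-0002.csv",
])
has_csv = is_valid_source_path(fake_fs, "file:///data/in/part*.csv")
has_json = is_valid_source_path(fake_fs, "file:///data/in/*.json")
```

In a real job, the object returned by `FileSystem.from_spark_session` would be passed in place of `fake_fs`.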

Contribution

Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you need is not yet wrapped, please open an issue or create a PR.

All changes are tested against Spark 3.2 running in local mode.
