Python Wrapper for Hadoop Java API

Hadoop FileSystem Java Class Wrapper

Typed Python wrappers for the Hadoop FileSystem class family.

Installation

You can install this package from PyPI on any Hadoop or Spark runtime:

pip install hadoop-fs-wrapper

Select a version that matches the Hadoop and Spark versions you are using:

Hadoop version / Spark version    Compatible hadoop-fs-wrapper version
3.2.x / 3.2.x                     0.4.x
3.3.x / 3.3.x                     0.4.x, 0.5.x
3.3.x / 3.4.x                     0.6.x
3.5.x / 3.5.x                     0.7.x
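
The table above can also be encoded as a small lookup, which may be handy in a setup script that has to choose a version constraint for the current runtime. This is only a sketch: `COMPATIBILITY`, `minor_series` and `compatible_wrapper_versions` are hypothetical names that are not part of hadoop-fs-wrapper, and the mapping simply mirrors the rows listed here.

```python
from typing import Dict, List, Tuple

# Rows copied from the compatibility table:
# (Hadoop series, Spark series) -> compatible hadoop-fs-wrapper series.
COMPATIBILITY: Dict[Tuple[str, str], List[str]] = {
    ("3.2", "3.2"): ["0.4.x"],
    ("3.3", "3.3"): ["0.4.x", "0.5.x"],
    ("3.3", "3.4"): ["0.6.x"],
    ("3.5", "3.5"): ["0.7.x"],
}


def minor_series(version: str) -> str:
    """Reduce a full version such as '3.4.1' to its 'major.minor' series."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least 'major.minor', got {version!r}")
    return ".".join(parts[:2])


def compatible_wrapper_versions(hadoop_version: str, spark_version: str) -> List[str]:
    """Return the wrapper series listed for this pair, or an empty list if none is listed."""
    key = (minor_series(hadoop_version), minor_series(spark_version))
    return COMPATIBILITY.get(key, [])


print(compatible_wrapper_versions("3.3.4", "3.4.1"))
```

An empty result means the combination does not appear in the table; it does not necessarily mean the package will fail on that runtime.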

Usage

A common use case is accessing the Hadoop FileSystem from a Spark session object:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

file_system = FileSystem.from_spark_session(spark=spark_session)

Then, for example, you can check whether any files exist under a specified path:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob pattern resolves to at least one existing path
    :param file_system: hadoop-fs-wrapper FileSystem
    :param path: path e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: true if path resolves to existing paths, otherwise false
    """
    return len(file_system.glob_status(path)) > 0
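
Because the helper above only calls `glob_status` and takes the length of the result, it can be exercised without a Spark session by passing in a stand-in object. The sketch below assumes exactly that: `StubFileSystem` is a hypothetical test double, not part of hadoop-fs-wrapper, it returns plain strings rather than real file status objects, and it uses `fnmatch` as a rough approximation of Hadoop glob matching.

```python
import fnmatch
from typing import List


class StubFileSystem:
    """Hypothetical stand-in that exposes only glob_status (not part of hadoop-fs-wrapper)."""

    def __init__(self, paths: List[str]) -> None:
        self._paths = paths

    def glob_status(self, pattern: str) -> List[str]:
        # fnmatch only approximates glob semantics; real Hadoop globbing differs in details.
        return [p for p in self._paths if fnmatch.fnmatchcase(p, pattern)]


def is_valid_source_path(file_system, path: str) -> bool:
    """Same logic as the helper above, without the FileSystem type annotation."""
    return len(file_system.glob_status(path)) > 0


fs = StubFileSystem(
    [
        "file:///data/in/part-0001.csv",
        "file:///data/in/part-0002.csv",
    ]
)

print(is_valid_source_path(fs, "file:///data/in/part*.csv"))
print(is_valid_source_path(fs, "file:///data/out/*.csv"))
```

Against a real cluster, the same helper would be called with the object returned by `FileSystem.from_spark_session`.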

Contribution

Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you require is not yet wrapped, please open an issue or create a PR.

All changes are tested against Spark 3.4 running in local mode.
