Python Wrapper for Hadoop Java API

Hadoop FileSystem Java Class Wrapper

Typed Python wrappers for the Hadoop FileSystem class family.

Installation

You can install this package from PyPI on any Hadoop or Spark runtime:

pip install hadoop-fs-wrapper

Select a version that matches the Hadoop and Spark versions you are using:

Hadoop version / Spark version | Compatible hadoop-fs-wrapper version
3.2.x / 3.2.x                  | 0.4.x
3.3.x / 3.3.x                  | 0.4.x, 0.5.x
3.3.x / 3.4.x                  | 0.6.x
3.5.x / 3.5.x                  | 0.7.x
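
When the dependency is pinned programmatically, the table can be encoded as a small lookup. The following is only an illustrative sketch: `COMPATIBLE_RELEASES` and `wrapper_requirement` are hypothetical names that are not part of hadoop-fs-wrapper, and the mapping is assumed to be keyed on the Spark version column of the table.

```python
# Hypothetical helper, NOT part of hadoop-fs-wrapper: maps a Spark version
# to a pip requirement string, using the compatibility table as data.
COMPATIBLE_RELEASES = {
    "3.2": ["0.4"],
    "3.3": ["0.4", "0.5"],
    "3.4": ["0.6"],
    "3.5": ["0.7"],
}


def wrapper_requirement(spark_version: str) -> str:
    """Return a pip requirement for the newest compatible release series."""
    # Reduce e.g. "3.5.1" to its "major.minor" prefix, "3.5".
    major_minor = ".".join(spark_version.split(".")[:2])
    try:
        newest = COMPATIBLE_RELEASES[major_minor][-1]
    except KeyError:
        raise ValueError(
            f"No known compatible release for Spark {spark_version}"
        ) from None
    # "==0.7.*" matches any patch release in the 0.7 series.
    return f"hadoop-fs-wrapper=={newest}.*"
```

For instance, passing `pyspark.__version__` to this helper would produce a requirement string suitable for `pip install`.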

Usage

A common use case is accessing the Hadoop FileSystem from a Spark session object:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

file_system = FileSystem.from_spark_session(spark=spark_session)

Then, for example, you can check whether any files exist under a specified path:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob pattern resolves to at least one existing path.

    :param file_system: hadoop-fs-wrapper FileSystem instance
    :param path: glob pattern, e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: True if the pattern matches at least one existing path, otherwise False
    """
    return len(file_system.glob_status(path)) > 0
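
As a usage sketch, the same check can be applied to several candidate patterns at once. The example assumes only that the file system object exposes `glob_status(path)` returning a sized collection, as in the function above. `SupportsGlobStatus`, `split_valid_sources`, and the in-memory `FakeFileSystem` are hypothetical names introduced here so that the snippet runs without a Spark session; the fake uses `fnmatch`, which only approximates Hadoop glob semantics.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Protocol, Sequence, Tuple


class SupportsGlobStatus(Protocol):
    """Anything with a glob_status method (hypothetical structural type)."""

    def glob_status(self, path: str) -> Sequence: ...


def split_valid_sources(
    file_system: SupportsGlobStatus, patterns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split patterns into those matching at least one path and those matching none."""
    valid: List[str] = []
    missing: List[str] = []
    for pattern in patterns:
        if len(file_system.glob_status(pattern)) > 0:
            valid.append(pattern)
        else:
            missing.append(pattern)
    return valid, missing


class FakeFileSystem:
    """In-memory stand-in for illustration; NOT the real FileSystem wrapper."""

    def __init__(self, files: Iterable[str]) -> None:
        self._files = list(files)

    def glob_status(self, path: str) -> List[str]:
        # fnmatch is an approximation of Hadoop glob matching.
        return [f for f in self._files if fnmatch(f, path)]


fs = FakeFileSystem(["file:///data/part-0.csv", "file:///data/part-1.csv"])
valid, missing = split_valid_sources(
    fs, ["file:///data/part-*.csv", "file:///data/*.json"]
)
```

In a real job, the `FileSystem` obtained from `FileSystem.from_spark_session` would be passed in place of the fake.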

Contribution

Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you require is not yet wrapped, please open an issue or create a PR.

All changes are tested against Spark 3.4 running in local mode.
