
A modern and asynchronous web client for WebHDFS


aiowebhdfs

I know, nobody uses Hadoop anymore. But for those who do, here is a library for handling large files on HDFS asynchronously: it uses the httpx library for web requests and aiofiles for streaming data from HDFS.

Features

  • Implements retries and timeout windows with retry_async from the opnieuw library
  • Implements streaming through the aiofiles library
  • Implements async requests through the httpx library
  • Fully tested for a core subset of operations in WebHDFS v3.2.1
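
To give a feel for what "retries and timeout windows" means in practice, here is a minimal standard-library sketch. It is not how opnieuw or aiowebhdfs implement it; the helper name `retry_with_window`, its parameters, and the `flaky_request` stand-in are all hypothetical and exist only for illustration.

```python
import asyncio
import time


async def retry_with_window(make_call, *, max_calls=3, window_seconds=5.0, delay=0.01):
    """Retry an async call until it succeeds, attempts run out, or the window closes.

    Hypothetical helper for illustration only; this is not the opnieuw API.
    """
    deadline = time.monotonic() + window_seconds
    last_error = None
    for attempt in range(1, max_calls + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Each attempt may only use the time left in the overall window.
            return await asyncio.wait_for(make_call(), timeout=remaining)
        except (ConnectionError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt < max_calls:
                await asyncio.sleep(delay)
    raise RuntimeError("retries exhausted") from last_error


calls = {"count": 0}


async def flaky_request():
    """Stand-in for a web request that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("namenode not reachable yet")
    return "ok"


result = asyncio.run(retry_with_window(flaky_request))
print(result, calls["count"])
```

The point of the window is that a slow attempt eats into the budget of the later ones, so the caller gets a bounded total wait rather than a per-attempt timeout multiplied by the retry count.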

CREATE = Write File

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
client.create('c:\\temp\\bigfile.txt', '/data/agg/bigfile.txt', overwrite=False)
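
For reference, the WebHDFS REST API exposes CREATE as a PUT request on a `/webhdfs/v1/<path>?op=CREATE` URL; the namenode answers with a redirect to a datanode, and the file data is then sent to that location. The sketch below only builds the first URL. The helper `webhdfs_url` and the `https` scheme are assumptions made for this example, not part of aiowebhdfs.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, scheme="https", **params):
    """Build a WebHDFS REST URL (hypothetical helper, not part of aiowebhdfs)."""
    query = {"op": op}
    for key, value in params.items():
        # WebHDFS expects lowercase booleans such as overwrite=false.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(query)}"


create_url = webhdfs_url(
    "namenode.local", 8443, "/data/agg/bigfile.txt", "CREATE", overwrite=False
)
print(create_url)
```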

OPEN = Read File

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
client.open('/data/agg/bigfile.txt')
Content of the file
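
For large files it can help to read in pieces. The WebHDFS OPEN operation accepts optional `offset` and `length` query parameters, so a file can be fetched one byte range at a time. Whether `client.open` exposes these parameters is not stated here, so the sketch below only computes the ranges and the matching query strings; `chunk_ranges` is a hypothetical helper.

```python
def chunk_ranges(file_length, chunk_size):
    """Yield (offset, length) pairs that together cover file_length bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < file_length:
        length = min(chunk_size, file_length - offset)
        yield offset, length
        offset += length


# 24930 bytes is the size of a.patch in the LISTSTATUS sample output.
ranges = list(chunk_ranges(file_length=24930, chunk_size=10000))
queries = [f"op=OPEN&offset={offset}&length={length}" for offset, length in ranges]
for query in queries:
    print(query)
```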

GETFILESTATUS = File Info

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
client.get_file_status('/data/agg/bigfile.txt')
{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY}
  }
}
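
A few of these fields need a little interpretation. The sketch below assumes the response has already been parsed into a Python dict (the `//` comments in the sample above are annotations, not valid JSON) and shows how to read the type, the millisecond timestamp, and the octal permission string.

```python
from datetime import datetime, timezone

# Sample values copied from the output above; assumed to be a parsed dict.
response = {
    "FileStatus": {
        "accessTime": 0,
        "blockSize": 0,
        "group": "supergroup",
        "length": 0,
        "modificationTime": 1320173277227,
        "owner": "webuser",
        "pathSuffix": "",
        "permission": "777",
        "replication": 0,
        "type": "DIRECTORY",
    }
}

status = response["FileStatus"]
is_directory = status["type"] == "DIRECTORY"
# WebHDFS timestamps are milliseconds since the Unix epoch.
modified = datetime.fromtimestamp(status["modificationTime"] / 1000, tz=timezone.utc)
# The permission field is an octal string, so parse it with base 8.
mode = int(status["permission"], 8)

print(is_directory, modified.date().isoformat(), oct(mode))
```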

LISTSTATUS = List Directory

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
client.list_directory('/tmp')
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
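
Each entry only carries a `pathSuffix`, so full paths have to be rebuilt by joining it onto the directory that was listed. A small sketch, assuming the response is a parsed dict shaped like the output above; `child_paths` is a hypothetical helper and the sample is trimmed to the fields it uses.

```python
import posixpath

# Trimmed copy of the sample listing above.
listing = {
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "a.patch", "type": "FILE", "length": 24930},
            {"pathSuffix": "bar", "type": "DIRECTORY", "length": 0},
        ]
    }
}


def child_paths(directory, listing, wanted_type=None):
    """Return full HDFS paths for the entries, optionally filtered by type."""
    entries = listing["FileStatuses"]["FileStatus"]
    return [
        # HDFS paths always use forward slashes, hence posixpath.
        posixpath.join(directory, entry["pathSuffix"])
        for entry in entries
        if wanted_type is None or entry["type"] == wanted_type
    ]


all_paths = child_paths("/tmp", listing)
file_paths = child_paths("/tmp", listing, wanted_type="FILE")
print(all_paths, file_paths)
```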

GETCONTENTSUMMARY = Summary of Directory

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
client.list_directory('/tmp')
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
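
A rough, one-level summary can also be derived client-side from a listing shaped like the one above. This is only an approximation: in WebHDFS, GETCONTENTSUMMARY is computed on the server over the whole subtree and returns a ContentSummary object with fields such as directoryCount, fileCount, length, and spaceConsumed. The `summarize` helper and its key names are hypothetical.

```python
# Sample entries copied from the listing above, trimmed to the fields used.
listing = {
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "a.patch", "type": "FILE", "length": 24930, "replication": 1},
            {"pathSuffix": "bar", "type": "DIRECTORY", "length": 0, "replication": 0},
        ]
    }
}


def summarize(listing):
    """Aggregate the direct children of a directory (does not recurse)."""
    entries = listing["FileStatuses"]["FileStatus"]
    files = [entry for entry in entries if entry["type"] == "FILE"]
    return {
        "files": len(files),
        "directories": sum(1 for entry in entries if entry["type"] == "DIRECTORY"),
        "bytes": sum(entry["length"] for entry in files),
        # Every replica takes up space, so weight each file by its replication.
        "bytes_with_replication": sum(
            entry["length"] * entry["replication"] for entry in files
        ),
    }


summary = summarize(listing)
print(summary)
```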
