A modern and asynchronous web client for WebHDFS

aiowebhdfs

I know, nobody uses Hadoop anymore, but for those who do, here is a library that handles large files asynchronously, using the httpx library for web requests and aiofiles to stream data from HDFS.

Features

  • Implements retries and timeout windows with retry_async from the opnieuw library
  • Implements streaming through the aiofiles library
  • Implements async requests through the httpx library
  • Fully tested for a core subset of operations in WebHDFS v3.2.1
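
The retry-and-timeout-window idea from the first bullet can be pictured with a small standard-library sketch. This is not opnieuw's implementation and not this library's code; `retry_in_window`, `max_calls`, `window_seconds` and `flaky_request` are hypothetical names chosen for illustration.

```python
import asyncio
import time


async def retry_in_window(make_call, *, max_calls=3, window_seconds=5.0,
                          retry_on=(ConnectionError, TimeoutError)):
    """Await make_call() until it succeeds, max_calls is reached, or the window closes."""
    deadline = time.monotonic() + window_seconds
    for attempt in range(1, max_calls + 1):
        try:
            return await make_call()
        except retry_on:
            out_of_budget = attempt == max_calls or time.monotonic() >= deadline
            if out_of_budget:
                raise
            # Short, growing pause before the next attempt.
            await asyncio.sleep(0.01 * attempt)


async def demo():
    calls = {"count": 0}

    async def flaky_request():
        # Hypothetical stand-in for an HTTP request that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("namenode not reachable yet")
        return "ok"

    result = await retry_in_window(flaky_request)
    return result, calls["count"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Bounding retries by both a call count and a time window keeps a slow cluster from stalling the caller indefinitely.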

CREATE = Write File

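from aiowebhdfs import WebHdfsAsyncClient
# `token` is a Kerberos token obtained beforehand; `await` must run inside a coroutine
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
# Upload the local file to the HDFS path without replacing an existing file
await client.create('c:\\temp\\bigfile.txt', '/data/agg/bigfile.txt', overwrite=False)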
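
At the REST level, WebHDFS addresses a file as `/webhdfs/v1/<path>?op=<OPERATION>`, and a CREATE request to the namenode is answered with a redirect to a datanode that receives the data. The sketch below only builds such a URL. `webhdfs_url` is a hypothetical helper, the `https` scheme is an assumption based on port 8443, and whether aiowebhdfs assembles its URLs this way is not confirmed by this README.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, scheme="https", **params):
    """Build a WebHDFS REST URL: <scheme>://<host>:<port>/webhdfs/v1/<path>?op=..."""
    query = {"op": op}
    for key, value in params.items():
        # WebHDFS expects lowercase true/false for boolean parameters.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(query)}"


create_url = webhdfs_url("namenode.local", 8443, "/data/agg/bigfile.txt",
                         "CREATE", overwrite=False)
print(create_url)
```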

OPEN = Read File

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.open('/data/agg/bigfile.txt')
Content of the file
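
For large files it helps to write the response to disk chunk by chunk rather than holding it all in memory. The sketch below shows that pattern with the standard library only: `asyncio.to_thread` (Python 3.9+) stands in for aiofiles, and `fake_chunks` is a hypothetical stand-in for a streamed HTTP response body. It illustrates the idea, not this library's internals.

```python
import asyncio
import os
import tempfile


async def fake_chunks(payload: bytes, chunk_size: int):
    """Hypothetical stand-in for an HTTP response body arriving in pieces."""
    for start in range(0, len(payload), chunk_size):
        await asyncio.sleep(0)  # yield to the event loop, as a network read would
        yield payload[start:start + chunk_size]


async def save_stream(chunks, local_path: str) -> int:
    """Write chunks to local_path one at a time; return the number of bytes written."""
    written = 0
    with open(local_path, "wb") as handle:
        async for chunk in chunks:
            # Run the blocking write in a worker thread so the event loop stays free.
            written += await asyncio.to_thread(handle.write, chunk)
    return written


async def demo():
    payload = b"x" * 10_000
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "bigfile.txt")
        written = await save_stream(fake_chunks(payload, 4096), target)
        return written, os.path.getsize(target)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```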

GETFILESTATUS = File Info

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.get_file_status('/data/agg/bigfile.txt')
{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY}
  }
}
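
A response like the one above is easier to work with once a few fields are converted: `modificationTime` is in milliseconds since the Unix epoch and `permission` is an octal string. The helper below is a hypothetical convenience, not part of aiowebhdfs, and it assumes the response has already been decoded into a Python dict.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_file_status(response: dict) -> dict:
    """Convert the raw FileStatus fields into friendlier Python values."""
    status = response["FileStatus"]
    return {
        "is_directory": status["type"] == "DIRECTORY",
        "size_bytes": status["length"],
        # Milliseconds since the epoch, converted to an aware UTC datetime.
        "modified": EPOCH + timedelta(milliseconds=status["modificationTime"]),
        # Octal permission string such as "777", converted to an int mode.
        "mode": int(status["permission"], 8),
        "owner": status["owner"],
    }


sample = {
    "FileStatus": {
        "accessTime": 0, "blockSize": 0, "group": "supergroup", "length": 0,
        "modificationTime": 1320173277227, "owner": "webuser", "pathSuffix": "",
        "permission": "777", "replication": 0, "type": "DIRECTORY",
    }
}
info = describe_file_status(sample)
print(info["modified"].isoformat(), oct(info["mode"]))
```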

LISTSTATUS = List Directory

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.list_directory('/tmp')
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
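
Each entry in a listing carries only a `pathSuffix` relative to the listed directory, so full paths have to be rebuilt by the caller. A minimal sketch, assuming the response is already a Python dict; `split_listing` is a hypothetical helper and the sample is trimmed to the fields it uses.

```python
import posixpath


def split_listing(response: dict, parent: str):
    """Return (files, directories) as lists of full HDFS paths."""
    files, directories = [], []
    for entry in response["FileStatuses"]["FileStatus"]:
        # HDFS paths always use forward slashes, hence posixpath.
        full_path = posixpath.join(parent, entry["pathSuffix"])
        if entry["type"] == "DIRECTORY":
            directories.append(full_path)
        else:
            files.append(full_path)
    return files, directories


sample = {
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "a.patch", "type": "FILE", "length": 24930},
            {"pathSuffix": "bar", "type": "DIRECTORY", "length": 0},
        ]
    }
}
files, directories = split_listing(sample, "/tmp")
print(files, directories)
```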

GETCONTENTSUMMARY = Summary of Directory

from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.list_directory('/tmp')
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
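
At the REST level, WebHDFS serves a directory summary through `op=GETCONTENTSUMMARY`, which returns a `ContentSummary` object with aggregate counts rather than a `FileStatuses` listing. The sketch below parses such an object. The `ContentSummary` dataclass is hypothetical, the sample values are illustrative, and whether aiowebhdfs returns this JSON unchanged is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentSummary:
    directory_count: int
    file_count: int
    length: int          # total size of the files in bytes
    quota: int           # namespace quota, -1 when unset
    space_consumed: int  # disk space used, in bytes
    space_quota: int     # disk space quota, -1 when unset

    @classmethod
    def from_response(cls, response: dict) -> "ContentSummary":
        body = response["ContentSummary"]
        return cls(
            directory_count=body["directoryCount"],
            file_count=body["fileCount"],
            length=body["length"],
            quota=body["quota"],
            space_consumed=body["spaceConsumed"],
            space_quota=body["spaceQuota"],
        )


# Illustrative values only, not output captured from a real cluster.
sample = {
    "ContentSummary": {
        "directoryCount": 2, "fileCount": 1, "length": 24930,
        "quota": -1, "spaceConsumed": 24930, "spaceQuota": -1,
    }
}
summary = ContentSummary.from_response(sample)
print(summary)
```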
