A modern and asynchronous web client for WebHDFS
aiowebhdfs
I know, nobody uses Hadoop anymore, but for those who do, here is a library that handles large files asynchronously: it uses the httpx library for async web requests and aiofiles for streaming data from HDFS.
Features
- Implements retries and timeout windows with retry_async from the opnieuw library
- Implements streaming through the aiofiles library
- Implements async requests through the httpx library
- Fully tested for a core subset of operations in WebHDFS v3.2.1
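To illustrate what "retries and timeout windows" means in practice, here is a minimal standard-library sketch of the idea. It does not use opnieuw and does not claim to match its backoff strategy; the decorator name `retry_async_sketch`, its parameters, and the `flaky_request` coroutine are all hypothetical names made up for this example.

```python
import asyncio
import time
from functools import wraps


def retry_async_sketch(retry_on, max_calls_total=3, window_seconds=1.0):
    """Retry an async function on the given exceptions, within a time window."""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            deadline = time.monotonic() + window_seconds
            for attempt in range(1, max_calls_total + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    # Give up when the attempts or the time window are exhausted.
                    if attempt == max_calls_total or time.monotonic() >= deadline:
                        raise
                    # Spread the remaining attempts over the remaining window.
                    remaining = max(deadline - time.monotonic(), 0.0)
                    await asyncio.sleep(remaining / (max_calls_total - attempt + 1))
        return wrapper
    return decorator


calls = {"count": 0}


@retry_async_sketch(retry_on=(ConnectionError,), max_calls_total=3, window_seconds=0.3)
async def flaky_request():
    """Stand-in for a web request that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(flaky_request())
print(result, calls["count"])
```

The point of combining a call limit with a window is that a caller gets a bounded worst-case wait, no matter how the individual delays are chosen.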
CREATE = Write File
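from aiowebhdfs import WebHdfsAsyncClient
# `token` is a Kerberos token obtained beforehand
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
# Upload the local file to the given HDFS path; run inside a coroutine, since the call is awaited
await client.create('c:\\temp\\bigfile.txt', '/data/agg/bigfile.txt', overwrite=False)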
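For orientation, the WebHDFS REST API addresses every operation as `/webhdfs/v1/<PATH>?op=<OPERATION>` on the NameNode, and for CREATE the NameNode answers with a redirect to a DataNode, to which the file data is then sent. The sketch below only builds such a URL with the standard library. The helper name `webhdfs_url` is hypothetical, the `https` scheme is an assumption based on port 8443, and nothing here is taken from the internals of aiowebhdfs.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, scheme="https", **params):
    """Build a WebHDFS REST URL: <scheme>://<host>:<port>/webhdfs/v1/<path>?op=..."""
    query = {"op": op}
    for key, value in params.items():
        # WebHDFS expects lowercase booleans such as overwrite=false.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


url = webhdfs_url("namenode.local", 8443, "/data/agg/bigfile.txt", "CREATE", overwrite=False)
print(url)
```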
OPEN = Read File
from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.open('/data/agg/bigfile.txt')
Content of the file
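Because the client methods are coroutines, several reads can be awaited concurrently. The following is a sketch of that pattern with `asyncio.gather`, using a hypothetical `StandInClient` class in place of the real client so that it runs without a cluster. It assumes that `open` returns the file content, as the example above suggests.

```python
import asyncio


class StandInClient:
    """Hypothetical stand-in with an awaitable open(); not the real client."""

    def __init__(self, files):
        self._files = files

    async def open(self, path):
        await asyncio.sleep(0.01)  # Simulate network latency.
        return self._files[path]


async def read_many(client, paths):
    # Start all reads at once and collect the results in input order.
    return await asyncio.gather(*(client.open(p) for p in paths))


client = StandInClient({"/data/a.txt": "alpha", "/data/b.txt": "beta"})
contents = asyncio.run(read_many(client, ["/data/a.txt", "/data/b.txt"]))
print(contents)
```

In a script, `asyncio.run` is also what lets the bare `await` calls shown in this README execute: wrap them in an `async def` function and pass that to `asyncio.run`.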
GETFILESTATUS = File Info
from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.get_file_status('/data/agg/bigfile.txt')
{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY}
  }
}
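The fields in this response are fairly raw: timestamps are milliseconds since the Unix epoch and the permission is an octal string. Below is a small sketch of converting them to friendlier Python values, assuming the client hands back the JSON above as a plain dict. The helper name `describe_status` is made up for this example.

```python
import stat
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_status(response):
    """Turn a GETFILESTATUS-style dict into friendlier Python values."""
    status = response["FileStatus"]
    is_dir = status["type"] == "DIRECTORY"
    # Permissions arrive as an octal string such as "777".
    mode = int(status["permission"], 8) | (stat.S_IFDIR if is_dir else stat.S_IFREG)
    return {
        "is_dir": is_dir,
        "owner": status["owner"],
        "size_bytes": status["length"],
        # Timestamps are milliseconds since the Unix epoch.
        "modified": EPOCH + timedelta(milliseconds=status["modificationTime"]),
        "mode": stat.filemode(mode),
    }


sample = {"FileStatus": {"length": 0, "modificationTime": 1320173277227,
                         "owner": "webuser", "permission": "777", "type": "DIRECTORY"}}
info = describe_status(sample)
print(info["mode"], info["modified"].isoformat())
```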
LISTSTATUS = List Directory
from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.list_directory('/tmp')
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
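Each entry's `pathSuffix` is relative to the directory that was listed, so full paths have to be rebuilt by the caller. Here is a minimal sketch of that, assuming the listing is returned as a plain dict shaped like the JSON above. The helper name `split_listing` is hypothetical.

```python
import posixpath


def split_listing(directory, response):
    """Return (files, dirs) as full HDFS paths from a LISTSTATUS-style dict."""
    files, dirs = [], []
    for entry in response["FileStatuses"]["FileStatus"]:
        # pathSuffix holds the entry name relative to the listed directory.
        full_path = posixpath.join(directory, entry["pathSuffix"])
        (dirs if entry["type"] == "DIRECTORY" else files).append(full_path)
    return files, dirs


listing = {"FileStatuses": {"FileStatus": [
    {"pathSuffix": "a.patch", "type": "FILE", "length": 24930},
    {"pathSuffix": "bar", "type": "DIRECTORY", "length": 0},
]}}
files, dirs = split_listing("/tmp", listing)
print(files, dirs)
```

`posixpath` is used rather than `os.path` so that the joined paths keep forward slashes even when the script runs on Windows.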
GETCONTENTSUMMARY = Summary of Directory
from aiowebhdfs import WebHdfsAsyncClient
client = WebHdfsAsyncClient(host='namenode.local', port=8443, user='spark', kerberos_token=token)
await client.list_directory('/tmp')
{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
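For reference, the WebHDFS REST documentation describes the GETCONTENTSUMMARY response as a `ContentSummary` object with fields such as `directoryCount`, `fileCount`, `length` and `spaceConsumed`. The sketch below formats such an object, under the assumption that a client passes that JSON through as a plain dict. The helper name `summarize` and the sample values are made up for illustration and are not output from this library.

```python
def summarize(response):
    """Format a ContentSummary-style dict as a one-line report."""
    summary = response["ContentSummary"]
    # spaceConsumed counts every replica, so it can exceed the logical length.
    return (f"{summary['fileCount']} files, {summary['directoryCount']} directories, "
            f"{summary['length']} bytes ({summary['spaceConsumed']} bytes on disk)")


sample = {"ContentSummary": {"directoryCount": 2, "fileCount": 1, "length": 24930,
                             "quota": -1, "spaceConsumed": 24930, "spaceQuota": -1}}
report = summarize(sample)
print(report)
```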