
HdfsCLI: a command line interface for WebHDFS.


HdfsCLI

API and command line interface for HDFS.

Features

  • Python bindings for the WebHDFS API, supporting both secure and insecure clusters.

  • Lightweight CLI with aliases for convenient namenode URL caching.

  • Additional functionality through optional extensions:

    • avro, allowing reading/writing Avro files directly from JSON.

    • dataframe, enabling fast loading/saving of pandas dataframes from/to HDFS.

    • kerberos, adding support for Kerberos authenticated clusters.
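To make the first feature concrete: WebHDFS is a REST API served by the namenode under the /webhdfs/v1 prefix, with the operation passed as an op query parameter. The sketch below shows how such a URL can be assembled. The helper name webhdfs_url and the port 50070 are assumptions made for illustration; this is not the package's own code.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, **params):
    """Build a WebHDFS REST URL for one operation (hypothetical helper)."""
    # WebHDFS exposes the filesystem under the /webhdfs/v1 prefix.
    path = quote(hdfs_path.lstrip('/'))
    # The operation name travels in the `op` query parameter.
    query = urlencode(dict(op=op, **params))
    return '{}/webhdfs/v1/{}?{}'.format(base_url.rstrip('/'), path, query)


# Port 50070 is an assumed namenode HTTP port; use your cluster's value.
print(webhdfs_url('http://namenode:50070', '/user/alice/hello.md', 'OPEN'))
```

The Python bindings spare you from building these requests by hand.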

Installation

Using pip:

$ pip install hdfs

By default, none of the dependencies required by the extensions are installed. To install them, suffix the package name with the desired extensions:

$ pip install hdfs[avro,dataframe,kerberos]

By default, the command line entry point is named hdfs. If this conflicts with another utility, you can choose another name by setting the HDFS_ENTRY_POINT environment variable at install time:

$ HDFS_ENTRY_POINT=hdfscli pip install hdfs
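For readers curious how an install-time variable can rename a command, here is a minimal sketch of a setup script reading it to build a console_scripts entry. The function name console_script and the target hdfs.__main__:main are hypothetical; the package's actual setup script may differ.

```python
import os


def console_script(environ=None, default='hdfs', target='hdfs.__main__:main'):
    """Return a console_scripts entry of the form 'name = module:function'.

    The target is a hypothetical placeholder, not the package's real one.
    """
    environ = os.environ if environ is None else environ
    # Fall back to the default name when the variable is unset.
    name = environ.get('HDFS_ENTRY_POINT', default)
    return '{} = {}'.format(name, target)


print(console_script({}))
print(console_script({'HDFS_ENTRY_POINT': 'hdfscli'}))
```

Because the variable is read only during installation, changing it later has no effect until the package is reinstalled.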

API

Sample snippet using a Python client to create a file on HDFS, rename it, download it locally, and finally delete the remote copy.

from hdfs import KerberosClient

client = KerberosClient('http://namenode:port', root='/user/alice')
client.write('hello.md', 'Hello, world!')
client.rename('hello.md', 'hello.rst')
client.download('hello.rst', 'hello.rst')
client.delete('hello.rst')
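The relative paths in the snippet above rely on the root argument. Assuming root acts as a prefix for relative HDFS paths, the resolution can be sketched as follows. The resolve helper is hypothetical and only illustrates the idea; it is not the library's implementation.

```python
import posixpath


def resolve(root, hdfs_path):
    """Resolve hdfs_path against root (hypothetical helper)."""
    # posixpath.join discards root when hdfs_path is already absolute,
    # so absolute paths pass through unchanged.
    return posixpath.normpath(posixpath.join(root, hdfs_path))


print(resolve('/user/alice', 'hello.md'))
print(resolve('/user/alice', '/tmp/hello.md'))
```

Under this assumption, client.write('hello.md', ...) would target /user/alice/hello.md on the cluster.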

CLI

Sample commands:

$ hdfs --read logs/1987-03-23 >>logs
$ hdfs --write -o data/weights.tsv <weights.tsv

See hdfs --help for the full list of commands and options.

Documentation

The full documentation can be found here.


This version

0.5.1


Source Distribution

hdfs-0.5.1.tar.gz (26.5 kB)
