pywhdfs: python Web HDFS Client.

PyWHDFS

API and interactive command line interface for Web HDFS.

  $ pywhdfs -c prod

  ---------------------------------------------------
  ------------------- PYWEBHDFS 1.0.0 ------------------
  -- Welcome to WEB HDFS interactive python shell. --
  -- The WEB HDFS client is available as `CLIENT`. --
  --------------------- Enjoy ! ---------------------
  ---------------------------------------------------


  >>> CLIENT.list("/")
  [u'admin', u'data', u'group', u'solr', u'system', u'tmp', u'user']

Functionalities

  • Python library for interacting with the WebHDFS and HttpFS REST APIs.
  • Supports both secure (Kerberos, token) and insecure clusters.
  • Supports HA clusters and handles namenode failover.
  • Supports HDFS federation with multiple nameservices and mount points.
  • JSON format for cluster configuration.
  • Command line interface for working with the WebHDFS API interactively from a Python shell.
  • Supports concurrency on uploads and downloads.
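
To illustrate how federation with mount points can work, here is a minimal sketch that picks a nameservice for a path by longest matching mount prefix. It is an illustration only and not necessarily how pywhdfs resolves paths internally; the function name `resolve_nameservice` and the host names are hypothetical.

```python
def resolve_nameservice(path, nameservices):
    """Return the nameservice whose mount point is the longest prefix of path."""
    best, best_len = None, -1
    for ns in nameservices:
        for mount in ns["mounts"]:
            # "/" is normalised to "" so that it matches every absolute path.
            prefix = mount.rstrip("/")
            if path == prefix or path.startswith(prefix + "/"):
                if len(prefix) > best_len:
                    best, best_len = ns, len(prefix)
    if best is None:
        raise ValueError("no nameservice mounted for %s" % path)
    return best


# Hypothetical federation layout: same shape as the "nameservices"
# entries in the configuration file.
nameservices = [
    {"urls": ["http://nn1:50070", "http://nn2:50070"], "mounts": ["/"]},
    {"urls": ["http://nn3:50070"], "mounts": ["/data", "/user/etl"]},
]

data_ns = resolve_nameservice("/data/logs/2020", nameservices)
root_ns = resolve_nameservice("/tmp/scratch", nameservices)
```

With this layout, paths under `/data` or `/user/etl` go to the second nameservice, and everything else falls back to the one mounted on `/`. Matching is done on whole path components, so `/database` is not treated as being under `/data`.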

Getting started

  $ easy_install pywhdfs

Some dependencies require the following packages to be installed as well:

  • krb5-devel krb5-libs
  • gcc
  • python-devel

Configuration

PyWHDFS uses a JSON configuration file that defines the connection parameters for each cluster. A simple configuration file looks like:

  {
    "clusters": [
      {
        "name": "prod",
        "auth_mechanism": "GSSAPI",
        "verify": false,
        "truststore": "trust/store/path.jks",
        "nameservices": [
          {
            "urls": ["http://first_namenode_url:50070" , "http://second_namenode_url:50070"],
            "mounts": ["/"]
          }
        ]
      }
    ]
  }

The configuration file is validated against a schema file.

The default location of the configuration file is "~/.webhdfs.cfg", but it can be overridden with the WEBHDFS_CONFIG environment variable.
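
The lookup order described above can be sketched with the standard library alone. This is a rough approximation, not the code pywhdfs itself runs: the helper names `config_path` and `load_config` are hypothetical, and the structural check is a stand-in for the real schema validation, whose rules are not shown here.

```python
import json
import os

DEFAULT_CONFIG = "~/.webhdfs.cfg"


def config_path(environ=None):
    """Return the config file path, preferring WEBHDFS_CONFIG when it is set."""
    environ = os.environ if environ is None else environ
    return os.path.expanduser(environ.get("WEBHDFS_CONFIG", DEFAULT_CONFIG))


def load_config(path):
    """Parse the JSON file and run a minimal structural check."""
    with open(path) as handle:
        config = json.load(handle)
    # Stand-in for schema validation: only the top-level shape is checked.
    if not isinstance(config, dict) or not isinstance(config.get("clusters"), list):
        raise ValueError("configuration must contain a 'clusters' list")
    return config


# Typical use (not executed here because the file may not exist):
#   config = load_config(config_path())
```

Passing `environ` explicitly makes the helper easy to exercise without touching the real environment.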

Usage

The interactive Python shell is the easiest way to use pywhdfs, but you can also instantiate the client manually:

  >>> import pywhdfs.client as pywhdfs
  >>> CLIENT = pywhdfs.WebHDFSClient(nameservices=[{'urls': ["http://host1.hadoop.domain:50070", "http://host2.hadoop.domain:50070"], 'mounts': ['/']}], auth_mechanism="GSSAPI", verify=False)
  >>> CLIENT.list("/")

The interactive shell requires the cluster's connection parameters to be set up in the configuration file, and the cluster name must match the name passed with the -c argument.
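
As a sketch of that name matching, the snippet below selects a cluster entry by name from an already parsed configuration and collects the keyword arguments used in the manual example. The helper `client_kwargs` is hypothetical and is not part of pywhdfs. Only the three keywords that appear in the constructor call above are forwarded; other keys such as "truststore" are skipped because this README does not show how they are passed to the client.

```python
# Keywords taken from the manual WebHDFSClient example in this README.
CLIENT_KEYS = ("nameservices", "auth_mechanism", "verify")


def client_kwargs(config, cluster_name):
    """Return constructor keyword arguments for the cluster called cluster_name."""
    for cluster in config["clusters"]:
        if cluster["name"] == cluster_name:
            return {key: cluster[key] for key in CLIENT_KEYS if key in cluster}
    raise KeyError("no cluster named %r in configuration" % cluster_name)


# Same content as the sample configuration file, already parsed from JSON.
config = {
    "clusters": [
        {
            "name": "prod",
            "auth_mechanism": "GSSAPI",
            "verify": False,
            "truststore": "trust/store/path.jks",
            "nameservices": [
                {
                    "urls": [
                        "http://first_namenode_url:50070",
                        "http://second_namenode_url:50070",
                    ],
                    "mounts": ["/"],
                }
            ],
        }
    ]
}

kwargs = client_kwargs(config, "prod")
# With pywhdfs installed, the client could then be created as:
#   CLIENT = pywhdfs.WebHDFSClient(**kwargs)
```

Asking for a name that is not in the file raises `KeyError`, which mirrors the requirement that the name given on the command line must exist in the configuration.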

Contributing

Feedback and pull requests are very welcome!
