A Python 2/3 wrapper library for the Hadoop WebHDFS REST API.

Installation

To install webhdfspy from PyPI:

$ pip install webhdfspy

Python versions

webhdfspy supports Python 2.7 and 3.4.

Usage

>>> import webhdfspy
>>> webHDFS = webhdfspy.WebHDFSClient("localhost", 50070, "username")
>>> print(webHDFS.listdir('/'))
[]
>>> webHDFS.mkdir('/foo')
True
>>> print(webHDFS.listdir('/'))
[{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0, u'pathSuffix': u'foo', u'modificationTime': 1429805040695, u'replication': 0, u'length': 0, u'childrenNum': 0, u'owner': u'username', u'storagePolicy': 0, u'type': u'DIRECTORY', u'fileId': 16387}]
>>> print(webHDFS.create('/foo/foo.txt', "just put some text here", True))
True
>>> print(webHDFS.open('/foo/foo.txt'))
just put some text here
>>> webHDFS.remove('/foo')
True
>>> print(webHDFS.listdir('/'))
[]
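
Each entry returned by listdir is a plain dictionary, as in the session above. The following is a minimal sketch (Python 3, standard library only) of how such an entry might be read. The helper name summarize_entry is hypothetical and not part of webhdfspy, and the sample dictionary is copied from the listing shown above. It assumes modificationTime is in milliseconds since the Unix epoch, as in the WebHDFS FileStatus format.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_entry(entry):
    """Return (name, kind, modified) for one listing dictionary.

    Hypothetical helper for illustration; not part of webhdfspy.
    """
    # modificationTime is assumed to be milliseconds since the Unix epoch.
    modified = EPOCH + timedelta(milliseconds=entry["modificationTime"])
    return entry["pathSuffix"], entry["type"], modified


# Sample entry copied from the listdir output in the usage session.
sample = {
    "group": "supergroup", "permission": "755", "blockSize": 0,
    "accessTime": 0, "pathSuffix": "foo",
    "modificationTime": 1429805040695, "replication": 0, "length": 0,
    "childrenNum": 0, "owner": "username", "storagePolicy": 0,
    "type": "DIRECTORY", "fileId": 16387,
}

name, kind, modified = summarize_entry(sample)
is_directory = kind == "DIRECTORY"
print(name, kind, modified.isoformat())
```

Using timedelta on the integer millisecond value avoids the rounding that dividing by 1000.0 can introduce.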

Hadoop configuration

To enable WebHDFS in Hadoop, add this to your $HADOOP_DIR/conf/hdfs-site.xml:

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

To enable append on HDFS, also add this to hdfs-site.xml:

<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
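
Once these properties are set, the NameNode serves the REST API under the /webhdfs/v1 prefix. As a rough illustration of the requests a wrapper like this one sends, the sketch below builds two such URLs with the Python 3 standard library. The helper webhdfs_url is hypothetical and is not how webhdfspy is necessarily implemented. The host, port and user name are the example values from the usage session, and the URL layout and the LISTSTATUS and APPEND operation names follow the WebHDFS documentation linked below.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, user=None):
    """Build a WebHDFS REST URL.

    Hypothetical helper for illustration; not part of webhdfspy.
    """
    query = [("op", op)]
    if user:
        # WebHDFS passes the acting user as the user.name query parameter.
        query.append(("user.name", user))
    return "http://{0}:{1}/webhdfs/v1{2}?{3}".format(
        host, port, quote(path), urlencode(query))


# Listing a directory is an HTTP GET with op=LISTSTATUS.
list_url = webhdfs_url("localhost", 50070, "/foo", "LISTSTATUS", "username")

# Appending to a file is an HTTP POST with op=APPEND.
append_url = webhdfs_url("localhost", 50070, "/foo/foo.txt", "APPEND",
                         "username")

print(list_url)
print(append_url)
```

Opening the first URL in a browser or with an HTTP client is a quick way to check that WebHDFS is reachable before using the library.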

More about WebHDFS: https://hadoop.apache.org/docs/r1.0.4/webhdfs.html

Release history

0.3.5 (this version), 0.3.4, 0.3.3, 0.3.2, 0.3.1, 0.3, 0.2.1, 0.2

Download files

webhdfspy-0.3.5.tar.gz (4.4 kB), source distribution, uploaded Sep 8, 2016