Pure Python HDFS client

Because the world needs yet another way to talk to HDFS from Python.

Usage

This library provides a Python client for WebHDFS. NameNode HA is supported by passing in both NameNodes. Responses are returned as Python classes, and any failed operation raises a subclass of HdfsException that matches the corresponding Java exception.

Example usage:

>>> import contextlib
>>> import pyhdfs
>>> fs = pyhdfs.HdfsClient(hosts='nn1.example.com:50070,nn2.example.com:50070', user_name='someone')
>>> fs.list_status('/')
[FileStatus(pathSuffix='benchmarks', permission='777', type='DIRECTORY', ...), FileStatus(...), ...]
>>> fs.listdir('/')
['benchmarks', 'hbase', 'solr', 'tmp', 'user', 'var']
>>> fs.mkdirs('/fruit/x/y')
True
>>> fs.create('/fruit/apple', 'delicious')
>>> fs.append('/fruit/apple', ' food')
>>> with contextlib.closing(fs.open('/fruit/apple')) as f:
...     f.read()
...
b'delicious food'
>>> fs.get_file_status('/fruit/apple')
FileStatus(length=14, owner='someone', type='FILE', ...)
>>> fs.get_file_status('/fruit/apple').owner
'someone'
>>> fs.get_content_summary('/fruit')
ContentSummary(directoryCount=3, fileCount=1, length=14, quota=-1, spaceConsumed=14, spaceQuota=-1)
>>> list(fs.walk('/fruit'))
[('/fruit', ['x'], ['apple']), ('/fruit/x', ['y'], []), ('/fruit/x/y', [], [])]
>>> fs.exists('/fruit/apple')
True
>>> fs.delete('/fruit')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pyhdfs.py", line 525, in delete
  ...
pyhdfs.HdfsPathIsNotEmptyDirectoryException: `/fruit is non empty': Directory is not empty
>>> fs.delete('/fruit', recursive=True)
True
>>> fs.exists('/fruit/apple')
False
>>> issubclass(pyhdfs.HdfsFileNotFoundException, pyhdfs.HdfsIOException)
True

You can also pass the hostname as part of the URI:

fs.list_status('//nn1.example.com:50070;nn2.example.com:50070/')

The methods and return values generally map directly to WebHDFS endpoints. The client also provides convenience methods that mimic Python os methods and HDFS CLI commands (e.g. walk and copy_to_local).
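
Because walk yields os.walk-style (dirpath, dirnames, filenames) tuples, as in the example session above, collecting every file path under a directory takes only a few lines. The following is a minimal sketch, not part of pyhdfs: list_all_files is a hypothetical helper name, and fs is assumed to be any object whose walk method behaves like the one shown for HdfsClient.

```python
import posixpath
from typing import Any, List


def list_all_files(fs: Any, top: str) -> List[str]:
    """Return the full path of every file found under `top`.

    `fs` is assumed to expose walk(top) yielding
    (dirpath, dirnames, filenames) tuples, like os.walk.
    """
    paths = []
    for dirpath, _dirnames, filenames in fs.walk(top):
        for name in filenames:
            # HDFS paths always use forward slashes, so join with posixpath
            # rather than os.path (which would use backslashes on Windows).
            paths.append(posixpath.join(dirpath, name))
    return paths
```

With the /fruit tree from the example session, list_all_files(fs, '/fruit') would visit three directories and pick up the single file in the top one.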

pyhdfs logs all HDFS actions at the INFO level, so turning on INFO level logging will give you a debug record for your application.
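
One way to turn on INFO level logging is to configure the root logger before creating the client. This sketch assumes pyhdfs emits its records through the standard library logging module; enable_info_logging is a hypothetical helper name, and the format string is only an example.

```python
import logging


def enable_info_logging() -> logging.Logger:
    """Make the root logger pass INFO records and return it."""
    # basicConfig attaches a stderr handler only if the root logger has none,
    # so the level is set explicitly afterwards to cover both cases.
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    return root


root = enable_info_logging()
```

Setting the level on the root logger affects every library in the process. If that is too noisy, setting the level only on the logger that pyhdfs uses is an alternative, provided you confirm its name first.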

For more information, see the full API docs.

Installing

pip install pyhdfs

You’ll need Python 2.7 or Python 3.

Development testing

First run install-hdfs.sh x.y.z, replacing x.y.z with a real version. The script downloads, extracts, and runs the HDFS NameNode and DataNode processes in the current directory. Then run the following commands. Note that they create and delete hdfs://localhost/tmp/pyhdfs_test.

Python 3:

virtualenv3 --no-site-packages env3
source env3/bin/activate
pip3 install -e .
pip3 install -r dev_requirements.txt
py.test

And again for Python 2 (after deactivate):

virtualenv2 --no-site-packages env2
source env2/bin/activate
pip2 install -e .
pip2 install -r dev_requirements.txt
py.test
