Pure Python HDFS client
Project description
Because the world needs yet another way to talk to HDFS from Python.
Usage
This library provides a Python client for WebHDFS. NameNode HA is supported by passing in both NameNodes. Responses are returned as nice Python classes, and any failed operation will raise some subclass of HdfsException matching the Java exception.
Example usage:
>>> fs = pyhdfs.HdfsClient(hosts='nn1.example.com:50070,nn2.example.com:50070', user_name='someone')
>>> fs.list_status('/')
[FileStatus(pathSuffix='benchmarks', permission='777', type='DIRECTORY', ...), FileStatus(...), ...]
>>> fs.listdir('/')
['benchmarks', 'hbase', 'solr', 'tmp', 'user', 'var']
>>> fs.mkdirs('/fruit/x/y')
True
>>> fs.create('/fruit/apple', 'delicious')
>>> fs.append('/fruit/apple', ' food')
>>> with contextlib.closing(fs.open('/fruit/apple')) as f:
... f.read()
...
b'delicious food'
>>> fs.get_file_status('/fruit/apple')
FileStatus(length=14, owner='someone', type='FILE', ...)
>>> fs.get_file_status('/fruit/apple').owner
'someone'
>>> fs.get_content_summary('/fruit')
ContentSummary(directoryCount=3, fileCount=1, length=14, quota=-1, spaceConsumed=14, spaceQuota=-1)
>>> list(fs.walk('/fruit'))
[('/fruit', ['x'], ['apple']), ('/fruit/x', ['y'], []), ('/fruit/x/y', [], [])]
>>> fs.exists('/fruit/apple')
True
>>> fs.delete('/fruit')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../pyhdfs.py", line 525, in delete
...
pyhdfs.HdfsPathIsNotEmptyDirectoryException: `/fruit is non empty': Directory is not empty
>>> fs.delete('/fruit', recursive=True)
True
>>> fs.exists('/fruit/apple')
False
>>> issubclass(pyhdfs.HdfsFileNotFoundException, pyhdfs.HdfsIOException)
True
You can also pass the hostname as part of the URI:
fs.list_status('//nn1.example.com:50070;nn2.example.com:50070/')
The methods and return values generally map directly to WebHDFS endpoints. The client also provides convenience methods that mimic Python os methods and HDFS CLI commands (e.g. walk and copy_to_local).
pyhdfs logs all HDFS actions at the INFO level, so turning on INFO level logging will give you a debug record for your application.
For more information, see the full API docs.
Installing
pip install pyhdfs
Python 3 is required.
Development testing
First run install-hdfs.sh x.y.z, which will download, extract, and run the HDFS NN/DN processes in the current directory. (Replace x.y.z with a real version.) Then run the following commands. Note they will create and delete hdfs://localhost/tmp/pyhdfs_test.
Commands:
python3 -m venv env source env/bin/activate pip install -e . pip install -r dev_requirements.txt pytest
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file PyHDFS-0.3.1.tar.gz
.
File metadata
- Download URL: PyHDFS-0.3.1.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.8.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 63b4d0e3929f9ef9897603c66e1336c7d3d3dc51204ed76cd2f3d4a83719f50a |
|
MD5 | d00bcc15060afa416a20262444f2f4ef |
|
BLAKE2b-256 | 91a9e9bf3dc7c1f673765e6ba9acf7d049a7b90cd734d85dfa832cf704a1eb59 |