===========================================
HDFS Contents Manager for Jupyter Notebooks
===========================================

A contents manager for Jupyter that uses the Hadoop Distributed File System (HDFS) to store notebooks and files.


Getting Started
---------------

1. We assume you already have a running Hadoop cluster and a working Jupyter installation.

2. Set the ``JAVA_HOME`` and ``HADOOP_HOME`` environment variables.

3. In some cases you also need to set the ``CLASSPATH``:

   .. code:: bash

      export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`

4. Install the HDFS contents manager. This also installs dependencies such as Pydoop_.

   .. code:: bash

      pip install hdfscontents

5. Configure and run Jupyter Notebook.

You can either pass command-line arguments that tell Jupyter to use the ``HDFSContentsManager`` class and set the HDFS-related options:

.. code:: bash

   jupyter-notebook --NotebookApp.contents_manager_class='hdfscontents.hdfsmanager.HDFSContentsManager' \
     --NotebookApp.ip='*' \
     --HDFSContentsManager.hdfs_namenode_host='localhost' \
     --HDFSContentsManager.hdfs_namenode_port=9000 \
     --HDFSContentsManager.hdfs_user='myuser' \
     --HDFSContentsManager.root_dir='/user/myuser/'
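
The same options can also be assembled programmatically, for example when the server is started from a launcher script. The following is a minimal sketch that uses only the Python standard library. The helper name ``build_notebook_command`` and the ``HDFS_SETTINGS`` dictionary are illustrative assumptions; only the option names and values are taken from the command above.

.. code:: python

   import shlex

   # Option names and values copied from the command-line example above;
   # adjust host, port, user and root directory for your cluster.
   HDFS_SETTINGS = {
       "NotebookApp.contents_manager_class": "hdfscontents.hdfsmanager.HDFSContentsManager",
       "NotebookApp.ip": "*",
       "HDFSContentsManager.hdfs_namenode_host": "localhost",
       "HDFSContentsManager.hdfs_namenode_port": 9000,
       "HDFSContentsManager.hdfs_user": "myuser",
       "HDFSContentsManager.root_dir": "/user/myuser/",
   }


   def build_notebook_command(settings):
       """Return the jupyter-notebook argument list for the given options (hypothetical helper)."""
       command = ["jupyter-notebook"]
       for name, value in settings.items():
           # The shell strips the quotes used in the example above, so the
           # values are passed unquoted in the argument list.
           command.append("--{}={}".format(name, value))
       return command


   if __name__ == "__main__":
       # Print a shell-quoted command line for inspection.
       print(" ".join(shlex.quote(arg) for arg in build_notebook_command(HDFS_SETTINGS)))

The returned list can be handed to ``subprocess.run(command, check=True)`` to start the server without going through a shell.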

Alternatively, first run:

.. code:: bash

   jupyter-notebook --generate-config

to generate a default config file. Edit the generated file to add the HDFS-related settings, then start the notebook server:

.. code:: bash

   jupyter-notebook
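
In the config file, the settings are the same options as on the command line, written as assignments on the ``c`` object that Jupyter provides while loading the file. The sketch below renders those assignment lines so they can be pasted into the generated file (typically ``~/.jupyter/jupyter_notebook_config.py``; the ``--generate-config`` command prints the exact path). The helper name ``render_hdfs_config`` is hypothetical, and the values mirror the command-line example and should be adapted to your cluster.

.. code:: python

   def render_hdfs_config(host, port, user, root_dir):
       """Return config-file lines that enable HDFSContentsManager (hypothetical helper)."""
       options = [
           ("NotebookApp.contents_manager_class",
            "hdfscontents.hdfsmanager.HDFSContentsManager"),
           ("HDFSContentsManager.hdfs_namenode_host", host),
           ("HDFSContentsManager.hdfs_namenode_port", port),
           ("HDFSContentsManager.hdfs_user", user),
           ("HDFSContentsManager.root_dir", root_dir),
       ]
       # repr() quotes strings and leaves integers bare, so each line is valid Python.
       return "\n".join("c.{} = {!r}".format(name, value) for name, value in options)


   if __name__ == "__main__":
       # Prints:
       # c.NotebookApp.contents_manager_class = 'hdfscontents.hdfsmanager.HDFSContentsManager'
       # c.HDFSContentsManager.hdfs_namenode_host = 'localhost'
       # c.HDFSContentsManager.hdfs_namenode_port = 9000
       # c.HDFSContentsManager.hdfs_user = 'myuser'
       # c.HDFSContentsManager.root_dir = '/user/myuser/'
       print(render_hdfs_config("localhost", 9000, "myuser", "/user/myuser/"))

Keeping the settings in the config file means ``jupyter-notebook`` can afterwards be started without any extra arguments.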


.. _Pydoop: http://crs4.github.io/pydoop/
