
Remote Jupyter Lab kernel for Databricks

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

Local JupyterLab connecting to Databricks via SSH

This package allows you to connect to a remote Databricks cluster from a locally running JupyterLab.

1 Prerequisites

  1. Operating System

    Either macOS or Linux. Windows is currently not supported.

  2. Anaconda installation

    A recent version of Anaconda with Python >= 3.5. The conda tool must be newer than 4.7.5.

  3. Databricks CLI

    To install Databricks CLI and configure profile(s) for your cluster(s), please refer to AWS / Azure

    Whenever $PROFILE is used in this documentation, it refers to a valid Databricks CLI profile name, stored in a shell environment variable.

  4. SSH access to the Databricks cluster

    Configure your Databricks clusters to allow SSH access, see Configure SSH access.

    Only clusters with a valid SSH configuration are visible to databrickslabs_jupyterlab.
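For illustration, the $PROFILE names from prerequisite 3 are sections in the INI file ~/.databrickscfg that the Databricks CLI maintains. A minimal sketch of reading one profile with Python's configparser (the profile name demo and the values below are made-up examples, not real credentials):

```python
# Sketch: Databricks CLI profiles are stored in INI format in ~/.databrickscfg.
# The "demo" profile and its values here are illustrative examples only.
import configparser

# Example of what a ~/.databrickscfg entry typically looks like:
example_cfg = """
[demo]
host = https://demo.cloud.databricks.com
token = dapiXXXXXXXXXXXXXXXX
"""

config = configparser.ConfigParser()
config.read_string(example_cfg)

profile = "demo"  # what $PROFILE would contain
print(config[profile]["host"])  # -> https://demo.cloud.databricks.com
```

In practice you would call `config.read(os.path.expanduser("~/.databrickscfg"))` instead of parsing an inline string.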

2 Installation

  • Create a new conda environment and install databrickslabs_jupyterlab with the following commands:

    (base)$ conda create -n db-jlab python=3.6
    (base)$ conda activate db-jlab
    (db-jlab)$ pip install --upgrade databrickslabs-jupyterlab==1.0.2-rc6
    

    The prefix (db-jlab)$ in all command examples in this document indicates that the conda environment db-jlab is activated.

  • Bootstrap the environment for databrickslabs_jupyterlab with the following command:

    (db-jlab)$ databrickslabs-jupyterlab -b
    

    The command finishes with an overview of the usage.

3 Usage

Ensure that SSH access is correctly configured, see Configure SSH access.
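For reference, SSH access to a Databricks cluster driver typically uses port 2200 and the user ubuntu, with your public key registered on the cluster. databrickslabs_jupyterlab maintains the corresponding entry in ~/.ssh/config for you; it has roughly this shape (host name, DNS placeholder, and key path are illustrative):

```
# Sketch of an ~/.ssh/config entry for a Databricks cluster.
# Values are illustrative; the tool maintains the real entry.
Host 0806-143104-skirt84
    HostName <driver-public-dns>
    Port 2200
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
```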

3.1 Starting Jupyter Lab

  • Create a Jupyter kernel specification for a Databricks CLI profile $PROFILE and start JupyterLab with the following command:

    (db-jlab)$ databrickslabs-jupyterlab $PROFILE -l
    

Notes:

  • The command with -l is a shortcut for

    (db-jlab)$ databrickslabs-jupyterlab $PROFILE -k
    (db-jlab)$ jupyter lab
    

    that ensures that the kernel specification is updated (one could omit the first step if the kernel specification is up to date)

  • A new kernel is available in the kernel change menu (see here for an explanation of the kernel name structure)
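For illustration, a Jupyter kernel specification is a small kernel.json file that tells JupyterLab how to start a kernel. A minimal sketch in Python, where the display name and launch command are assumptions rather than the exact spec databrickslabs-jupyterlab generates:

```python
# Sketch of a Jupyter kernel spec (kernel.json). The display name and
# argv below are illustrative assumptions, not the exact spec this
# tool writes for a remote Databricks kernel.
import json

kernel_spec = {
    # Hypothetical name combining cluster id, profile, and conda env:
    "display_name": "SSH 0806-143104-skirt84 demo:db-jlab",
    "language": "python",
    # Command Jupyter runs to launch the kernel:
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
}

print(json.dumps(kernel_spec, indent=2))
```

Kernel specs live under Jupyter's kernels directory (see `jupyter kernelspec list`), which is why rerunning the `-k` step refreshes what the Kernel Change menu offers.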

3.2 Using Spark in the Notebook

Getting a remote Spark Session in the notebook

When the cluster is already running, the status bar of JupyterLab should show

kernel ready

To connect to the remote Spark context, enter the following two lines into a notebook cell:

[1] from databrickslabs_jupyterlab.connect import dbcontext
    dbcontext()

This will prompt you to enter the personal access token for the profile:

    Fri Aug  9 09:58:04 2019 py4j imported
    Enter personal access token for profile 'demo' |_____________________________|

After pressing Enter, you will see

    Gateway created for cluster '0806-143104-skirt84' ... connected
    The following global variables have been created:
    - spark       Spark session
    - sc          Spark context
    - sqlContext  Hive Context
    - dbutils     Databricks utilities

Overview

Note: databrickslabs-jupyterlab $PROFILE -c lets you quickly copy the token to the clipboard so that you can simply paste it into the input box.

Switching kernels

Kernels can be switched via the JupyterLab Kernel Change dialog. However, when switching to a remote kernel, the local connection context might get out of sync and the notebook can no longer be used. In this case:

  1. Shut down the kernel.
  2. Select the remote kernel again from the JupyterLab Kernel Change dialog.

A simple Kernel Restart in JupyterLab will not work, since it does not refresh the connection context!

Restart after cluster auto-termination

Should the cluster auto-terminate while the notebook is connected, or the network connection go down, the status bar will change to

  • kernel disconnected

Additionally, a dialog will open in JupyterLab asking you to confirm that the remote cluster should be started again.

Notes:

  • One can check connectivity beforehand, e.g. by calling ssh <cluster_id> in a terminal window.
  • After cancelling the dialog, clicking on the status bar entry as indicated by the message will open the dialog box again.

During restart the following status messages will be shown in this order:

  • cluster-starting
  • installing-cluster-libs
  • installing-driver-libs
  • configure-ssh
  • starting

After a successful start, the status bar will again show:

  • kernel ready

4 Advanced topics

4.1 Test notebooks

To work with the test notebooks in ./examples, the remote cluster needs to have the following libraries installed:

  • mlflow==1.0.0
  • spark-sklearn

Project details


Download files

Download the file for your platform.

Source Distribution

databrickslabs_jupyterlab-1.0.2rc6.tar.gz (33.3 kB)

Uploaded Source

Built Distribution


databrickslabs_jupyterlab-1.0.2rc6-py3-none-any.whl (38.2 kB)

Uploaded Python 3

File details

Details for the file databrickslabs_jupyterlab-1.0.2rc6.tar.gz.

File metadata

  • Download URL: databrickslabs_jupyterlab-1.0.2rc6.tar.gz
  • Upload date:
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.6.9

File hashes

Hashes for databrickslabs_jupyterlab-1.0.2rc6.tar.gz
  • SHA256: 5382563a6f67e0074a3b69acc8ca0ce8c514109c791d38030b1802b8b29529de
  • MD5: c33f67b562a14b4fe43fa39c00794f2a
  • BLAKE2b-256: 0ca497a48dda9d0e12e84bf21307a2d135e0ef4bf562a8039860b2595f24f71f
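To verify a downloaded file against the SHA256 digest listed above, hash it locally and compare. A small Python sketch (the file name is the real sdist from this page; the commented usage assumes it sits in the current directory):

```python
# Verify a downloaded package file against a published SHA256 digest.
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage, with the digest from the table above:
# expected = "5382563a6f67e0074a3b69acc8ca0ce8c514109c791d38030b1802b8b29529de"
# assert sha256_of("databrickslabs_jupyterlab-1.0.2rc6.tar.gz") == expected
```

pip can also enforce this automatically via its hash-checking mode (`--require-hashes` with pinned hashes in a requirements file).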


File details

Details for the file databrickslabs_jupyterlab-1.0.2rc6-py3-none-any.whl.

File metadata

  • Download URL: databrickslabs_jupyterlab-1.0.2rc6-py3-none-any.whl
  • Upload date:
  • Size: 38.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.6.9

File hashes

Hashes for databrickslabs_jupyterlab-1.0.2rc6-py3-none-any.whl
  • SHA256: 385c5bfb85ee9d2133298154b7d4b259d2167603916aa40b19ca91667d331d52
  • MD5: 110a76612dc5196404150afbaf78d024
  • BLAKE2b-256: 48de25b5b7140d5a56da3ccb0b1d5eb59017b5de399ca6a80a270c649210922f

