Manage and start jupyter slurm kernels

Slurm Jupyter Kernel

Manage (create, list, modify and delete) and start Jupyter Slurm kernels using sbatch.

slurmkernel connects to a kernel started on a compute node using SSH port forwarding. If you have to reach the login node through an intermediate host (e.g. a load balancer), you can specify an SSH proxy jump.

How it works

Slurm job 3251854 is in state "RUNNING"
Slurm job is in state running on compute node cn213
Starting SSH tunnel to forward kernel ports to localhost
Your started kernel is now ready to use on compute node cn213
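
The tunnel step above can be pictured with a small sketch. It is not the tool's actual implementation: the function name `build_tunnel_command` and the example port numbers are hypothetical, and passing the same user name to the proxy jump host is an assumption. The five port keys are the ones a Jupyter connection file defines, and `-N`, `-J` and `-L` are standard OpenSSH options.

```python
# The five port keys that a Jupyter connection file defines.
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def build_tunnel_command(connection_info, compute_node, loginnode, user, proxyjump=None):
    """Return an ssh argv list that forwards every kernel port to localhost.

    Hypothetical helper for illustration only.
    """
    cmd = ["ssh", "-N"]  # -N: only forward ports, run no remote command
    if proxyjump:
        # Assumption: the same user name is valid on the jump host.
        cmd += ["-J", f"{user}@{proxyjump}"]
    for key in PORT_KEYS:
        port = connection_info[key]
        # Local port -> same port on the compute node, reached via the login node.
        cmd += ["-L", f"{port}:{compute_node}:{port}"]
    cmd.append(f"{user}@{loginnode}")
    return cmd


# Made-up port numbers; a real connection file contains the actual values.
example_connection = {
    "shell_port": 50001,
    "iopub_port": 50002,
    "stdin_port": 50003,
    "control_port": 50004,
    "hb_port": 50005,
}

tunnel_cmd = build_tunnel_command(
    example_connection, "cn213", "login001", "hpcuser1", proxyjump="lb.hpc.pc2.de"
)
print(" ".join(tunnel_cmd))
```

Once such a tunnel is up, the local Jupyter client can talk to `localhost` on the same port numbers as if the kernel were running locally.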

Features & Use-Cases

  • Start a remote Jupyter kernel using sbatch (Slurm)
    • Access to your local filesystem with remote code execution
  • Manage existing Slurm Jupyter kernels
  • Use the template module to apply pre-defined script templates for remote installation and local kernel creation
  • Custom environment variables are supported (e.g. JULIA_NUM_THREADS)

Installation

slurm_jupyter_kernel must be installed locally where the Jupyter notebooks will run.

Install using pip

python3 -m pip install slurm_jupyter_kernel

Requirements for usage

  • SSH key-based authentication

You need a running SSH agent with your key loaded so that you can access the login node without a password.

Create a new kernel

The following examples assume that the Jupyter kernel tools are installed in your $HOME directory on the cluster.

Template module (Script templates)

With $ slurmkernel template {list, use, add, edit} you can use pre-defined script templates to initialize your remote environment (IJulia, IPython, ...), add new script templates or edit existing templates.

If you want to create your own script templates, see here: Create Script Templates

Example

Note: Add the parameter --dry-run to check the commands that will be executed!

$ slurmkernel template use --proxyjump lb.hpc.pc2.de --loginnode login001 --user hpcuser1 --template ipython

If you call slurmkernel template use without any arguments, you will be asked interactively for the required information.

IPython Example

Remote Host

  1. Load required software (if necessary)
  2. Create a Python virtual environment
  3. Install the IPython packages (ipython, ipykernel)
  4. Create a wrapper script and mark it as executable

remotehost ~$ module load lang Python
remotehost ~$ python3 -m venv remotekernel/
remotehost ~$ source remotekernel/bin/activate
(remotekernel) remotehost ~$ python3 -m pip install ipython ipykernel; deactivate
remotehost ~$ echo -e '#!/bin/bash\nmodule load lang Python\n\nsource remotekernel/bin/activate\n"$@"' > remotekernel/ipy_wrapper.sh && chmod +x remotekernel/ipy_wrapper.sh
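
The one-line `echo -e` command above is dense, so here is a sketch that spells out the wrapper script it produces and marks it as executable. The helper name `write_wrapper` and the temporary demo directory are hypothetical; on the cluster the file would live in `remotekernel/` as shown above. Note that the script sources `remotekernel/bin/activate` through a relative path, so it presumably expects to run from the directory that contains `remotekernel/`.

```python
import os
import stat
import tempfile

# Same content that the echo -e command writes: load the module,
# activate the virtual environment, then run whatever command was passed in.
WRAPPER_SCRIPT = """#!/bin/bash
module load lang Python

source remotekernel/bin/activate
"$@"
"""


def write_wrapper(path):
    """Write the wrapper script and set the execute bits (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write(WRAPPER_SCRIPT)
    mode = os.stat(path).st_mode
    # Comparable to `chmod +x`: add execute permission for user, group and others.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


# Demo in a throwaway directory so nothing in $HOME is touched.
demo_dir = tempfile.mkdtemp()
wrapper_path = write_wrapper(os.path.join(demo_dir, "ipy_wrapper.sh"))
with open(wrapper_path) as handle:
    print(handle.read())
```

The final `"$@"` line is what lets the kernel command (for example `ipython kernel -f ...`) be appended to the wrapper call and run inside the activated environment.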

Localhost

  1. Create the remote Slurm kernel with the slurmkernel command

notebook ~$ slurmkernel create --displayname "Python 3.8.2" \
--slurm-parameter="account=slurmaccount,time=00:30:00,partition=normal" \
--kernel-cmd="\$HOME/remotekernel/ipy_wrapper.sh ipython kernel -f {connection_file}" \
--proxyjump="lb.n1.pc2.uni-paderborn.de" \
--loginnode="login-0001" \
--language="python"
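
To make the two most compact options in this command easier to read, the following sketch shows one plausible interpretation of them. It assumes that each `key=value` pair in `--slurm-parameter` corresponds to the sbatch long option of the same name (`--account`, `--time` and `--partition` are real sbatch options), which the source does not confirm. The `{connection_file}` placeholder is the standard Jupyter kernelspec placeholder, which Jupyter replaces with the path of the connection file when it starts the kernel. The function name and the example path are hypothetical.

```python
def slurm_parameters_to_sbatch_args(parameter_string):
    """Turn 'key=value,key=value' into a list of sbatch long options.

    Hypothetical helper; assumes no value contains a comma.
    """
    args = []
    for item in parameter_string.split(","):
        key, _, value = item.strip().partition("=")
        if not key:
            continue  # skip empty fragments such as a trailing comma
        args.append(f"--{key}={value}" if value else f"--{key}")
    return args


sbatch_args = slurm_parameters_to_sbatch_args(
    "account=slurmaccount,time=00:30:00,partition=normal"
)
print(sbatch_args)

# The kernel command as passed to --kernel-cmd (without the shell escaping).
kernel_cmd = "$HOME/remotekernel/ipy_wrapper.sh ipython kernel -f {connection_file}"

# Made-up connection file path, only to show the substitution.
resolved_cmd = kernel_cmd.replace("{connection_file}", "/tmp/kernel-example.json")
print(resolved_cmd)
```

The backslash in `\$HOME` in the command above keeps the local shell from expanding the variable, so that `$HOME` is resolved on the remote side instead.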

Set kernel-specific environment

If you want to set kernel-specific environment variables (e.g. JULIA_NUM_THREADS for the number of threads), extend the Jupyter kernelspec file with an env entry.

Parameter for slurmkernel:

--environment="JULIA_NUM_THREADS=4"

More information here: https://jupyter-client.readthedocs.io/en/stable/kernels.html
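
As a rough illustration of what an `env` entry in a kernelspec looks like, the sketch below builds a minimal kernel.json in Python. The keys `argv`, `display_name`, `language` and `env` are part of the Jupyter kernelspec format; the launcher name in `argv`, the display name and the helper `parse_environment` are hypothetical, and accepting several comma-separated pairs is an assumption about the option rather than documented behavior.

```python
import json


def parse_environment(option_value):
    """Parse 'NAME=value' (optionally comma-separated) into a dict.

    Hypothetical helper for illustration only.
    """
    env = {}
    for pair in option_value.split(","):
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            env[name] = value
    return env


kernelspec = {
    # Hypothetical launcher; the real argv is generated by slurmkernel.
    "argv": ["placeholder-launcher", "{connection_file}"],
    "display_name": "Julia (Slurm)",
    "language": "julia",
}

# Environment variables must be strings in kernel.json.
kernelspec["env"] = parse_environment("JULIA_NUM_THREADS=4")

kernel_json = json.dumps(kernelspec, indent=2)
print(kernel_json)
```

Jupyter sets the variables listed under `env` in the environment of the process it launches for the kernel.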

Using the kernel with Applications

  • Install the kernel as shown above
    • Make sure that you also pass the --language flag (e.g. python or julia)

Quarto Example

Troubleshooting

Kernel exceptions

If a Jupyter Slurm kernel fails to start, it raises an exception that depends on the error case. You can read the exception in a graphical user interface such as JupyterLab. If you start a kernel in the classic notebook view, click "Error" to the left of the kernel status to see the exception.

[Figures: Exception Example 1, Exception Example 2]

Debugging

If your Slurm Jupyter kernel does not start, there can be many causes. Before turning on debug mode, check the following:

  • The SSH agent is running and your key is loaded
    • If you can log in to the login node of the HPC system from a shell without a password, this is working
  • The proxy jump (load balancer) and login node are correct
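
The first item in this checklist can be checked programmatically. The sketch below is a small stand-alone helper, not part of slurmkernel; the function names are hypothetical. It relies on the documented exit status of `ssh-add -l`: 0 when keys are listed, 1 when the agent holds no identities, and 2 when the agent cannot be contacted.

```python
import os
import subprocess


def describe_agent_status(returncode):
    """Translate the exit status of `ssh-add -l` into a readable message."""
    if returncode == 0:
        return "agent reachable, at least one key loaded"
    if returncode == 1:
        return "agent reachable, but no key loaded (run ssh-add)"
    if returncode == 2:
        return "no agent reachable (start ssh-agent or check SSH_AUTH_SOCK)"
    return f"unexpected ssh-add exit status {returncode}"


def check_ssh_agent():
    """Run `ssh-add -l` if possible and describe the result."""
    if "SSH_AUTH_SOCK" not in os.environ:
        # Without this variable ssh-add cannot find an agent at all.
        return describe_agent_status(2)
    try:
        result = subprocess.run(["ssh-add", "-l"], capture_output=True, text=True)
    except FileNotFoundError:
        return "ssh-add not found on PATH"
    return describe_agent_status(result.returncode)


if __name__ == "__main__":
    print(check_ssh_agent())
```

If the agent check passes but the kernel still does not start, a manual `ssh` login to the login node through the proxy jump host is the next thing to try.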

Get help

$ slurmkernel --help

usage: Tool to manage (create, list, modify and delete) and starting jupyter slurm kernels using srun [-h] [--version] {create,list,edit,delete,template} ...

positional arguments:
  {create,list,edit,delete,template}
    create              create a new slurm kernel
    list                list available slurm kernel
    edit                edit an existing slurm kernel
    delete              delete an existing slurm kernel
    template            manage script templates (list, use, add, edit)

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
