JupyterHub Spawner for spawning into multiple kubernetes clusters

jupyterhub-multicluster-kubespawner

Launch user pods into many different kubernetes clusters from the same JupyterHub!

Why?

A single JupyterHub as an 'entrypoint' to compute in a variety of clusters can be extremely useful. Users can decide where to launch their notebooks (and dask clusters) based on a variety of factors: closer to their data, on different cloud providers, paid for by different billing accounts, etc. It also makes life much easier for JupyterHub operators.

You can check out an early demo of the spawner in action here.

Installation

jupyterhub-multicluster-kubespawner is available from PyPI:

pip install jupyterhub-multicluster-kubespawner

You'll also need to install kubectl as well as any tools needed to authenticate to your target clusters.

Cloud Provider	Tool
Google Cloud	gcloud
AWS	aws
Azure	az
DigitalOcean	doctl

Configuration

You can ask JupyterHub to use MultiClusterKubeSpawner with the following config snippet in your jupyterhub_config.py file, although more configuration is needed to connect the hub to different clusters.
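
A minimal sketch of that snippet follows. The dotted path is an assumption: it presumes the class can be imported from the top-level `multicluster_kubespawner` package, so check the installed package if JupyterHub cannot resolve it.

```python
# jupyterhub_config.py
# JupyterHub supplies the `c` config object when it loads this file.
# Assumption: the class is importable from the top-level
# `multicluster_kubespawner` package.
c.JupyterHub.spawner_class = "multicluster_kubespawner.MultiClusterKubeSpawner"
```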

Configuration philosophy

MultiClusterKubeSpawner tries to be as kubernetes-native as possible, unlike the venerable kubespawner. It doesn't try to provide a layer of abstraction over what kubernetes offers, as we have found that is often a very leaky abstraction. Such an abstraction makes it difficult for JupyterHub operators to take advantage of all the powerful features Kubernetes offers, and increases the maintenance burden for the maintainers.

MultiClusterKubeSpawner uses the popular kubectl under the hood, making the configuration familiar for anyone who has a basic understanding of working with Kubernetes clusters. The flip side is that some familiarity with Kubernetes is required to successfully configure this spawner, but the tradeoff seems beneficial for everyone.

Setting up `KUBECONFIG`

Since multicluster-kubespawner talks to multiple Kubernetes clusters, it uses a kubeconfig file to connect to them. By default it looks for the file in ~/.kube/config. In production environments your file probably lives elsewhere, so you can set the KUBECONFIG environment variable to point to its location.
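
The lookup described above can be sketched in a few lines of Python. `resolve_kubeconfig` is a hypothetical helper written for this README, not part of the spawner. It handles only a single path, although kubectl also accepts a list of paths in KUBECONFIG.

```python
import os
from pathlib import Path


def resolve_kubeconfig(env=None):
    """Return the kubeconfig path: $KUBECONFIG if set, else ~/.kube/config."""
    env = os.environ if env is None else env
    explicit = env.get("KUBECONFIG")
    if explicit:
        return Path(explicit)
    return Path.home() / ".kube" / "config"


# With no argument the real process environment is consulted.
print(resolve_kubeconfig())
```

Running this inside the environment that starts JupyterHub is a quick way to confirm the hub will read the file you expect.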

Each cluster is represented by a context, which is a combination of a pointer to where the cluster's kubernetes API endpoint is as well as what credentials to use to authenticate to it. More details here.
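
To make that relationship concrete, here is a sketch of how a context name resolves to an API endpoint and a set of credentials. The dict mirrors the `clusters` / `users` / `contexts` layout of a kubeconfig file after YAML parsing. Every name and the server address are made up for illustration, and `describe_context` is a hypothetical helper.

```python
# A trimmed kubeconfig as the Python dict you would get after parsing the YAML.
# All names and the server address below are invented for this example.
kubeconfig = {
    "clusters": [
        {"name": "gke-hub", "cluster": {"server": "https://203.0.113.10"}},
    ],
    "users": [
        {"name": "gke-hub-sa", "user": {}},
    ],
    "contexts": [
        {"name": "gke-hub-ctx", "context": {"cluster": "gke-hub", "user": "gke-hub-sa"}},
    ],
}


def describe_context(config, context_name):
    """Resolve a context name to (API server URL, user entry name)."""
    contexts = {c["name"]: c["context"] for c in config["contexts"]}
    clusters = {c["name"]: c["cluster"] for c in config["clusters"]}
    context = contexts[context_name]
    return clusters[context["cluster"]]["server"], context["user"]


print(describe_context(kubeconfig, "gke-hub-ctx"))
# ('https://203.0.113.10', 'gke-hub-sa')
```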

The easiest way to construct a kubeconfig that will work with all the clusters you want to use is to carefully construct it locally on your laptop and then copy that file to your deployment.

Start by setting your KUBECONFIG env var locally to a file that you can then copy over.

export KUBECONFIG=jupyterhub-mcks-kubeconfig

On Google Cloud

Create a Google Cloud Service Account, and give it enough permissions to access your Kubernetes cluster. roles/container.developer should be enough permission.
Create a JSON Service Account Key for your service account. This is what kubectl will eventually use to authenticate to the kubernetes cluster. You'll need to put this service account key in your production JupyterHub environment as well.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the location of this JSON Service Account key.
```
export GOOGLE_APPLICATION_CREDENTIALS=<path-to-json-service-account-key>
```

Generate an appropriate entry in your custom kubeconfig file.

gcloud container clusters get-credentials <cluster-name> --zone=<zone>

When you deploy your JupyterHub, make sure that you set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the service account key in a place where JupyterHub can find it.

On AWS with EKS

AWS has a plethora of interesting options for authentication. Here we describe the simplest one, although it may not be the most secure or 'best practice'.

Create an AWS IAM User for use by kubectl to authenticate to AWS. This user will need an access key and access secret, but no console access.
Create an access key for this user. JupyterHub needs the key ID and secret, set as environment variables, to make requests to the kubernetes API while running.
Grant the user access to the eks:DescribeCluster permission, either directly or via a group you create specifically for this purpose.
Grant the user access to the Kubernetes API by editing the aws-auth configmap as described in this document.

Generate an appropriate entry in your KUBECONFIG file:

export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<access-key-secret>
aws eks update-kubeconfig --name=<cluster-name> --region=<aws-region>

When you deploy your JupyterHub, you need to set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY on the JupyterHub process itself so it can talk to the Kubernetes API properly.

On DigitalOcean

Install doctl
Create a personal access token to access your kubernetes cluster. Unlike the other cloud providers, this cannot be scoped: it grants full access to your entire DO account! So use with care.

Generate an appropriate entry in your KUBECONFIG file:

export DIGITALOCEAN_ACCESS_TOKEN=<your-digitalocean-access-token>
doctl kubernetes cluster kubeconfig save <cluster-name>

When you deploy your JupyterHub, you need to set the environment variable DIGITALOCEAN_ACCESS_TOKEN on the JupyterHub process itself so it can talk to the Kubernetes API properly.
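
Since each provider needs its own variables on the JupyterHub process, a small pre-flight check can catch a missing one before a spawn fails. This is only a sketch: `missing_credentials` and the provider labels are hypothetical names, while the variable names are the ones listed in the sections above.

```python
import os

# Environment variables the provider sections above ask you to set on the
# JupyterHub process. The provider labels are names used only by this sketch.
REQUIRED_ENV = {
    "google": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "digitalocean": ["DIGITALOCEAN_ACCESS_TOKEN"],
}


def missing_credentials(providers, env=None):
    """Return the required variables that are unset or empty, in order."""
    env = os.environ if env is None else env
    return [
        name
        for provider in providers
        for name in REQUIRED_ENV[provider]
        if not env.get(name)
    ]


# Example: only one of the two AWS variables is set.
print(missing_credentials(["aws"], {"AWS_ACCESS_KEY_ID": "example-id"}))
# ['AWS_SECRET_ACCESS_KEY']
```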

Setting up target clusters

Each target cluster needs to have an Ingress Controller installed that the spawner can talk to. This provides a public IP that JupyterHub can use to proxy traffic to the user pods on that cluster.

Any ingress provider will do, although the current suggestion is Project Contour, as it's faster than the more popular nginx-ingress at picking up routing changes.

A 'production' install might use helm and the contour helm chart. But to quickly get started, you can also just configure your kubectl to point to the correct Kubernetes cluster and run kubectl apply --wait -f https://projectcontour.io/quickstart/contour.yaml. After it succeeds, you can get the public IP of the ingress controller with kubectl -n projectcontour get svc envoy. The EXTERNAL-IP value here can be passed to the ingress_public_url configuration option for your cluster.
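
If you would rather script that last step, the external address can be read from the Service object itself. The sketch below assumes you pass it the output of `kubectl -n projectcontour get svc envoy -o json`. `ingress_url_from_service` is a hypothetical helper, and the sample address is invented.

```python
import json


def ingress_url_from_service(service_json, scheme="http"):
    """Build an ingress_public_url value from a LoadBalancer Service object."""
    service = json.loads(service_json)
    entries = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    address = None
    if entries:
        # Some providers report an IP, others only a DNS hostname.
        address = entries[0].get("ip") or entries[0].get("hostname")
    if not address:
        raise ValueError("Service has no external address yet")
    return f"{scheme}://{address}"


sample = '{"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.7"}]}}}'
print(ingress_url_from_service(sample))
# http://203.0.113.7
```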

Setup `profile_list`

After login, each user will be provided with a list of profiles to choose from. Each profile can point to a different kubernetes cluster, as well as other customizations such as image to use, amount of RAM / CPU, GPU use, etc.

Each item in the list is a python dictionary, with the following keys recognized:

display_name: Name to display to the user in the profile selection screen
description: Description to display to the user in the profile selection screen
spawner_override: Dictionary of spawner options for this profile, determining where the user pod is spawned and other properties of it.

The following properties are supported under spawner_override.

kubernetes_context: Name of the kubernetes context to use when connecting to this cluster. You can use kubectl config get-contexts to get a list of contexts available in your KUBECONFIG file.
ingress_public_url: URL to the public endpoint of the Ingress controller you set up earlier. This should be formatted as a URL, so don't forget the http://. For production systems, you should also set up HTTPS for your ingress controller, and provide the https://<domain-name> here.
patches: A dictionary of patches (as passed to kubectl patch). Described in more detail below. This is the primary method of customizing the user pod, although some convenience methods are also offered (detailed below).
environment: A dictionary of extra environment variables to set for the user pod.
image: The image to use for the user pod. Defaults to pangeo/pangeo-notebook:latest.
mem_limit and mem_guarantee, as understood by JupyterHub
cpu_limit and cpu_guarantee, as understood by JupyterHub

Here is a simple example:

c.MultiClusterKubeSpawner.profile_list = [
    {
        "display_name": "Google Cloud in us-central1",
        "description": "Compute paid for by funder A, closest to dataset X",
        "spawner_override": {
            "kubernetes_context": "<kubernetes-context-name-for-this-cluster>",
            "ingress_public_url": "http://<ingress-public-ip-for-this-cluster>"
        }
    },
    {
        "display_name": "AWS on us-east-1",
        "description": "Compute paid for by funder B, closest to dataset Y",
        "spawner_override": {
            "kubernetes_context": "<kubernetes-context-name-for-this-cluster>",
            "ingress_public_url": "http://<ingress-public-ip-for-this-cluster>",
            "patches": {
                "01-memory": """
                    kind: Pod
                    metadata:
                        name: {{key}}
                    spec:
                        containers:
                        - name: notebook
                          resources:
                            requests:
                                memory: 16Mi
                    """,
            }
        }
    },
]
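
A mistyped key or a URL without its scheme is easy to miss in a long list, so a quick check before starting the hub can help. The sketch below is not part of the spawner. `check_profile` is a hypothetical helper, and it assumes every profile sets both kubernetes_context and ingress_public_url itself rather than inheriting them.

```python
from urllib.parse import urlparse

# Top-level keys the profile_list documentation above recognizes.
PROFILE_KEYS = {"display_name", "description", "spawner_override"}


def check_profile(profile):
    """Return a list of problems found in one profile_list entry."""
    problems = [
        f"unrecognized key: {key}" for key in sorted(set(profile) - PROFILE_KEYS)
    ]
    override = profile.get("spawner_override", {})
    if not override.get("kubernetes_context"):
        problems.append("spawner_override.kubernetes_context is missing")
    url = urlparse(override.get("ingress_public_url", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append(
            "spawner_override.ingress_public_url must start with http:// or https://"
        )
    return problems


# The address is made up; the missing scheme is the deliberate mistake.
print(check_profile({
    "display_name": "Example",
    "spawner_override": {
        "kubernetes_context": "example-context",
        "ingress_public_url": "203.0.113.7",
    },
}))
```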

Customizations with `patches`

To try and be as kubernetes-native as possible, we use strategic merge patch as implemented by kubectl to allow JupyterHub operators to customize per-user resources. This lets operators have fine grained control over what gets spawned per-user, without requiring a lot of effort by the maintainers of this spawner to support each possible customization.

Behind the scenes, kubectl patch is used to merge the initial list of generated kubernetes resources for each user with some customizations before they are passed to kubectl apply. Operators set these by customizing the patches traitlet. It can be either set for all profiles by setting c.MultiClusterKubeSpawner.patches or just for a particular set of profiles by setting patches under spawner_override for that particular profile.

patches is a dictionary, where the key is used just for sorting and the value is a string that should be a valid YAML object when parsed after template substitution. Resources are merged based on the value for kind and metadata.name keys in the YAML. kubectl knows when to add items to a list or merge their properties on appropriate attributes.
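
Because the keys exist only to fix the order, it is worth seeing how they sort. The sketch below assumes a plain string sort, which is why the examples in this README use zero-padded prefixes. `ordered_patches` is a hypothetical helper, and the patch bodies are elided.

```python
def ordered_patches(patches):
    """Return (key, patch) pairs in the order the keys sort as strings."""
    return sorted(patches.items())


patches = {
    "10-late": "...",
    "2-surprise": "...",
    "01-first": "...",
}

print([key for key, _ in ordered_patches(patches)])
# ['01-first', '10-late', '2-surprise']
```

Note that "2-surprise" lands after "10-late": strings compare character by character, so a consistent two-digit prefix keeps the order predictable.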

To patch the user pod to add some extra annotations to the pod and request a GPU, you could set the following:

c.MultiClusterKubeSpawner.patches = {
    "01-annotations": """
    kind: Pod
    metadata:
        name: {{key}}
        annotations:
            something-else: hey
    """,
    "02-gpu": """
    kind: Pod
    metadata:
        name: {{key}}
    spec:
        containers:
        - name: notebook
          resources:
            limits:
                nvidia.com/gpu: 1
    """,
}

The values are first expanded via jinja2 templates before being passed to kubectl patch. {{key}} expands to the name of the resource created, and you should use it for all your modifications. In the 02-gpu patch, kubectl knows to merge this with the existing notebook container instead of creating a new container or replacing all the existing values, because a container with the name property set to notebook already exists. Hence it merges the values provided in this patch with the existing configuration for the container.
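
The effect of the {{key}} expansion can be previewed without a cluster. The spawner uses jinja2 for this. To stay within the standard library, the sketch below swaps in a small regular expression that handles only the {{key}} placeholder. The resource name `jupyter-alice` is made up.

```python
import re


def expand_key(template, key):
    """Replace {{key}} (with optional inner spaces) by the resource name.

    A stand-in for jinja2 rendering that supports only this one placeholder.
    """
    return re.sub(r"\{\{\s*key\s*\}\}", lambda _match: key, template)


patch = """
kind: Pod
metadata:
    name: {{key}}
"""

print(expand_key(patch, "jupyter-alice"))
```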

Please read the kubectl documentation to understand how strategic merge patch works.
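
As a rough intuition for the merge-by-name behaviour, here is a simplified sketch in Python. It is not kubectl's implementation: kubectl takes its merge keys from the Kubernetes API schema, whereas this sketch hard-codes `name` and replaces any list whose items lack one. `merge` is a hypothetical helper.

```python
def merge(base, patch):
    """Recursively merge patch into base; lists of named dicts merge by name."""
    if isinstance(base, dict) and isinstance(patch, dict):
        merged = dict(base)
        for key, value in patch.items():
            merged[key] = merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(patch, list):
        if not all(isinstance(item, dict) and "name" in item for item in patch):
            return patch  # no merge key available: replace the whole list
        merged = list(base)
        for item in patch:
            for index, existing in enumerate(merged):
                if isinstance(existing, dict) and existing.get("name") == item["name"]:
                    merged[index] = merge(existing, item)
                    break
            else:
                merged.append(item)  # no container with this name yet
        return merged
    return patch  # scalars: the patch value wins


pod = {"spec": {"containers": [
    {"name": "notebook", "image": "pangeo/pangeo-notebook:latest"},
]}}
gpu_patch = {"spec": {"containers": [
    {"name": "notebook", "resources": {"limits": {"nvidia.com/gpu": 1}}},
]}}

print(merge(pod, gpu_patch))
```

The notebook container keeps its image and gains the GPU limit, which is the outcome described for the 02-gpu patch.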

Additional per-user kubernetes resources with `resources`

You can also create arbitrary additional kubernetes resources for each user by setting the resources configuration. It's a dictionary where the key is used for sorting, and the value should be valid YAML after expansion via jinja2 template.

For example, the following config creates a Kubernetes Role and RoleBinding for each user to allow them to (insecurely!) run a dask-kubernetes cluster.

c.MultiClusterKubeSpawner.resources = {
    "10-dask-role": """
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: {{key}}-dask
    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      verbs:
      - list
      - create
      - delete
    """,
    "11-dask-rolebinding": """
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: {{key}}-dask
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: {{key}}-dask
    subjects:
    - apiGroup: ""
      kind: ServiceAccount
      name: {{key}}
    """,
}

This takes advantage of the fact that by default a Kubernetes Service Account is already created for each pod by MultiClusterKubeSpawner, and gives it just enough rights to create, list and delete pods.

Release history

0.2 (this version): Jan 17, 2022
0.1: Jan 17, 2022

