Argo-Workflow backend extension for Jupyter-Scheduler.



Argo-Jupyter-Scheduler

Submit long-running notebooks to run without needing to keep your JupyterLab server running, and submit notebooks to run on a specified schedule.

Installation

pip install argo-jupyter-scheduler

What is it?

Argo-Jupyter-Scheduler is a plugin to the Jupyter-Scheduler JupyterLab extension.

What does that mean?

This means this is an application that gets installed in the JupyterLab base image and runs as an extension in JupyterLab. Specifically, you will see this icon at the bottom of the JupyterLab Launcher tab:

And this icon on the toolbar of your Jupyter Notebook:

This also means, as a lab extension, this application is running within each user's separate JupyterLab server. The record of the notebooks you've submitted is specific to you and you only. There is no central Jupyter-Scheduler.

However, instead of using the base Jupyter-Scheduler, we are using Argo-Jupyter-Scheduler.

Why?

If you want to run your Jupyter Notebook on a schedule, you need to be assured that the notebook will be executed at the times you specified. The fundamental limitation with Jupyter-Scheduler is that when your JupyterLab server is not running, Jupyter-Scheduler is not running. Then the notebooks you had scheduled won't run. What about notebooks that you want to run right now? If the JupyterLab server is down, then how will the status of the notebook run be recorded?

The solution is Argo-Jupyter-Scheduler: Jupyter-Scheduler front-end with an Argo-Workflows back-end.

A deeper dive

In the Jupyter-Scheduler lab extension, you can create two things, a Job and a Job Definition.

`Job`

A Job, or notebook job, is when you submit your notebook to run.

In Argo-Jupyter-Scheduler, this Job translates into a Workflow in Argo-Workflows. So when you create a Job, your notebook job will create a Workflow that will run regardless of whether or not your JupyterLab server is running.
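To make the Job-to-Workflow mapping above more concrete, here is a minimal sketch of what an Argo Workflow manifest for a notebook job could look like, built as a plain Python dict. The function name, the template name `run-notebook`, the image, and the papermill command line are all illustrative assumptions; the extension itself may construct its Workflows differently.

```python
import json
from typing import Dict, List


def build_workflow_manifest(job_name: str, image: str, command: List[str]) -> Dict:
    """Return a minimal Argo Workflow manifest (hypothetical helper, not the extension's API)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets Argo append a random suffix, so each Job gets its own Workflow
        "metadata": {"generateName": f"{job_name}-"},
        "spec": {
            "entrypoint": "run-notebook",
            "templates": [
                {
                    "name": "run-notebook",
                    "container": {"image": image, "command": command},
                }
            ],
        },
    }


# Placeholder image and notebook names, chosen only for illustration
manifest = build_workflow_manifest(
    "my-notebook-job",
    "example/jupyterlab:latest",
    ["papermill", "input.ipynb", "output.ipynb"],
)
print(json.dumps(manifest, indent=2))
```

Because the Workflow is a cluster resource rather than a process inside the JupyterLab server, it keeps running after that server shuts down.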

At the moment, submitting Jobs requires permission, which is managed through the Keycloak roles for the argo-server-sso client. A user with either the argo-admin or the argo-developer role is permitted to create and submit Jobs (and Job Definitions).
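The role rule above amounts to a simple membership test. The sketch below assumes the user's roles are already available as a list of strings; how they are actually read from Keycloak is not covered here, and the helper name is hypothetical.

```python
from typing import Iterable

# Roles named in the text above; holding either one is enough
ALLOWED_ROLES = {"argo-admin", "argo-developer"}


def can_submit_jobs(user_roles: Iterable[str]) -> bool:
    """Return True if the user holds at least one role that permits submitting Jobs."""
    return bool(ALLOWED_ROLES & set(user_roles))


print(can_submit_jobs(["argo-developer", "some-other-role"]))
print(can_submit_jobs(["some-other-role"]))
```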

We are also relying on the Nebari Workflow Controller to ensure the user's home directory and conda-store environments are mounted to the Workflow. This allows us to ensure:

the files in the user's home directory can be used by the notebook job
the output of the notebook can be saved locally
when the conda environment that is used gets updated, it is also updated for the notebook job (helpful for scheduled jobs)
the node-selector and image you submit your notebook job from are the same ones used by the workflow

`Job Definition`

A Job Definition is simply a way to create Jobs that run on a specified schedule.

In Argo-Jupyter-Scheduler, a Job Definition translates into a Cron-Workflow in Argo-Workflows. So when you create a Job Definition, you create a Cron-Workflow, which in turn creates a Workflow each time the schedule fires.

A Job is to Workflow as Job Definition is to Cron-Workflow.
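As an illustration of that analogy, a Cron-Workflow can be thought of as a schedule wrapped around a Workflow spec. The following sketch builds a minimal CronWorkflow manifest as a Python dict; the helper name, the resource name, the image, and the schedule are assumptions made for the example, not values taken from the extension.

```python
from typing import Dict


def build_cron_workflow_manifest(
    name: str, schedule: str, workflow_spec: Dict, timezone: str = "UTC"
) -> Dict:
    """Wrap a Workflow spec in a CronWorkflow manifest (hypothetical helper)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # standard cron expression
            "timezone": timezone,
            "workflowSpec": workflow_spec,  # the Workflow created on each tick
        },
    }


# A placeholder Workflow spec with a single notebook-running template
workflow_spec = {
    "entrypoint": "run-notebook",
    "templates": [
        {
            "name": "run-notebook",
            "container": {
                "image": "example/jupyterlab:latest",
                "command": ["papermill", "input.ipynb", "output.ipynb"],
            },
        }
    ],
}

cron_manifest = build_cron_workflow_manifest("nightly-report", "0 2 * * *", workflow_spec)
print(cron_manifest["kind"], cron_manifest["spec"]["schedule"])
```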

Internals

Jupyter-Scheduler creates and uses a scheduler.sqlite database to manage and keep track of the Jobs and Job Definitions. If you can ensure this database is accessible and can be updated when the status of a job or a job definition changes, then you can ensure the view the user sees from JupyterLab is accurate.

By default this database is located at ~/.local/share/jupyter/scheduler.sqlite, but this is a traitlet that can be modified. And since we have access to this database, we can update the database directly from the workflow itself.
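A small sketch of how the database location might be resolved: use an override if one is given, otherwise fall back to the default path quoted above. The `resolve_db_path` helper is hypothetical, and the override stands in for whatever value the traitlet has been set to (in Jupyter-Scheduler this is believed to be the `db_url` setting, but treat that name as an assumption).

```python
from pathlib import Path
from typing import Optional

# Default location quoted in the text above
DEFAULT_DB_PATH = "~/.local/share/jupyter/scheduler.sqlite"


def resolve_db_path(override: Optional[str] = None) -> Path:
    """Return the database path, preferring an explicit override over the default."""
    return Path(override or DEFAULT_DB_PATH).expanduser()


db_path = resolve_db_path()
print(db_path, "(exists)" if db_path.exists() else "(not found)")
```

Since the user's home directory is mounted into the workflow, the same path resolves to the same file from both JupyterLab and the workflow pod.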

To accomplish this, the workflow runs in two steps. First, the workflow runs the notebook, using papermill and the conda environment specified. Second, depending on the success of this notebook run, it updates the database with the resulting status.
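The first of these two steps can be sketched as a command that runs papermill inside the chosen conda environment, with the exit code mapped to a status for the second step. Invoking papermill through `conda run -n` is an assumption about how the environment is activated, and the status strings `COMPLETED` and `FAILED` are assumed names; the runner is injectable so the example does not need conda or papermill installed.

```python
import subprocess
from typing import Callable, List


def papermill_command(conda_env: str, input_nb: str, output_nb: str) -> List[str]:
    """Build a command that executes a notebook with papermill in a conda environment."""
    return ["conda", "run", "-n", conda_env, "papermill", input_nb, output_nb]


def run_notebook_step(cmd: List[str], runner: Callable = subprocess.run) -> str:
    """Run the command and translate its exit code into a job status string."""
    result = runner(cmd)
    return "COMPLETED" if result.returncode == 0 else "FAILED"


cmd = papermill_command("my-env", "input.ipynb", "output.ipynb")


# Stand-in runner that pretends the notebook ran successfully
def fake_success(c):
    return subprocess.CompletedProcess(args=c, returncode=0)


print(" ".join(cmd))
print(run_notebook_step(cmd, runner=fake_success))
```

The returned status is what the second step would write back to the database.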

And when a job definition is created, a corresponding cron-workflow is created. To ensure the database is properly updated, the workflow that the cron-workflow creates has three steps. First, create a job record in the database with a status of IN PROGRESS. Second, run the notebook, again using papermill and the conda environment specified. And third, update the newly created job record with the status of the notebook run.
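The three steps for a scheduled run can be sketched against a throwaway SQLite database. The table layout below (a `jobs` table with `job_id`, `job_definition_id`, and `status` columns) is a simplified assumption and not the real Jupyter-Scheduler schema, and the status strings are likewise assumed; the notebook run is represented by a callable so the example stays self-contained.

```python
import sqlite3
import uuid
from typing import Callable


def create_job_record(conn: sqlite3.Connection, job_definition_id: str) -> str:
    """Step 1: insert a new job row marked as in progress and return its id."""
    job_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO jobs (job_id, job_definition_id, status) VALUES (?, ?, ?)",
        (job_id, job_definition_id, "IN_PROGRESS"),
    )
    conn.commit()
    return job_id


def update_job_status(conn: sqlite3.Connection, job_id: str, status: str) -> None:
    """Step 3: record the outcome of the notebook run."""
    conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?", (status, job_id))
    conn.commit()


def run_scheduled_job(
    conn: sqlite3.Connection, job_definition_id: str, run_notebook: Callable[[], bool]
) -> str:
    job_id = create_job_record(conn, job_definition_id)
    try:
        succeeded = run_notebook()  # Step 2: execute the notebook
    except Exception:
        succeeded = False
    update_job_status(conn, job_id, "COMPLETED" if succeeded else "FAILED")
    return job_id


# In-memory database with the simplified, hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, job_definition_id TEXT, status TEXT)"
)

job_id = run_scheduled_job(conn, "definition-123", lambda: True)
status = conn.execute(
    "SELECT status FROM jobs WHERE job_id = ?", (job_id,)
).fetchone()[0]
print(job_id, status)
```

Creating the record first is what distinguishes this from the one-off Job case: a scheduled run has no job row until the Cron-Workflow fires, so the workflow has to create one before it can report a status.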

License

argo-jupyter-scheduler is distributed under the terms of the MIT license.


Release history

2024.3.1 (Mar 27, 2024)
2024.3.1rc1, pre-release (Mar 22, 2024)
2024.1.3 (Jan 12, 2024)
2023.9.1rc2, pre-release (Sep 28, 2023)
2023.9.1rc1, pre-release (Sep 27, 2023)
2023.9.1rc0, pre-release (Sep 27, 2023)
2023.7.1 (Jul 21, 2023)
2023.7.1rc1, pre-release (Jul 20, 2023)
0.1.0.dev1, pre-release (Jul 20, 2023)
0.0.1 (Jul 20, 2023), this version

Download files

Source Distribution

argo_jupyter_scheduler-0.0.1.tar.gz (15.1 kB)

Uploaded Jul 20, 2023 Source

Built Distribution

argo_jupyter_scheduler-0.0.1-py3-none-any.whl (13.4 kB)

Uploaded Jul 20, 2023 Python 3

Hashes for argo_jupyter_scheduler-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`f8e551b50c12b884be59f79fb47fb723b8b8e7725eb5b3db8f093fbad6cc09d0`
MD5	`72585158dad8dd208ce4f46ef678d826`
BLAKE2b-256	`e05a0c16e479b015a8d9b7957b99b122d156acaaa78b01f20f6be06a6082baab`

Hashes for argo_jupyter_scheduler-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`84855578306e5bd97c8d3dbcd89dd72d31848d8886fcc5b4949993e6f861fb47`
MD5	`3d64a86f6906df767888eb547f8bb7c9`
BLAKE2b-256	`979482aa1d1213c98ee3a593b802b6ae98d570527ab149d9ef6df5152e100a29`