
Schedule parameterized notebooks programmatically using a CLI or a REST API

Project description

:rocket: NB Workflows


:books: Description

NB Workflows empowers different data roles to put notebooks into production, reducing the time required to do so. It enables people to go from a data exploration instance to an entire project deployed in production, using the same notebook files made by a data scientist, analyst, or any other role that works with data iteratively.

NB Workflows is a library and a service that allows you to run parameterized notebooks in a distributed way.

A notebook can be launched remotely on demand, or scheduled at intervals or with cron syntax.

Internally it uses Sanic as the web server, papermill as the notebook executor, and RQ for task distribution and coordination.
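
As a rough sketch of what happens under the hood, papermill executes a notebook with injected parameters and saves the executed copy as a new file. The notebook names and parameters below are hypothetical:

import papermill as pm

# Run the notebook with injected parameters; the executed copy is written
# to a separate output file that can be stored and inspected later.
pm.execute_notebook(
    "crawl_site.ipynb",               # hypothetical input notebook
    "outputs/crawl_site.run.ipynb",   # executed copy kept for inspection
    parameters={"url": "https://example.com", "depth": 2},
)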

:tada: Demo :tada:

:floppy_disk: Example project

:telescope: Philosophy

NB Workflows isn't a complete MLOps solution, and it never will be. We try hard to simplify and to expose the right APIs to the user for the task of scheduling notebooks with reproducibility in mind.

We also try to give the user the same freedom that Lego bricks give, but we are opinionated in some respects: we understand the process of writing code for data science and/or data analytics as an engineering problem to be solved.

With this point of view, then:

  1. Git is necessary :wink:
  2. Docker is necessary for environment reproducibility.
  3. Although you can push unversioned code, versioning is almost enforced, and it is always good practice in software development

The idea comes from a Netflix post which suggests using notebooks as an interface, or a kind of DSL, to orchestrate different workloads such as Spark. But they could also be used to run entire processes: training a model, crawling sites, performing ETLs, and so on.

The benefit of this approach is that executed notebooks can be stored and inspected, for good or for bad. If something fails, it is easy to re-run it in the classical way: cell by cell.

The last point to clarify, and one that may challenge common sense or the way we are used to working with Jupyter notebooks, is that each notebook is more like a function definition with inputs and outputs: a notebook can potentially be used for different purposes, hence the name workflow, and indeed this idea is common in the data space. A workflow, then, is a notebook with defined parameters that can be run anytime a user wants, with or without altering the parameters sent.
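
Concretely, papermill implements this through a cell tagged "parameters": that cell declares the notebook's inputs with defaults, and new values are injected right after it at execution time. A hypothetical parameters cell could look like this:

# Cell tagged "parameters": defaults live here, and papermill injects
# a new cell with the overriding values immediately after this one.
url = "https://example.com"
depth = 1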

:nut_and_bolt: Features

  • Define a notebook like a function, and execute it on demand or schedule it
  • Automatic Dockerfile generation. A project shares a single environment, but may use different versions of that environment
  • Execution History, Notifications to Slack or Discord.

Installation

Server

Docker-compose

The project provides a docker-compose.yaml file as an example.

:construction: Note :construction:

Because NB Workflows spawns a Docker instance for each workload, installation inside Docker containers can be tricky. The most difficult part is the configuration of the worker, which needs access to the Docker socket.

A Dockerfile is provided for customizing the UID and GID, which should match those of the local environment. A second alternative is to expose the Docker daemon over HTTP; in that case the DOCKER_HOST environment variable can be used (see the Docker client SDK).
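
As an illustration, the Docker Python SDK honors DOCKER_HOST when the client is built from the environment; a minimal connectivity check (the host address below is an assumption) could be:

import docker

# from_env() picks up DOCKER_HOST, e.g. tcp://127.0.0.1:2375, if it is set.
client = docker.from_env()
print(client.ping())  # True when the daemon is reachable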

git clone https://github.com/nuxion/nb_workflows
cd nb_workflows

The next step is initializing the database and creating a user (please review the script first):

docker-compose up -d postgres
./scripts/initdb_docker.sh

Now you can start everything else:

docker-compose up -d 

Without docker

pip install nb-workflows[server]==0.6.0

first terminal:

export NB_SERVER=True
nb manager db upgrade
nb manager users create
nb web --apps workflows,history,projects,events,runtimes

second terminal:

nb rqworker -w 1 -q control,mch.default

Before all of that, Redis, PostgreSQL, and Nginx in WebDAV mode should be configured.

Client


pip install nb-workflows==0.6.0
nb startproject .
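
From there, workflows can also be triggered through the REST API. The sketch below uses the requests library; the base URL, route, ids, and token are hypothetical placeholders, so check the project's documentation for the real endpoints:

import requests

base_url = "http://localhost:8000"  # hypothetical server address
token = "YOUR_ACCESS_TOKEN"         # hypothetical token obtained at login

# Hypothetical route: consult the documentation for the actual API.
resp = requests.post(
    f"{base_url}/workflows/myproject/wfid123/_run",
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.status_code, resp.text)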

:earth_americas: Roadmap

See Roadmap draft

:post_office: Architecture

nb_workflows architecture

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nb_workflows-0.7.0.tar.gz (87.6 kB)

Uploaded Source

Built Distribution

nb_workflows-0.7.0-py3-none-any.whl (124.8 kB)

Uploaded Python 3

File details

Details for the file nb_workflows-0.7.0.tar.gz.

File metadata

  • Download URL: nb_workflows-0.7.0.tar.gz
  • Upload date:
  • Size: 87.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.27_1

File hashes

Hashes for nb_workflows-0.7.0.tar.gz
  • SHA256: 44e944121a07a7cf1795b3f432c400636cf9899ceb4685d1ec799de3fc35467f
  • MD5: 7f30a0c89796c51bf672ae45af2a95db
  • BLAKE2b-256: 9d6f98315e3938c536a8a6b6f556bb644c2ae20ddcd63bed6185829fce19a5bb

See more details on using hashes here.

File details

Details for the file nb_workflows-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: nb_workflows-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 124.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.27_1

File hashes

Hashes for nb_workflows-0.7.0-py3-none-any.whl
  • SHA256: 0f5f1121d93703fffa71532516d88a580d310498488104873cb5a9977145db8b
  • MD5: cba8bb74891256e742f6a9d57127c0dc
  • BLAKE2b-256: d50c40466d1aa90ca66d5945010530adf04fa848cceb720694f8e3a4ef815b32

See more details on using hashes here.
