Schedule parameterized notebooks programmatically using the CLI or a REST API

LabFunctions

Description

LabFunctions is a library and a service that allows you to run parametrized notebooks on demand.
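To make "parametrized" concrete, here is a minimal sketch of how parameters can be injected into a notebook before it runs. It follows the convention popularized by papermill, where a cell tagged `parameters` holds the defaults and a new cell with the overrides is inserted right after it. This is not the LabFunctions API: the function `inject_parameters`, the sample notebook and the parameter names are assumptions made for illustration, and only the standard library is used.

```python
import copy
import json


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of `notebook` with a new code cell that assigns `params`.

    The new cell is placed right after the first cell tagged "parameters",
    or at the top of the notebook when no such cell exists.
    """
    nb = copy.deepcopy(notebook)
    # One assignment per parameter; repr() keeps simple Python literals valid.
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    position = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb["cells"].insert(position, injected)
    return nb


# Hypothetical two-cell notebook: defaults first, then the actual work.
template = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {"tags": ["parameters"]},
         "source": "country = 'AR'\nlimit = 10"},
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {}, "source": "print(country, limit)"},
    ],
}

run = inject_parameters(template, {"country": "BR", "limit": 500})
print(json.dumps(run["cells"][1], indent=2))
```

Because the overrides live in their own cell, the template notebook stays untouched and every run keeps a record of the values it was executed with.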

It is designed to let people in different data roles put notebooks into production, whatever those notebooks do: they could be models, ETL processes, crawlers, etc. This way of working should make it possible to move backward and forward in the process of building data products.

Although this tool allows different workflows in a data project, we propose this one as an example: Workflow

Philosophy

LabFunctions isn't a complete MLOps solution.

We try hard to expose the right APIs to the user for scheduling notebooks with reproducibility in mind.

Whenever possible, we use well-established open source tools, projects and libraries to solve common problems. Moreover, we enforce some good practices, such as code versioning and the use of containers to run workloads.

The idea comes from a Netflix post which suggests using notebooks as an interface, or as a kind of DSL, to orchestrate different workloads such as Spark. But it could also be used to run entire processes, as we said before.

The benefit of this approach is that executed notebooks can be stored and inspected, whether the execution succeeded or failed. If something fails, it is easy to run the notebook in the classical way: cell by cell, on a local PC or on a remote server.
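As an illustration of inspecting a stored execution, the sketch below scans an executed notebook for error outputs. It relies only on the notebook JSON format (nbformat v4), where a failed cell carries an output with `output_type` set to `error` plus `ename` and `evalue` fields. The helper `failed_cells` and the sample notebook are hypothetical and are not part of LabFunctions.

```python
def failed_cells(notebook: dict) -> list:
    """List the error outputs found in an executed notebook (nbformat v4 JSON)."""
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append(
                    {"cell": index, "ename": output["ename"], "evalue": output["evalue"]}
                )
    return failures


# Hypothetical stored execution: the second cell raised an exception.
executed = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Daily ETL"},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "rows = 0\nratio = 10 / rows",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": ["ZeroDivisionError: division by zero"],
                }
            ],
        },
    ],
}

for failure in failed_cells(executed):
    print(f"cell {failure['cell']}: {failure['ename']}: {failure['evalue']}")
```

The reported cell index tells you where to resume when re-running the notebook cell by cell.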

Status

⚠️ Although the project is considered stable, please keep in mind that LabFunctions is still under active development, and therefore full backward compatibility is not guaranteed before reaching v1.0.0. APIs could change.

Features

Some features can be used standalone, and others depend on each other.

| Feature | Status | Note |
| --- | --- | --- |
| Notebook execution | Stable | - |
| Workflow scheduling | Beta | Allows scheduling runs: every hour, every day, etc. |
| Build Runtimes | Beta | Build OCI-compliant containers (Docker) and store them |
| Runtimes templates | Stable | Generate a Dockerfile based on templates |
| Create and destroy servers | Alpha | Create and delete machines in different cloud providers |
| GPU Support | Beta | Allows running workloads that require a GPU |
| Execution History | Alpha | Track notebook & workflow executions |
| Google Cloud support | Beta | Supports Google Storage and Google Cloud as a provider |
| Secrets management | Alpha | Encrypt and manage private data in a project |
| Project Management | Alpha | Match each git repository to a project |
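The idea behind generating a Dockerfile from a template can be sketched with the standard library's `string.Template`. This is only an illustration of the technique, not the templates that LabFunctions ships: the template text, the `render_dockerfile` helper and the `worker` module name are assumptions.

```python
from string import Template

# Hypothetical runtime template; ${...} placeholders are filled per project.
DOCKERFILE_TEMPLATE = Template("""\
FROM python:${python_version}-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-m", "${entrypoint}"]
""")


def render_dockerfile(python_version: str, entrypoint: str) -> str:
    """Fill the template; substitute() raises KeyError if a placeholder is missing."""
    return DOCKERFILE_TEMPLATE.substitute(
        python_version=python_version,
        entrypoint=entrypoint,
    )


dockerfile = render_dockerfile("3.10", "worker")
print(dockerfile)
```

Keeping the Dockerfile in a template means the same project can produce several runtimes (for example, one per Python version) without hand-editing files.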

Cluster options

It is possible to run different cluster configurations with custom autoscaling policies.

GPU CLUSTER DEMO

Instances inside a cluster can be created manually or automatically.
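To give an idea of what an autoscaling policy decides, here is a minimal sketch that derives the desired number of instances from the number of queued tasks, clamped between a minimum and a maximum. The `ScalePolicy` fields and the `desired_nodes` function are hypothetical and do not reflect how LabFunctions configures its clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    """Hypothetical policy: bounds for the cluster and capacity of one node."""
    min_nodes: int
    max_nodes: int
    tasks_per_node: int


def desired_nodes(policy: ScalePolicy, queued_tasks: int) -> int:
    """Nodes needed for the queue, kept within [min_nodes, max_nodes]."""
    # Ceiling division without floats: ceil(a / b) == -(-a // b).
    needed = -(-queued_tasks // policy.tasks_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


gpu_policy = ScalePolicy(min_nodes=1, max_nodes=4, tasks_per_node=5)
for queued in (0, 12, 100):
    print(queued, "queued ->", desired_nodes(gpu_policy, queued), "nodes")
```

With a policy like this, automatic creation means comparing the desired count with the number of running instances and creating or destroying the difference. Manual creation bypasses the calculation.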

See a simple demo of GPU cluster creation:

https://www.youtube.com/watch?v=-R7lJ4dGI9s

Roadmap

See Roadmap draft

Architecture

labfunctions architecture

References & inspirations

Contributing

Bug reports and pull requests are welcome on GitHub at the issues page. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

This project is licensed under Apache 2.0. Refer to LICENSE.txt.
