Schedule parameterized notebooks programmatically using the CLI or a REST API
LabFunctions
Description
LabFunctions is a library and a service that lets you run parameterized notebooks on demand.
It is designed to empower different data roles to put notebooks into production, whatever they do: these notebooks could be models, ETL processes, crawlers, etc. This way of working should allow going backward and forward in the process of building data products.
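To make "parameterized notebook" concrete, here is a minimal sketch of the general idea, using only the Python standard library. It assumes the nbformat v4 JSON layout and the papermill-style convention of a code cell tagged `parameters` that holds default values. The function name `inject_parameters` and the sample notebook are hypothetical and are not part of the LabFunctions API.

```python
import copy


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of `notebook` with a new cell that overrides the defaults.

    Assumption: the notebook follows the nbformat v4 layout and marks its
    default values with a code cell tagged "parameters".
    """
    nb = copy.deepcopy(notebook)
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    # Insert the overrides right after the tagged defaults cell,
    # or at the top of the notebook when no cell is tagged.
    position = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb["cells"].insert(position, injected)
    return nb


# Hypothetical notebook with two cells: defaults, then the actual work.
template = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "country = 'AR'\nlimit = 10",
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(country, limit)",
            "outputs": [],
            "execution_count": None,
        },
    ],
}

run = inject_parameters(template, {"country": "BR", "limit": 50})
print(run["cells"][1]["source"])
```

Because the overrides live in their own cell, the executed copy records exactly which values were used, while the template notebook stays unchanged.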
Although this tool allows different workflows in a data project, we propose this one as an example:
Philosophy
LabFunctions isn't a complete MLOps solution.
We try hard to expose the right APIs for scheduling notebooks with reproducibility in mind.
Whenever possible, we use well-established open source tools, projects and libraries to solve common problems. Moreover, we enforce some good practices such as code versioning and the use of containers to run workloads.
The idea comes from a Netflix post which suggests using notebooks as an interface, or as a kind of DSL, to orchestrate different workloads such as Spark. But it could also be used to run entire processes, as mentioned before.
The benefit of this approach is that executed notebooks can be stored and inspected, whether the execution succeeded or failed. If something fails, it is easy to rerun the notebook in the classical way: cell by cell, on a local PC or a remote server.
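As an illustration of inspecting a stored notebook after a run, the sketch below scans an executed notebook for cells that raised an error. It assumes the nbformat v4 layout, where a failing code cell carries an output with `output_type` set to `error` plus `ename` and `evalue` fields. The helper `failed_cells` and the sample data are hypothetical and do not come from LabFunctions.

```python
def failed_cells(notebook: dict) -> list:
    """List (cell index, exception name, message) for every error output.

    Assumption: `notebook` is an executed notebook in nbformat v4 layout.
    """
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append(
                    (index, output.get("ename", ""), output.get("evalue", ""))
                )
    return failures


# Hypothetical executed notebook: the second cell failed.
executed = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "x = 0", "outputs": []},
        {
            "cell_type": "code",
            "metadata": {},
            "source": "1 / x",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": [],
                }
            ],
        },
    ],
}

for index, name, message in failed_cells(executed):
    print(f"cell {index} failed: {name}: {message}")
```

Knowing the index of the failing cell makes it straightforward to open the stored notebook and rerun from that point.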
Status
⚠️ Although the project is considered stable, please keep in mind that LabFunctions is still under active development, so full backward compatibility is not guaranteed before reaching v1.0.0. APIs could change.
Features
Some features can be used standalone, and others depend on each other.
Feature | Status | Note |
---|---|---|
Notebook execution | Stable | - |
Workflow scheduling | Beta | Allows scheduling runs every hour, every day, etc. |
Build Runtimes | Beta | Build OCI-compliant containers (Docker) and store them |
Runtimes templates | Stable | Generate Dockerfiles based on templates |
Create and destroy servers | Alpha | Create and delete machines in different cloud providers |
GPU Support | Beta | Allows running workloads that require a GPU |
Execution History | Alpha | Track notebook and workflow executions |
Google Cloud support | Beta | Supports Google storage and Google Cloud as a provider |
Secrets management | Alpha | Encrypt and manage private data in a project |
Project Management | Alpha | Match each git repository to a project |
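The interval-based scheduling mentioned in the table ("every hour, every day") can be pictured with a small sketch. This is only an illustration of the concept using the standard library; the function `next_runs` is hypothetical and does not reflect how LabFunctions implements or exposes its scheduler.

```python
from datetime import datetime, timedelta


def next_runs(start: datetime, every: timedelta, count: int) -> list:
    """Return the next `count` run times for a fixed-interval schedule."""
    return [start + every * step for step in range(1, count + 1)]


first_seen = datetime(2022, 1, 1, 8, 0)

# "Every hour" and "every day" expressed as plain intervals.
hourly = next_runs(first_seen, timedelta(hours=1), 3)
daily = next_runs(first_seen, timedelta(days=1), 2)

for run_at in hourly:
    print("hourly run at", run_at.isoformat())
for run_at in daily:
    print("daily run at", run_at.isoformat())
```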
Cluster options
It is possible to run different cluster configurations with custom autoscaling policies.
Instances inside a cluster can be created manually or automatically.
See a simple demo of a GPU cluster creation:
https://www.youtube.com/watch?v=-R7lJ4dGI9s
:earth_americas: Roadmap
See Roadmap draft
:post_office: Architecture
:bookmark_tabs: References & inspirations
- Notebook Innovation - Netflix
- Tensorflow metastore
- Maintainable and collaborative pipelines
- The magic of Merlin
- Scale aware approach
Contributing
Bug reports and pull requests are welcome on GitHub at the issues page. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
This project is licensed under Apache 2.0. Refer to LICENSE.txt.
Built Distribution
Hashes for labfunctions-0.9.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 75ab2db04b1c550d64a525a33c6d11f0263a5bb2c0905bab6db1f748596568da |
MD5 | 987222bd4fb2a576fbda5224de9deb0a |
BLAKE2b-256 | 4c698e07309ec8ace28b56baa2721da8384412a5e5c2fa37ba1d08423b96167d |