Compute-agnostic pipelining software

Please check here for the complete documentation.

Example

The data-science-flavored code below is the well-known iris example from scikit-learn.

"""
Example of Logistic regression using scikit-learn
https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html
"""

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegression


def load_data():
    # import some data to play with
    iris = datasets.load_iris()
    X = iris.data[:, :2]  # we only take the first two features.
    Y = iris.target

    return X, Y


def model_fit(X: np.ndarray, Y: np.ndarray, C: float = 1e5):
    logreg = LogisticRegression(C=C)
    logreg.fit(X, Y)

    return logreg


def generate_plots(X: np.ndarray, Y: np.ndarray, logreg: LogisticRegression):
    _, ax = plt.subplots(figsize=(4, 3))
    DecisionBoundaryDisplay.from_estimator(
        logreg,
        X,
        cmap=plt.cm.Paired,
        ax=ax,
        response_method="predict",
        plot_method="pcolormesh",
        shading="auto",
        xlabel="Sepal length",
        ylabel="Sepal width",
        eps=0.5,
    )

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors="k", cmap=plt.cm.Paired)

    plt.xticks(())
    plt.yticks(())

    plt.savefig("iris_logistic.png")

    # TODO: What is the right value?
    return 0.6


## Without any orchestration
def main():
    X, Y = load_data()
    logreg = model_fit(X, Y, C=1.0)
    generate_plots(X, Y, logreg)


## With runnable orchestration
def runnable_pipeline():
    # The below code can be anywhere
    from runnable import Catalog, Pipeline, PythonTask, metric, pickled

    # X, Y = load_data()
    load_data_task = PythonTask(
        function=load_data,
        name="load_data",
        returns=[pickled("X"), pickled("Y")],  # (1)
    )

    # logreg = model_fit(X, Y, C=1.0)
    model_fit_task = PythonTask(
        function=model_fit,
        name="model_fit",
        returns=[pickled("logreg")],
    )

    # generate_plots(X, Y, logreg)
    generate_plots_task = PythonTask(
        function=generate_plots,
        name="generate_plots",
        terminate_with_success=True,
        catalog=Catalog(put=["iris_logistic.png"]),  # (2)
        returns=[metric("score")],
    )

    pipeline = Pipeline(
        steps=[load_data_task, model_fit_task, generate_plots_task],  # (3)
    )  # (4)

    pipeline.execute()

    return pipeline


if __name__ == "__main__":
    # main()
    runnable_pipeline()

  1. Return two serialized objects X and Y.
  2. Store the file iris_logistic.png for future reference.
  3. Define the sequence of tasks.
  4. Define a pipeline with the tasks.
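
To make annotation (1) concrete: the names listed in `returns` appear to be what lets a later task receive `X`, `Y`, and `logreg` as arguments of the same name. The sketch below illustrates that name-based wiring in plain Python only. It is not runnable's implementation; `run_linear`, `Step`, `load_numbers`, and `scale` are hypothetical names invented for this illustration, and values are kept in memory rather than serialized.

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical type, not part of runnable: (function, names of its return values).
Step = Tuple[Callable[..., Any], List[str]]


def run_linear(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in order, passing earlier returns to later steps by parameter name."""
    store: Dict[str, Any] = {}
    for function, returns in steps:
        # Supply only the parameters that an earlier step has produced;
        # anything else falls back to the function's own default.
        wanted = inspect.signature(function).parameters
        kwargs = {name: store[name] for name in wanted if name in store}
        result = function(**kwargs)

        # Normalise a single return value to a tuple so it zips with the names.
        values = result if len(returns) > 1 else (result,)
        store.update(zip(returns, values))
    return store


def load_numbers():
    return [1, 2, 3], [10, 20, 30]


def scale(X, Y, factor: int = 2):
    return [x * factor + y for x, y in zip(X, Y)]


store = run_linear([(load_numbers, ["X", "Y"]), (scale, ["scaled"])])
print(store["scaled"])
```

As in the iris example, the domain functions know nothing about the runner: `scale` takes `X` and `Y` as ordinary arguments, and `factor` keeps its default because no earlier step returned a value with that name.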

Tip (notebooks and shell scripts): you can execute notebooks and shell scripts too. They can be written just as you would want them, as *plain old notebooks and scripts*.

The difference between the native driver and the runnable orchestration:

-X, Y = load_data()
+load_data_task = PythonTask(
+    function=load_data,
+    name="load_data",
+    returns=[pickled("X"), pickled("Y")],  # (1)
+)

-logreg = model_fit(X, Y, C=1.0)
+model_fit_task = PythonTask(
+    function=model_fit,
+    name="model_fit",
+    returns=[pickled("logreg")],
+)

-generate_plots(X, Y, logreg)
+generate_plots_task = PythonTask(
+    function=generate_plots,
+    name="generate_plots",
+    terminate_with_success=True,
+    catalog=Catalog(put=["iris_logistic.png"]),  # (2)
+)

+pipeline = Pipeline(
+    steps=[load_data_task, model_fit_task, generate_plots_task],  # (3)
+)

  • Domain code remains completely independent of driver code.
  • The driver function has an equivalent and intuitive runnable expression.
  • Reproducible by default: runnable stores metadata about code, data, and config for every execution.
  • The pipeline is runnable in any environment.
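
As a rough illustration of what "metadata about code, data, and config" can mean in practice, the sketch below builds a small record for one execution using only the standard library. The record layout and the name `describe_run` are assumptions made for this example; they are not the format runnable actually stores.

```python
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def describe_run(data_file: Path, config: Dict[str, Any]) -> Dict[str, Any]:
    """Build a small, JSON-serialisable record of one execution (hypothetical layout)."""
    return {
        "started_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        # Hashing the data file lets a later run detect that its input changed.
        "data_sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        # Sorting keys makes the hash independent of dictionary ordering.
        "config_sha256": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode("utf-8")
        ).hexdigest(),
        "config": config,
    }


# Usage: describe a run over a throwaway data file.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    data.write_text("a,b\n1,2\n")
    print(json.dumps(describe_run(data, {"C": 1.0}), indent=2))
```

Two runs with the same data and config produce the same pair of hashes, which is the property that makes an execution comparable to an earlier one.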

Documentation

More details about the project and how to use it are available here.


Installation

The minimum Python version that runnable supports is 3.8.

pip install runnable

Please look at the installation guide for more information.
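
A quick way to confirm the environment after installing is to check the interpreter version and the installed package version. The sketch below uses only the standard library; the helper names `python_is_supported` and `installed_version` are made up for this example.

```python
import sys
from importlib import metadata  # in the standard library from Python 3.8

MINIMUM_PYTHON = (3, 8)  # minimum version stated above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def installed_version(package: str = "runnable"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("runnable version:", installed_version() or "not installed")
```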

Pipelines can be:

Linear

A simple linear pipeline whose tasks are Python functions, notebooks, or shell scripts.

Parallel branches

Execute branches in parallel.

Loops or map

Execute a pipeline over an iterable parameter.

Arbitrary nesting

Nest these constructs arbitrarily, for example parallel branches within a map.
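
The shapes listed above can be pictured with ordinary Python. The sketch below is a conceptual illustration only and does not use runnable's API: `run_parallel`, `run_map`, and `per_chunk` are hypothetical helpers, and threads stand in for whatever compute the pipeline actually runs on.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def run_parallel(branches: Dict[str, Callable[[], int]]) -> Dict[str, int]:
    """Run independent branches concurrently and collect results by branch name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(branch) for name, branch in branches.items()}
        return {name: future.result() for name, future in futures.items()}


def run_map(
    branch: Callable[[int], Dict[str, int]], values: Iterable[int]
) -> List[Dict[str, int]]:
    """Run the same branch once per value of an iterable parameter."""
    return [branch(value) for value in values]


def per_chunk(chunk: int) -> Dict[str, int]:
    # Nesting: each map iteration itself fans out into two parallel branches.
    return run_parallel({"square": lambda: chunk**2, "double": lambda: chunk * 2})


results = run_map(per_chunk, [1, 2, 3])
print(results)
```

Here the map runs `per_chunk` for each of the three values, and every iteration contains its own pair of parallel branches, which is the "parallel within map" nesting described above.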

Download files

Source Distribution

runnable-0.13.0.tar.gz (100.7 kB view details)

Uploaded Source

Built Distribution

runnable-0.13.0-py3-none-any.whl (122.7 kB view details)

Uploaded Python 3

File details

Details for the file runnable-0.13.0.tar.gz.

File metadata

  • Download URL: runnable-0.13.0.tar.gz
  • Upload date:
  • Size: 100.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.9.20 Linux/6.5.0-1025-azure

File hashes

Hashes for runnable-0.13.0.tar.gz
Algorithm    Hash digest
SHA256       fd62eb396c9e675e968dda699f2563e7b96ccde700c4a9b3fd597fb9b1481ed1
MD5          ed109313ba9b50a4209c6d7518a3f282
BLAKE2b-256  4fc4346df9a6f3c81568a54f6847e6ca1a243a0a0aa70f9cc7ec2897ab41a65b

See more details on using hashes here.
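
One way to use the digests above is to recompute the SHA256 of a downloaded archive and compare it with the published value. The sketch below assumes the source distribution was saved to the current directory under its published filename; the helper names `sha256_of` and `matches` are made up for this example.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for runnable-0.13.0.tar.gz.
EXPECTED_SHA256 = "fd62eb396c9e675e968dda699f2563e7b96ccde700c4a9b3fd597fb9b1481ed1"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


archive = Path("runnable-0.13.0.tar.gz")  # assumed download location
if archive.exists():
    print("digest matches:", matches(archive))
else:
    print(f"{archive} not found; download it first")
```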

File details

Details for the file runnable-0.13.0-py3-none-any.whl.

File metadata

  • Download URL: runnable-0.13.0-py3-none-any.whl
  • Upload date:
  • Size: 122.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.9.20 Linux/6.5.0-1025-azure

File hashes

Hashes for runnable-0.13.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       9e4e79789072250de4a3003c5503c706d26146f2f1ea23510da14581d59e3034
MD5          ba763bd642b498478c825ce743d9d1f0
BLAKE2b-256  0f25ee83d296bb41c37dc14b0ca8225e759693dd4c7a415f4ab79c43873c11e6

See more details on using hashes here.
