alexflow

ALEXFlow is a Python workflow library for building reproducible, complex workflows, mainly for machine learning training.

Get Started

To install from PyPI, use pip:

pip install alexflow

Remarks

Type hint support with dataclasses

luigi does not work well with type hints, which makes complex workflows difficult to build. By using dataclasses, ALEXFlow aims to get the benefits of type hints.
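
As a rough illustration of what typed dataclass fields offer, the sketch below uses only the standard library. `TrainParams` and its fields are hypothetical names chosen for this example and are not part of alexflow.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainParams:
    # Hypothetical parameter set, used only for illustration.
    model_type: str
    alpha: float = 1.0
    verbose: bool = True


params = TrainParams(model_type="Ridge", alpha=0.5)

# The declared type of every field is available to type checkers, IDEs,
# and runtime tooling through the dataclass metadata.
declared = {f.name: f.type for f in fields(TrainParams)}
```

A static type checker such as mypy can then flag a call like `TrainParams(model_type=3)` before the workflow runs.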

Build workflows by composition, rather than by parameter bucket relay

A parameter bucket relay ends up building a huge global state at the entrypoint of the workflow, which is generally hard to maintain because it behaves much like global variables. Instead, we decided to build workflows by composing tasks. This architecture gives us the benefits of a divide-and-conquer strategy.
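
The idea can be sketched with plain frozen dataclasses. This is only an illustration of composition under assumed names (`LoadData`, `Preprocess`, `Fit`, and `build_workflow` are hypothetical and not alexflow classes): each step holds the step it depends on, so its parameters live next to the step that uses them instead of being passed down from one global configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadData:
    # Hypothetical step: only knows where its data comes from.
    path: str


@dataclass(frozen=True)
class Preprocess:
    # Depends on an upstream step, not on a global config object.
    source: LoadData
    normalize: bool = True


@dataclass(frozen=True)
class Fit:
    features: Preprocess
    model_type: str = "Ridge"


def build_workflow(path: str) -> Fit:
    """Compose the steps; each one can be built and tested on its own."""
    return Fit(features=Preprocess(source=LoadData(path=path)))


workflow = build_workflow("data.csv")
```

Because every step is a small value object, a sub-workflow such as `Preprocess(source=LoadData("data.csv"))` can be reused or swapped without touching the rest.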

Focus on reproducibility with immutable tasks

The Task class is designed to be an immutable dataclass object, for distributed execution, strong consistency, and reproducibility. Task objects can also be serialized as JSON, so you can easily trace the exact parameters used to generate an Output.
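
A minimal sketch of how immutability and JSON serialization can support reproducibility, using only the standard library. It is not alexflow's actual implementation: `TrainSpec`, `to_json`, and `unique_id` are hypothetical names, and hashing the JSON form is an assumption made for this example.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass, field, fields


@dataclass(frozen=True)
class TrainSpec:
    # Hypothetical task parameters, used only for illustration.
    model_type: str
    alpha: float = 1.0
    # compare=False marks a parameter that does not affect the result.
    verbose: bool = field(default=True, compare=False)


def to_json(spec: TrainSpec) -> str:
    """Serialize every parameter so a run can be traced later."""
    return json.dumps(asdict(spec), sort_keys=True)


def unique_id(spec: TrainSpec) -> str:
    """Derive an id from the significant (compare=True) fields only."""
    significant = {f.name: getattr(spec, f.name) for f in fields(spec) if f.compare}
    payload = json.dumps(significant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


a = TrainSpec("Ridge", alpha=0.5, verbose=True)
b = TrainSpec("Ridge", alpha=0.5, verbose=False)
```

Here `a` and `b` compare equal and share the same id because they differ only in `verbose`, while assigning to `a.alpha` raises `FrozenInstanceError`.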

Dependency via Outputs, rather than Tasks

Describing workflow dependencies by Output makes it easy to run only part of the graph.
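
To illustrate why output-based dependencies make partial runs simple, here is a small standalone sketch. It does not use alexflow: the `GRAPH` mapping, the task names, and `tasks_needed` are hypothetical, and the sketch assumes each output is produced by exactly one task. Asking for one output walks back only through the outputs it needs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph: output key -> (producing task, keys of required outputs).
GRAPH: Dict[str, Tuple[str, List[str]]] = {
    "X.pkl": ("LoadFeatures", []),
    "y.pkl": ("LoadLabels", []),
    "model.pkl": ("Train", ["X.pkl", "y.pkl"]),
    "report.json": ("Evaluate", ["model.pkl", "X.pkl", "y.pkl"]),
}


def tasks_needed(target: str, graph: Dict[str, Tuple[str, List[str]]]) -> List[str]:
    """Return the tasks required to produce `target`, in execution order."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(key: str) -> None:
        if key in seen:
            return
        seen.add(key)
        task, inputs = graph[key]
        for dep in inputs:
            visit(dep)  # upstream outputs are resolved first
        ordered.append(task)

    visit(target)
    return ordered


plan = tasks_needed("model.pkl", GRAPH)
```

In this sketch, requesting `model.pkl` schedules the two loading tasks and `Train`, and leaves `Evaluate` out of the plan.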

An example of Task construction

You can also see an example workflow at examples/workflow.py.

from typing import Tuple
from sklearn import linear_model
from dataclasses import dataclass, field
from alexflow import Task, no_default, NoDefaultVar, Output, BinaryOutput


@dataclass(frozen=True)
class Train(Task):
    # Write the task's parameters as dataclass fields. The task's unique id is
    # generated from the given parameters, and each task is executed only once
    # during the entire graph computation.
    X: NoDefaultVar[Output] = no_default
    y: NoDefaultVar[Output] = no_default
    model_type: NoDefaultVar[str] = no_default
    # Mark an insignificant parameter with compare=False, following dataclass
    # object equality. Changing such a variable does not change the task's
    # unique id.
    verbose: bool = field(default=True, compare=False)

    def input(self):
        """Describes the outputs this task depends on."""
        return self.X, self.y

    def output(self):
        """Describes the output this task produces."""
        return self.build_output(BinaryOutput, key="model.pkl")

    def run(self, input: Tuple[BinaryOutput, BinaryOutput], output: BinaryOutput):
        # The dependent outputs defined in the `input()` method are available as the `input` argument.
        X = input[0].load()
        y = input[1].load()

        model_class = getattr(linear_model, self.model_type)

        # Instantiate the selected scikit-learn estimator and fit it.
        model = model_class().fit(X, y)

        # Store whatever you want to output in the following manner.
        output.store(model)
