alexflow
ALEXFlow is a Python workflow library built for reproducible, complex workflows, mainly for machine learning training.
Get Started
To install from PyPI, use pip.
pip install alexflow
Remarks
Support for type hints with dataclasses
luigi does not work well with type hints, which makes complex workflows difficult to build. By using dataclasses, we aim to gain the benefits of type hints.
Build workflows by composition, rather than by parameter bucket relay.
A parameter bucket relay ends up building a huge global state at the entrypoint of the workflow, which is generally difficult to maintain because it behaves much like global variables. Instead, we build workflows by composition, which lets us apply a divide-and-conquer strategy.
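The contrast can be sketched with plain dataclasses. This is an illustration only: it uses the standard library rather than alexflow's API, and `Preprocess`, `Train`, `build_workflow` and their fields are hypothetical names. Each step owns just its own parameters and receives its upstream step as a value, so no global parameter object has to be threaded through the entrypoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preprocess:
    # Hypothetical upstream step: knows only its own parameter.
    scale: float


@dataclass(frozen=True)
class Train:
    # Hypothetical downstream step: takes the upstream step as a value
    # instead of re-declaring `scale` and forwarding it.
    features: Preprocess
    model_type: str


def build_workflow() -> Train:
    # The entrypoint composes small pieces; there is no shared parameter bucket.
    features = Preprocess(scale=0.5)
    return Train(features=features, model_type="Ridge")


workflow = build_workflow()
print(workflow)
```

Changing how preprocessing is configured touches only `Preprocess` and the place where it is constructed; `Train` does not need to know about `scale`.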
Focus on reproducibility with immutable tasks
The Task class is designed to be an immutable dataclass object, for distributed execution, strong consistency, and reproducibility. Task objects can also be serialized as JSON, so you can easily trace the exact parameters used to generate an Output.
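As a rough sketch of why immutability and JSON serialization help with traceability, the following uses only the standard library. `TrainParams`, `to_json` and `unique_id` are hypothetical names, and hashing the serialized parameters is an assumed id scheme for illustration, not necessarily how alexflow derives its task ids.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class TrainParams:
    # Hypothetical task parameters, not alexflow's Task class.
    model_type: str
    alpha: float


def to_json(params: TrainParams) -> str:
    # sort_keys gives a stable string for the same parameter values.
    return json.dumps(asdict(params), sort_keys=True)


def unique_id(params: TrainParams) -> str:
    # Assumed id scheme for illustration: a hash of the serialized parameters.
    return hashlib.sha256(to_json(params).encode("utf-8")).hexdigest()


params = TrainParams(model_type="Ridge", alpha=1.0)
serialized = to_json(params)

try:
    params.alpha = 2.0  # frozen dataclasses reject mutation
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the parameters cannot change after construction, the serialized form stored next to an output keeps describing exactly what produced it.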
Dependency via Outputs, rather than Tasks
Describing workflow dependencies in terms of Output objects makes it easy to run only part of the graph.
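One way to picture this, as a minimal sketch that does not use alexflow's API: if every output is keyed and records which outputs it needs, then asking for a single output only triggers the steps it transitively depends on. The `graph` layout, the output keys and the `resolve` helper below are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical graph: each output key maps to (input keys, producer function).
Graph = Dict[str, Tuple[List[str], Callable[..., object]]]

graph: Graph = {
    "raw": ([], lambda: [1, 2, 3]),
    "scaled": (["raw"], lambda raw: [x * 2 for x in raw]),
    "total": (["scaled"], lambda scaled: sum(scaled)),
    "report": (["total"], lambda total: f"total={total}"),
}


def resolve(key: str, graph: Graph, cache: Dict[str, object]) -> object:
    # Compute only the outputs that `key` transitively depends on.
    if key not in cache:
        input_keys, producer = graph[key]
        inputs = [resolve(k, graph, cache) for k in input_keys]
        cache[key] = producer(*inputs)
    return cache[key]


cache: Dict[str, object] = {}
scaled = resolve("scaled", graph, cache)
```

Requesting `"scaled"` computes `"raw"` and `"scaled"` but leaves `"total"` and `"report"` untouched, which is the partial-run behaviour described above.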
An example of Task construction
You can also see the example workflow at examples/workflow.py.
from typing import Tuple
from sklearn import linear_model
from dataclasses import dataclass, field
from alexflow import Task, no_default, NoDefaultVar, Output, BinaryOutput


@dataclass(frozen=True)
class Train(Task):
    # Task parameters are written as dataclass fields. The task's unique id is
    # generated from the given parameters, and each task is executed only once
    # during the entire graph computation.
    X: NoDefaultVar[Output] = no_default
    y: NoDefaultVar[Output] = no_default
    model_type: NoDefaultVar[str] = no_default
    # Insignificant parameters can be declared with compare=False, following
    # dataclass object equality. Even if you change these variables, the task's
    # unique id stays the same.
    verbose: bool = field(default=True, compare=False)

    def input(self):
        """Describes the outputs this task depends on."""
        return self.X, self.y

    def output(self):
        """Describes the output this task produces."""
        return self.build_output(BinaryOutput, key="model.pkl")

    def run(self, input: Tuple[BinaryOutput, BinaryOutput], output: BinaryOutput):
        # The outputs returned by `input()` are available through the `input` argument.
        X = input[0].load()
        y = input[1].load()
        # Look up the estimator class by name, e.g. "Ridge" -> linear_model.Ridge.
        model_class = getattr(linear_model, self.model_type)
        model = model_class().fit(X, y)
        # Store whatever you want to output in the following manner.
        output.store(model)
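The `compare=False` behaviour used for `verbose` above comes from standard dataclass semantics, which can be checked without alexflow. The sketch below uses a hypothetical `Params` class as a stand-in for a task; it shows only that such fields are ignored by equality and hashing, not how alexflow computes its ids.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Params:
    # Hypothetical stand-in for a task; not alexflow's Task class.
    model_type: str
    # Excluded from __eq__ and, by default, from the generated __hash__.
    verbose: bool = field(default=True, compare=False)


quiet = Params(model_type="Ridge", verbose=False)
loud = Params(model_type="Ridge", verbose=True)
other = Params(model_type="Lasso")

print(quiet == loud)   # verbose is ignored in the comparison
print(quiet == other)  # model_type still matters
```

Two instances that differ only in `verbose` compare equal and hash equal, so toggling logging would not make them look like different pieces of work.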