
A library for splitting python workflows into separate tasks


Bandsaw


Bandsaw is a python library that lets you split a python workflow into individual tasks, which can be run separately with different python interpreters and even on different machines.

What it does

Bandsaw can be used to create distributed python scripts that define workflows in heterogeneous environments with conflicting dependencies. It is especially meant for building complex machine learning processes that combine different machine learning frameworks, like TensorFlow or PyTorch, in a single workflow, or that need to run on multiple computation platforms, e.g. in different regions due to data restrictions.

How it works

This works by decorating python functions with the bandsaw @task decorator:

import bandsaw

...

@bandsaw.task
def my_function(x):
    return x

This decorator gives bandsaw the opportunity to run additional code before and after the code within my_function(x) is executed. This additional code is defined in classes that fulfill the Advice protocol.

When the decorated function is called, bandsaw first intercepts the call and runs the before() methods of the configured Advices. Then the wrapped function is executed, and afterwards each advice's after() method is called.

Diagram of an advised task
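The call order described above can be sketched without bandsaw itself. The following is a minimal, hypothetical illustration in plain Python: advised() and Recorder are invented names, not part of bandsaw's API, the hooks take no Session argument for simplicity, and running the after() hooks in reverse order is an assumption of this sketch rather than documented bandsaw behaviour.

```python
import functools


def advised(*advices):
    """Hypothetical decorator factory (not bandsaw's API) that wraps a function with advices."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for advice in advices:            # before() hooks in configuration order
                advice.before()
            result = func(*args, **kwargs)    # the wrapped task itself
            for advice in reversed(advices):  # after() hooks, assumed to run in reverse order
                advice.after()
            return result
        return wrapper
    return decorate


class Recorder:
    """Hypothetical advice that records when its hooks run."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def before(self):
        self.log.append(f"{self.name}.before")

    def after(self):
        self.log.append(f"{self.name}.after")


log = []


@advised(Recorder("a", log), Recorder("b", log))
def my_function(x):
    log.append("my_function")
    return x


print(my_function(42))  # 42
print(log)  # ['a.before', 'b.before', 'my_function', 'b.after', 'a.after']
```

The decorator keeps the task's own signature and return value untouched; all extra behaviour lives in the advice objects, which is what makes them interchangeable.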

All callback functions receive a Session object as their single argument, and a callback continues the execution by calling the session's proceed() method. Alternatively, an advice can end the execution early with conclude(), providing a Result of its own. This shortcuts the computation and can be used, for example, for caching results. Additionally, the session can be serialized with its save(stream) method and recreated with restore(stream), which allows transferring it to other python processes, even on other machines.
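The proceed()/conclude() flow and the save/restore idea can likewise be approximated with a toy implementation. In this sketch, Session, CachingAdvice and slow_square are hypothetical stand-ins written for illustration, not bandsaw's classes; pickle is only assumed as a plausible serialization mechanism, and the real library's behaviour may differ.

```python
import io
import pickle

CALLS = []  # records real executions of the task


def slow_square(x):
    """Stand-in for an expensive task; module-level so pickle can store it by reference."""
    CALLS.append(x)
    return x * x


class Session:
    """Toy session (hypothetical, not bandsaw's class).

    Each call to proceed() runs the next step: the advices' before() hooks,
    then the task, then the after() hooks in reverse order.
    """

    def __init__(self, func, args, advices):
        self.func = func
        self.args = args
        self.advices = list(advices)
        self.result = None
        self._step = 0
        self._concluded = False

    def proceed(self):
        if self._concluded:
            return                      # an advice already supplied the result
        n = len(self.advices)
        i = self._step
        self._step += 1
        if i < n:
            self.advices[i].before(self)
        elif i == n:
            self.result = self.func(*self.args)
            self.proceed()              # continue with the after() hooks
        elif i <= 2 * n:
            self.advices[2 * n - i].after(self)

    def conclude(self, result):
        """Provide a result and skip every step that has not run yet."""
        self.result = result
        self._concluded = True

    def save(self, stream):
        pickle.dump(self, stream)

    @staticmethod
    def restore(stream):
        return pickle.load(stream)


class CachingAdvice:
    """Hypothetical advice that answers repeated calls from an in-memory cache."""

    def __init__(self):
        self.cache = {}

    def before(self, session):
        if session.args in self.cache:
            session.conclude(self.cache[session.args])  # shortcut: skip the task
        else:
            session.proceed()

    def after(self, session):
        self.cache[session.args] = session.result
        session.proceed()


cache = CachingAdvice()
for _ in range(2):
    session = Session(slow_square, (3,), [cache])
    session.proceed()
    print(session.result)  # 9 both times, but slow_square runs only once

# Serialize an unstarted session, e.g. to hand it to another process, then run the copy.
stream = io.BytesIO()
Session(slow_square, (4,), [CachingAdvice()]).save(stream)
stream.seek(0)
restored = Session.restore(stream)
restored.proceed()
print(restored.result)  # 16
```

Since pickle stores functions by reference, a restored session in this sketch can only run where the same task function is importable.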

A full explanation of this can be found in the latest user guide.

Develop

Bandsaw uses tox to build and test the library. Tox runs all tests on different python versions, can generate the documentation, and runs linters and style checks to improve the code quality. To get started, install tox, which sets up the remaining python modules in its own environments:

pip install tox

Afterwards the tests can be run by just calling

tox

from the project directory. For this to work, you need to have multiple python interpreters installed. If you don't want to run the tests on all supported python versions, just edit the tox.ini file and set

envlist = py36,py37,py38

to contain only the python versions you want to use. Another option is to run tox with the additional command line argument '--skip-missing-interpreters', which skips python versions that aren't installed.
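If you prefer to know in advance which of the listed environments can actually run, a small helper can check which interpreters are on PATH before invoking tox. This is a hypothetical sketch, not part of the project: it assumes environment names of the form pyXY and interpreters installed as pythonX.Y executables, which is common on Linux and macOS but not guaranteed elsewhere.

```python
import shutil


def env_to_executable(env):
    """Map a tox env name like 'py38' to an interpreter name like 'python3.8'.

    Assumes the simple 'py' + major + minor naming used in the envlist above.
    """
    digits = env[2:]
    return f"python{digits[0]}.{digits[1:]}"


def installed_envs(envlist, which=shutil.which):
    """Return the envs from a comma-separated envlist whose interpreter is on PATH."""
    envs = [e.strip() for e in envlist.split(",") if e.strip()]
    return [e for e in envs if which(env_to_executable(e)) is not None]


if __name__ == "__main__":
    available = installed_envs("py36,py37,py38")
    if available:
        print("tox -e " + ",".join(available))
    else:
        print("none of the listed interpreters is on PATH")
```

The printed command restricts tox to the detected environments; passing a custom which function makes the helper easy to test without touching the real PATH.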

Documentation

The latest version of the documentation can always be found at https://docs.kant.ai/bandsaw/latest. The documentation is written in Markdown and located in the docs directory of the project. It is built into static HTML pages with MkDocs. To generate the documentation manually, use tox to build the HTML pages from the Markdown sources:

tox -e docs

Release

Releasing a new package version

Releasing new versions of bandsaw is done using flit.

pip install flit

To publish a new release, you need an account on PyPI or on its test instance, TestPyPI.

Add those accounts to your ~/.pypirc:

[distutils]
index-servers =
  pypi
  pypitest

[pypi]
username: <my-user> 

[pypitest]
repository: https://test.pypi.org/legacy/
username: <my-test-user>
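To check that the file is read the way you expect, a hypothetical helper can parse it with Python's standard configparser module and list each index server with its repository URL. A value of None only means that the section sets no repository URL of its own.

```python
import configparser
from pathlib import Path


def index_servers(pypirc_text):
    """Return {server name: repository URL or None} for servers listed in [distutils]."""
    # interpolation=None so that '%' characters in values are read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pypirc_text)
    names = parser.get("distutils", "index-servers").split()
    return {name: parser.get(name, "repository", fallback=None) for name in names}


if __name__ == "__main__":
    path = Path.home() / ".pypirc"
    if path.exists():
        print(index_servers(path.read_text()))
    else:
        print(f"{path} not found")
```

For the example file above, this reports no explicit URL for pypi and https://test.pypi.org/legacy/ for pypitest.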

Publishing a new release to TestPyPI

flit publish --repository pypitest

Releasing a new version of the documentation

The package uses mike to manage multiple versions of the documentation. The generated documentation is kept in the docs-deployment branch and is deployed automatically when the branch is pushed to the repository.

In order to build a new version of the documentation, we need to use the corresponding tox environment:

VERSION_TAG='<my-version>' tox -e docs-release

The VERSION_TAG environment variable should be set to the new version in format '.'. This will build the documentation and add it as new commits to the docs-deployment branch.

Once the updated branch is pushed to the GitLab repository, the documentation is automatically deployed to the official documentation website.



Download files


Source Distribution

bandsaw-0.1.tar.gz (49.0 kB)

Built Distribution

bandsaw-0.1.0-py3-none-any.whl (26.3 kB)
