Reproducible and Reusable Data Analysis Workflow Server (Core Infrastructure)


About

This repository contains the implementation of the core infrastructure for the Reproducible and Reusable Data Analysis Workflow Server (flowServ). This is an experimental prototype to support reuse and evaluation of published data analysis pipelines as well as community benchmarks of data analysis algorithms. flowServ is not yet another workflow engine. Instead, it aims to provide a layer between a client (e.g., a Web user interface) and a workflow engine to facilitate the execution of defined workflow templates (as shown in the figure below). flowServ is designed to be independent of the underlying workflow engine.

Workflow templates contain placeholders for workflow steps and/or input data and parameters that are provided by the user (e.g., by providing Docker containers that satisfy the workflow steps or uploading input data files). flowServ triggers and monitors the execution of the workflow for the given input values and maintains the workflow results. The API provides the functionality to submit new workflow runs and to retrieve the evaluation results of completed workflow runs.
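The idea of a template with user-supplied placeholders can be illustrated with a minimal sketch. The example below does not use flowServ's template syntax or API: the dictionary layout, the key names, the `$name` placeholder notation (borrowed from Python's `string.Template`), and the `expand` and `instantiate` helpers are all assumptions made for illustration.

```python
from string import Template

# Hypothetical template: the structure and key names are illustrative only
# and do not reflect flowServ's actual template format.
TEMPLATE = {
    "workflow": {
        "steps": [
            {
                "image": "$image",
                "command": "python analyze.py $input_file $threshold",
            }
        ]
    },
    # Names of the values that the user has to provide.
    "parameters": ["image", "input_file", "threshold"],
}


def expand(value, args):
    """Recursively replace $name placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        return Template(value).substitute(args)
    if isinstance(value, list):
        return [expand(item, args) for item in value]
    if isinstance(value, dict):
        return {key: expand(item, args) for key, item in value.items()}
    return value


def instantiate(template, args):
    """Return a concrete workflow for the given user-provided arguments."""
    missing = [name for name in template["parameters"] if name not in args]
    if missing:
        raise ValueError(f"missing template parameters: {missing}")
    return expand(template["workflow"], args)


workflow = instantiate(
    TEMPLATE,
    {"image": "python:3.9", "input_file": "data.csv", "threshold": 0.5},
)
```

In this sketch the template itself is never modified. Each call to `instantiate` produces a new workflow description that could then be handed to a workflow engine.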

[Figure: ROB Architecture]

flowServ was motivated by the Reproducible Open Benchmarks for Data Analysis Platform (ROB). The goal of ROB is to allow user communities to evaluate the performance of their different data analysis algorithms in a controlled competition-style format. In ROB, the benchmark coordinator defines the workflow template along with input data. Benchmark participants provide their own implementation of the variable workflow steps. The workflow engine processes workflows on submission. Execution results are maintained by flowServ in an internal database. The goal of flowServ is to be a more generic platform that can be used not only for benchmarks but also for other types of data analysis workflows.

More Information

Workflow templates are motivated by the goal to allow users to run pre-defined data analytics workflows while providing their own input data, parameters, as well as their own code modules. Workflow templates are inspired by, but not limited to, workflow specifications for the Reproducible Research Data Analysis Platform (REANA). The Workflow Templates Section provides further information about templates and their syntax. These templates are used by flowServ to run workflows and to maintain benchmark results.

The flowServ API defines the main interface to programmatically interact with the underlying database and workflow engine. The API implementation that is included in this repository provides a default serialization of all API resources as Python dictionaries. The API is intended to be used by Web applications. These applications can be built using different frameworks. The current default Web API implementation for ROB uses the Flask web framework.
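Serializing resources as plain Python dictionaries keeps the API independent of any particular Web framework, since a dictionary can be converted to JSON by whichever framework wraps the API. The following is a minimal sketch of that pattern. The `Run` class, the `serialize_run` function, and the dictionary keys are hypothetical and are not taken from the flowServ code base.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Run:
    """Hypothetical run record; not a flowServ class."""

    run_id: str
    state: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def serialize_run(run: Run) -> Dict[str, Any]:
    """Convert a run into a plain dictionary (key names are assumptions)."""
    return {
        "id": run.run_id,
        "state": run.state,
        # Sort arguments by name so the serialization is deterministic.
        "arguments": [
            {"name": name, "value": value}
            for name, value in sorted(run.arguments.items())
        ],
    }


run = Run(
    run_id="0001",
    state="SUCCESS",
    arguments={"threshold": 0.5, "input_file": "data.csv"},
)
doc = serialize_run(run)

# Any Web framework can turn the dictionary into a JSON response body.
body = json.dumps(doc)
```

Because the serializer returns only built-in types, the same function could back a Flask view, a command line client, or a test without modification.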

ROB currently provides two different interfaces to interact with a Web API: the Command Line Client and the Web User Interface. See the respective repositories for further information on how to install and use these interfaces.

For an overview of ROB, see the slides from the ROB Demo at the Moore-Sloan Data Science Environment’s annual summit 2019 and our presentation at the Analysis Systems Topical Workshop.

The full documentation is also available on readthedocs.io.

Note

flowServ originated from the Reproducible Open Benchmarks for Data Analysis Platform (ROB). This repository replaces Workflow Templates and the Reproducible Benchmark Engine from an earlier version of ROB.

