A data flow-oriented framework for industrial ML applications

Project description

ml4proflow - A data flow-oriented framework for ML applications in industry

ml4proflow is the acronym for Dataflow of Machine Learning for Production and Production Systems.

It is a framework that manages the data flow along a typical ML pipeline and is mainly used in industrial applications. The ML pipeline is described by a graph, the DataFlowGraph (DFG). Graphs describing ML pipelines are typically simple directed graphs. However, since other types of graphs are possible, the framework can produce a data flow for any type of graph. Following the usual description of a graph, a DFG consists of multiple nodes that are connected by edges. Nodes are represented by modules and edges by channels.

Modules contain the ML algorithms and are implemented by experts. Channels are created by the DFG and controlled by a configuration supplied by a non-expert.

This enables all employees, from machine operators to data scientists, to execute ML algorithms. Because the framework is independent of the execution platform and execution architecture, it can be deployed anywhere in the production process, from edge devices to internal or public clouds.

Features

  • Open Source
  • Python based
  • Modular & scalable
  • Platform independent

Installation

Binary Installer from PyPI

The binaries are available from the Python Package Index. Install this package with

pip install ml4proflow

End User Installation

As ml4proflow is intended for both developers and end users, we provide an installation script that sets up all the necessary dependencies for your operating system. It installs a Python instance and all available modules for the framework in a virtual environment. This installation method is intended for end users who are not familiar with Python. The script is located in the repository ml4proflow-standalone. Follow the steps given in its README.

Installation from source

The source code is currently hosted on GitLab.

Linux

git clone https://gitlab.ub.uni-bielefeld.de/ml4proflow/ml4proflow
cd ml4proflow
pip install .

Windows

git clone https://gitlab.ub.uni-bielefeld.de/ml4proflow/ml4proflow
cd ml4proflow
pip install .

Development installation

For further development, install the package in editable mode:

pip install -e . 
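
Whichever installation method was used, a quick check from Python can confirm that the package is visible in the active environment. The following is a minimal sketch that relies only on the standard library; the helper name installed_version is made up for this illustration and is not part of ml4proflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of ml4proflow.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if ml4proflow is installed, otherwise None.
print(installed_version("ml4proflow"))
```

If this prints None, the package was most likely installed into a different Python environment than the one running the check.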

Usage

A DataFlowGraph is controlled by a configuration file. The config.json consists of a list of all modules that appear in the data pipeline. Each module is described by its path, its name, and its configuration.

In most cases, the order of execution is determined by the data flow defined in the DFG configuration. However, since some modules decide for themselves whether to be executed (e.g. executables), the modules should be listed in an intuitive order that follows the order in the DFG.
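
One way to find a listing order that follows the data flow is to sort the modules topologically by the channels they share. The sketch below is an illustration only, not the framework's scheduler. It assumes that channels_push names the channels a module writes to and channels_pull the channels it reads from, and the module names Reader, Scaler and Writer are invented. It needs Python 3.9 or newer for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical module list, reduced to the keys this sketch needs.
modules = [
    {"module_name": "Reader", "module_config": {"channels_push": ["raw"]}},
    {"module_name": "Scaler", "module_config": {"channels_pull": ["raw"],
                                                "channels_push": ["scaled"]}},
    {"module_name": "Writer", "module_config": {"channels_pull": ["scaled"]}},
]


def data_flow_order(modules):
    """Return module names so that every producer precedes its consumers."""
    # Map each channel to the modules that push data into it.
    producers = {}
    for module in modules:
        for channel in module["module_config"].get("channels_push", []):
            producers.setdefault(channel, []).append(module["module_name"])

    sorter = TopologicalSorter()
    for module in modules:
        name = module["module_name"]
        sorter.add(name)
        for channel in module["module_config"].get("channels_pull", []):
            for producer in producers.get(channel, []):
                if producer != name:  # ignore a module feeding itself
                    sorter.add(name, producer)
    return list(sorter.static_order())


order = data_flow_order(modules)
listed = [module["module_name"] for module in modules]
print(order)
print("listed order follows the data flow:", listed == order)
```

For graphs with branches there can be several valid orders, so a mismatch with the listed order is a hint to review the configuration rather than proof of an error.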

Example DFG-Config

{
    "modules": [{
        "module_ident": "ml4proflow.mods.xxx.modules",
        "module_name": "ModuleName",
        "module_config": {
            "channels_pull": ["src"],
            "channels_push": ["src"],
            "moduleParam1": "xxx",
            "moduleParam2": 1.0
        }
    }]
}
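
To make the structure of such a configuration concrete, the following sketch parses a small config with the standard json module and lists the edges it implies. It does not use the ml4proflow API. The module names ReaderModule and WriterModule are invented, the xxx placeholder is kept from the example above, and the reading of channels_push as "writes to" and channels_pull as "reads from" is an assumption based on the key names.

```python
import json
from collections import defaultdict

# Hypothetical two-module configuration in the format shown above.
CONFIG_TEXT = """
{
    "modules": [
        {
            "module_ident": "ml4proflow.mods.xxx.modules",
            "module_name": "ReaderModule",
            "module_config": {"channels_push": ["raw"]}
        },
        {
            "module_ident": "ml4proflow.mods.xxx.modules",
            "module_name": "WriterModule",
            "module_config": {"channels_pull": ["raw"]}
        }
    ]
}
"""


def channel_edges(config):
    """Return (producer, channel, consumer) triples implied by the config."""
    producers = defaultdict(list)
    consumers = defaultdict(list)
    for module in config["modules"]:
        name = module["module_name"]
        module_config = module.get("module_config", {})
        for channel in module_config.get("channels_push", []):
            producers[channel].append(name)
        for channel in module_config.get("channels_pull", []):
            consumers[channel].append(name)
    return [
        (producer, channel, consumer)
        for channel in sorted(producers)
        for producer in producers[channel]
        for consumer in consumers.get(channel, [])
    ]


config = json.loads(CONFIG_TEXT)
edges = channel_edges(config)
print(edges)  # [('ReaderModule', 'raw', 'WriterModule')]
```

Parsing the file with json.loads before handing it to the framework also catches syntax problems such as a missing comma early.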

CLI - Interface

$ ml4proflow-cli --[Options]

For more documentation, see here.

Using ml4proflow for data analytics

Basic Principles

  • The data flow in an ML pipeline is represented as a graph --> DataFlowGraph
  • A node (module) in the graph is created by the DataFlowGraph
  • Nodes can have zero or more inputs and outputs
    • BasicModule : The basic class of the framework
      • 0-n inputs, 0-m outputs
    • Sources: Inherits from the BasicModule
      • 0 inputs, m outputs
    • Sinks: Inherits from the BasicModule
      • n inputs, 0 outputs
    • Executable: Inherits from the BasicModule
      • 0 inputs, 0 outputs
    • Modules: Inherits from Sinks & Sources
      • n inputs, m outputs
    • DFG: Inherits from Executables
      • 0 inputs, 0 outputs
  • An edge between two nodes is created by the DataFlowManager
    • create_channel (left side of edge = SourceModule)
    • register_sink (right side of edge = SinkModule)
  • Important: Everything is a node. Even a complete graph can be a node of another graph
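
The principles above can be sketched as a toy model in plain Python. This is not the ml4proflow implementation. Only the names BasicModule, DataFlowManager, create_channel and register_sink come from the list above; the constructor signatures, the methods push, publish and on_new_data, and the example classes Doubler and Collector are assumptions made for this illustration. Executables and nested graphs are left out.

```python
class DataFlowManager:
    """Toy manager that owns the edges (channels) between nodes."""

    def __init__(self):
        self.sinks = {}  # channel name -> list of registered sink modules

    def create_channel(self, channel):
        # Left side of an edge: a source announces a channel.
        self.sinks.setdefault(channel, [])

    def register_sink(self, channel, sink):
        # Right side of an edge: a sink subscribes to a channel.
        self.sinks.setdefault(channel, []).append(sink)

    def publish(self, channel, data):
        # Hypothetical helper: deliver data to every sink of the channel.
        for sink in self.sinks.get(channel, []):
            sink.on_new_data(channel, data)


class BasicModule:
    """Base class: 0-n inputs, 0-m outputs."""

    def __init__(self, name, manager):
        self.name = name
        self.manager = manager


class SourceModule(BasicModule):
    """0 inputs, m outputs."""

    def push(self, channel, data):
        self.manager.publish(channel, data)


class SinkModule(BasicModule):
    """n inputs, 0 outputs."""

    def on_new_data(self, channel, data):
        raise NotImplementedError


class Module(SinkModule, SourceModule):
    """n inputs, m outputs: inherits from both sink and source."""


class Doubler(Module):
    def on_new_data(self, channel, data):
        self.push("doubled", [2 * value for value in data])


class Collector(SinkModule):
    def __init__(self, name, manager):
        super().__init__(name, manager)
        self.received = []

    def on_new_data(self, channel, data):
        self.received.append(data)


manager = DataFlowManager()
source = SourceModule("source", manager)
doubler = Doubler("doubler", manager)
collector = Collector("collector", manager)

# Edge source -> doubler on channel "raw".
manager.create_channel("raw")
manager.register_sink("raw", doubler)
# Edge doubler -> collector on channel "doubled".
manager.create_channel("doubled")
manager.register_sink("doubled", collector)

source.push("raw", [1, 2, 3])
print(collector.received)  # [[2, 4, 6]]
```

Keeping the edges in the manager rather than in the modules means a module only needs to know channel names, which fits the idea that channels are wired by configuration.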

version: 1.1
