A data flow-oriented framework for industrial ML applications
ml4proflow - A data flow oriented framework for ml applications in industry
ml4proflow is the acronym for Dataflow of Machine Learning for Production and Production Systems.
It is a framework that manages the dataflow along the typical ml pipeline and is mainly used in industrial applications. The ml pipeline is described by a graph, the DataFlowGraph (DFG). Typically, graphs describing ml pipelines are simple directed graphs. However, since other types of graphs are possible, this framework can produce a dataflow for all types of graphs. Following the typical description of a graph, a DFG consists of multiple nodes, which are connected by edges. Nodes are represented by modules and edges by channels.
Modules contain the ml algorithms and are implemented by experts. Channels are created by the DFG and controlled by a configuration provided by a non-expert.
This enables all employees, from the machine operator to data scientists, to execute ml algorithms. Due to the framework's independence of execution platforms and execution architecture, it can be deployed anywhere in the production process, from edge devices to internal or public clouds.
Features
- Open Source
- Python based
- Modular & scalable
- Platform independent
Installation
Binary Installer from PyPI
The binaries are available from the Python Package Index. Install this package with
pip install ml4proflow
End User Installation
As ml4proflow is intended for developers and end users, we provide an installation script that sets up all the necessary dependencies for your operating system. It installs a Python instance and all available modules for the framework in a virtual environment. This installation method is intended for end users who are not familiar with Python. This entry point to the framework is located inside the repository ml4proflow-standalone. Follow the steps given by the README.
Installation from source
The source code is currently hosted on GitLab.
Linux
git clone https://gitlab.ub.uni-bielefeld.de/ml4proflow/ml4proflow
cd ml4proflow
pip install .
Windows
git clone https://gitlab.ub.uni-bielefeld.de/ml4proflow/ml4proflow
cd ml4proflow
pip install .
Development installation
For further development, install the package in editable mode:
pip install -e .
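Whichever installation route was used, a quick way to confirm that the package is visible to the active Python environment is to ask the standard library for its installed version. This is a minimal sketch: the helper name installed_version is made up for illustration, and the distribution name ml4proflow is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; it is not part of ml4proflow.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if ml4proflow is installed, otherwise None.
print(installed_version("ml4proflow"))
```

If this prints None inside the intended virtual environment, the install most likely went to a different interpreter.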
Usage
A DataFlowGraph is controlled by a configuration file. The config.json is structured as a list of all modules that appear in the data pipeline. Every module is described by its path, its name and its configuration.
In most cases, the order of execution is determined by the data flow defined through the DFG configuration. However, since modules can decide for themselves whether they want to be executed (e.g. executables), it is necessary to arrange the modules in an intuitive way according to their order in the DFG.
Example DFG-Config
{
    "modules": [{
        "module_ident": "ml4proflow.mods.xxx.modules",
        "module_name": "ModuleName",
        "module_config": {
            "channels_pull": ["src"],
            "channels_push": ["src"],
            "moduleParam1": "xxx",
            "moduleParam2": 1.0
        }
    }]
}
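Because a hand-written config.json is easy to get subtly wrong (a missing comma or key), it can help to check it before passing it to the framework. The sketch below uses only the Python standard library and is not part of ml4proflow: the helpers load_modules and suggested_order, the module names Reader, Scaler and Writer, and the channel names raw and scaled are all hypothetical. It assumes that module names are unique and that a module pulling from a channel should be listed after the modules pushing to it.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

REQUIRED_KEYS = ("module_ident", "module_name", "module_config")

# Hypothetical three-module pipeline, deliberately listed out of order.
EXAMPLE = """
{"modules": [
  {"module_ident": "ml4proflow.mods.xxx.modules", "module_name": "Writer",
   "module_config": {"channels_pull": ["scaled"]}},
  {"module_ident": "ml4proflow.mods.xxx.modules", "module_name": "Reader",
   "module_config": {"channels_push": ["raw"]}},
  {"module_ident": "ml4proflow.mods.xxx.modules", "module_name": "Scaler",
   "module_config": {"channels_pull": ["raw"], "channels_push": ["scaled"]}}
]}
"""


def load_modules(text):
    """Parse a DFG config and return its module entries, checking required keys."""
    modules = json.loads(text)["modules"]
    for index, entry in enumerate(modules):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"module {index} is missing keys: {missing}")
    return modules


def suggested_order(modules):
    """List module names so that producers of a channel precede its consumers."""
    producers = {}
    for entry in modules:
        for channel in entry["module_config"].get("channels_push", []):
            producers.setdefault(channel, []).append(entry["module_name"])
    sorter = TopologicalSorter()
    for entry in modules:
        name = entry["module_name"]
        sorter.add(name)
        for channel in entry["module_config"].get("channels_pull", []):
            for producer in producers.get(channel, []):
                if producer != name:  # ignore a module feeding itself
                    sorter.add(name, producer)
    return list(sorter.static_order())


print(suggested_order(load_modules(EXAMPLE)))  # ['Reader', 'Scaler', 'Writer']
```

For cyclic graphs, static_order raises graphlib.CycleError, so this ordering check only applies to the common directed, acyclic pipelines.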
CLI - Interface
$ ml4proflow-cli --[Options]
For more documentation, see here.
Using ml4proflow for data analytics
Basic Principles
- The dataflow in the ml pipeline is represented as a graph --> DataFlowGraph
- A node (module) in the graph is created by the DataFlowGraph
- Nodes can have zero to multiple inputs and outputs
  - BasicModule: The basic class of the framework
    - 0-n inputs, 0-m outputs
  - Sources: Inherit from the BasicModule
    - 0 inputs, m outputs
  - Sinks: Inherit from the BasicModule
    - n inputs, 0 outputs
  - Executable: Inherits from the BasicModule
    - 0 inputs, 0 outputs
  - Modules: Inherit from Sinks & Sources
    - n inputs, m outputs
  - DFG: Inherits from Executables
    - 0 inputs, 0 outputs
- An edge between two nodes is created by the DataFlowManager
  - create_channel (left side of edge = SourceModule)
  - register_sink (right side of edge = SinkModule)
- Important: Everything is a node. Even a complete graph can be a node of another graph
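The relationship between sources, sinks, modules and channels can be illustrated with a small toy model. This is only a sketch of the principles listed above and not the ml4proflow implementation: every class here (prefixed with Toy) is hypothetical, as are the method signatures, the on_new_data callback, the factor parameter and the channel names. Only the names create_channel, register_sink, channels_pull and channels_push are taken from this description. Executables and the DFG itself are left out for brevity.

```python
class ToyChannel:
    """An edge: forwards every pushed item to all registered sinks."""

    def __init__(self, name):
        self.name = name
        self.sinks = []

    def push(self, data):
        for sink in self.sinks:
            sink.on_new_data(self.name, data)


class ToyDataFlowManager:
    """Owns all channels and connects both sides of an edge."""

    def __init__(self):
        self.channels = {}

    def create_channel(self, name):
        # Left side of an edge: a source asks for the channel it pushes to.
        if name not in self.channels:
            self.channels[name] = ToyChannel(name)
        return self.channels[name]

    def register_sink(self, name, sink):
        # Right side of an edge: a sink subscribes to a channel.
        self.create_channel(name).sinks.append(sink)


class ToyBasicModule:
    def __init__(self, dfm, config):
        self.dfm = dfm
        self.config = config


class ToySource(ToyBasicModule):
    """0 inputs, m outputs."""

    def __init__(self, dfm, config):
        super().__init__(dfm, config)
        self.outputs = [dfm.create_channel(n) for n in config.get("channels_push", [])]

    def push(self, data):
        for channel in self.outputs:
            channel.push(data)


class ToySink(ToyBasicModule):
    """n inputs, 0 outputs."""

    def __init__(self, dfm, config):
        super().__init__(dfm, config)
        for name in config.get("channels_pull", []):
            dfm.register_sink(name, self)

    def on_new_data(self, channel_name, data):
        raise NotImplementedError


class ToyModule(ToySink, ToySource):
    """n inputs, m outputs: scales each incoming value and pushes it on."""

    def on_new_data(self, channel_name, data):
        self.push(data * self.config.get("factor", 1))


class ToyCollector(ToySink):
    """A sink that simply stores everything it receives."""

    def __init__(self, dfm, config):
        super().__init__(dfm, config)
        self.received = []

    def on_new_data(self, channel_name, data):
        self.received.append(data)


dfm = ToyDataFlowManager()
source = ToySource(dfm, {"channels_push": ["raw"]})
doubler = ToyModule(dfm, {"channels_pull": ["raw"], "channels_push": ["scaled"], "factor": 2})
collector = ToyCollector(dfm, {"channels_pull": ["scaled"]})
for value in (1, 2, 3):
    source.push(value)
print(collector.received)  # [2, 4, 6]
```

In this toy model the modules never reference each other directly. They only know channel names from their configuration, which is what allows a non-expert to rewire the pipeline by editing the configuration alone.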
version: 1.1