
An engine for running component based ML pipelines

Project description

README

The mlcomp module is designed to process and execute 'MCenter' complex pipelines, which consist of one or more components chained together so that the output of one component becomes the input to the next. Each pipeline has a particular purpose, such as training a model or generating inferences.

A single pipeline may include components written in different languages, such as Python, R, and Java.

How to construct a component

Steps

  • Create a folder whose name corresponds to the component's name (e.g. string_source)

  • Create a component.json file (JSON format) inside this folder and make sure to fill in all the following fields:

      {
          "engineType": "Python",
          "language": "Python",
          "userStandalone": false,
          "name": "<Component name (e.g. string_source)>",
          "label": "<A label that is displayed in the UI>",
          "version": "<Component's version (e.g. 1.0.0)>",
          "group": "<One of the valid groups (e.g. 'Connectors')>",
          "program": "<The Python component main script (e.g. string_source.py)>",
          "componentClass": "<The component class name (e.g. StringSource)>",
          "useMLStats": <true|false - whether the component uses MLStats>,
          "inputInfo": [
              {
               "description": "<Description>",
               "label": "<Label name>",
               "defaultComponent": "",
               "type": "<A type used to verify matching connected legs>",
               "group": "<data|model|prediction|statistics|other>"
              },
              {...}
          ],
          "outputInfo": [
              <Same as inputInfo above>
          ],
          "arguments": [
              {
                  "key": "<Unique argument key name>",
                  "type": "int|long|float|str|bool",
                  "label": "<A label that is displayed in the UI>",
                  "description": "<Description>",
                  "optional": <true|false>
              }
          ]
      }
    
  • Create the main component script, which contains the component's class. The class should inherit from the ConnectableComponent base class, imported from parallelm.components, and must implement the _materialize method with this prototype: def _materialize(self, parent_data_objs, user_data). Here is a complete, self-contained example:

      from parallelm.components import ConnectableComponent
      from parallelm.mlops import mlops
    
    
      class StringSource(ConnectableComponent):
          def __init__(self, engine):
              super(StringSource, self).__init__(engine)
    
          def _materialize(self, parent_data_objs, user_data):
              self._logger.info("Inside string source component")
              str_value = self._params.get('value', "default-string-value")
    
              # Report statistics that will be visible in the MCenter UI
              mlops.set_stat("Specific stat title", 1.0)
              mlops.set_stat("Specific stat title", 2.0)
    
              return [str_value]
    

    Notes:

    • A component can use the self._logger object to print logs.
    • A component may access pipeline parameters via the self._params dictionary.
    • The _materialize function should return a list of objects, or None if the component produces no output. The returned value is used as the input for the next component in the pipeline chain.
  • Place the component's main program (*.py) inside a folder along with its JSON description file and any other required files.
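The chaining contract described above can be illustrated without an MCenter installation. The sketch below uses a minimal stand-in for ConnectableComponent (a simplified assumption, not the real parallelm base class) to show how the list returned by a source's _materialize becomes the parent_data_objs of the next component:

```python
import logging


class ConnectableComponent(object):
    """Minimal stand-in for parallelm.components.ConnectableComponent."""

    def __init__(self, engine=None, params=None):
        self._logger = logging.getLogger(self.__class__.__name__)
        self._params = params or {}

    def _materialize(self, parent_data_objs, user_data):
        raise NotImplementedError


class StringSource(ConnectableComponent):
    def _materialize(self, parent_data_objs, user_data):
        # A source has no parents; it produces one output object
        return [self._params.get("value", "default-string-value")]


class StringSink(ConnectableComponent):
    def _materialize(self, parent_data_objs, user_data):
        # The first parent object is the string produced upstream
        value = parent_data_objs[0]
        assert value == self._params.get("expected-value"), value
        return None


# Chain the two components the way the engine would:
source = StringSource(params={"value": "Hello World"})
sink = StringSink(params={"expected-value": "Hello World"})
out = source._materialize([], {})
sink._materialize(out, {})
```

The essential point is that a component never calls its neighbors directly; the engine collects each component's returned list and hands it to the children declared in the pipeline description.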

How to construct a pipeline

Steps

  • Open any text editor and copy the following template:

      {
          "name": "Simple MCenter runner test",
          "engineType": "Python",
          "pipe": [
              {
                  "name": "Source String",
                  "id": 1,
                  "type": "string-source",
                  "parents": [],
                  "arguments": {
                      "value": "Hello World: testing string source and sink"
                  }
              },
              {
                  "name": "Sink String",
                  "id": 2,
                  "type": "string-sink",
                  "parents": [{"parent": 1, "output": 0}],
                  "arguments": {
                      "expected-value": "Hello World: testing string source and sink"
                  }
              }
          ]
      }
    

    Notes:

    • It is assumed that you've already constructed two components whose names are string-source and string-sink.
    • The output of the string-source component (the value returned from its _materialize function) becomes the input of the string-sink component (the parent_data_objs argument of its _materialize function).
  • Save the file with any desired name (e.g. p1.json, as used in the examples below)
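To make the parent/output wiring concrete, here is a toy interpreter for the template above. It is a sketch only, not the real mlpiper engine, and string_source/string_sink are hypothetical stand-ins for the two components' _materialize functions:

```python
import json

# The pipeline template from the section above
PIPELINE = json.loads("""
{
    "name": "Simple MCenter runner test",
    "engineType": "Python",
    "pipe": [
        {"name": "Source String", "id": 1, "type": "string-source",
         "parents": [],
         "arguments": {"value": "Hello World: testing string source and sink"}},
        {"name": "Sink String", "id": 2, "type": "string-sink",
         "parents": [{"parent": 1, "output": 0}],
         "arguments": {"expected-value": "Hello World: testing string source and sink"}}
    ]
}
""")


def string_source(arguments, inputs):
    return [arguments["value"]]


def string_sink(arguments, inputs):
    assert inputs[0] == arguments["expected-value"]
    return None


REGISTRY = {"string-source": string_source, "string-sink": string_sink}

outputs = {}
for step in PIPELINE["pipe"]:
    # Each parent link selects one output object of an upstream step
    inputs = [outputs[p["parent"]][p["output"]] for p in step["parents"]]
    outputs[step["id"]] = REGISTRY[step["type"]](step["arguments"], inputs)
```

Each entry in "parents" names an upstream step id and the index of the output object to consume, which is how a multi-output component can feed different children.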

How to test

Once the ml-comp Python package is installed, the mlpiper command-line tool is available and can be used to execute the pipeline above along with the components described in it.

There are three main commands, used as follows:

  • deploy - deploys a pipeline along with the provided components into a given folder. Once deployed, the pipeline can also be executed directly from that folder.

  • run - deploys and executes the pipeline at once.

  • run-deployment - executes an already deployed pipeline.

Examples:

  • Prepare a deployment. The resulting directory can be copied to a Docker container and run there:

    mlpiper -r ~/dev/components deploy -p p1.json -d /tmp/pp
    
  • Deploy & run. Useful for development and debugging:

    mlpiper -r ~/dev/components run -p p1.json -d /tmp/pp
    
  • Run a deployment. Usually called non-interactively by another script:

    mlpiper run-deployment --deployment-dir /tmp/pp
    

Project details

Download files

Source Distribution:

  • ml-comp-1.1.1.tar.gz (46.7 kB)

Built Distributions:

  • ml_comp-1.1.1-py3-none-any.whl (78.9 kB, Python 3)
  • ml_comp-1.1.1-py2-none-any.whl (78.9 kB, Python 2)
