
An engine for running component-based ML pipelines

Project description

README

The mlcomp module is designed to process and execute complex pipelines that consist of one or more components chained together, such that the output of one component becomes the input to the next. Each pipeline serves a particular purpose, such as training a model or generating inferences.

A single pipeline may include components from different languages, such as Python, R and Java.

Quickstart

Steps

  • Create a pipeline. Open any text editor and copy the following pipeline description:

      {
          "name": "Simple MCenter runner test",
          "engineType": "Generic",
          "pipe": [
              {
                  "name": "Source String",
                  "id": 1,
                  "type": "string-source",
                  "parents": [],
                  "arguments": {
                      "value": "Hello World: testing string source and sink"
                  }
              },
              {
                  "name": "Sink String",
                  "id": 2,
                  "type": "string-sink",
                  "parents": [{"parent": 1, "output": 0}],
                  "arguments": {
                      "expected-value": "Hello World: testing string source and sink"
                  }
              }
          ]
      }
    
  • Clone the mlpiper repo: https://github.com/mlpiper/mlpiper/

  • The string-source and string-sink components can be found in the repo under https://github.com/mlpiper/mlpiper/tree/master/reflex-algos/components/Python

  • Once the ml-comp Python package is installed, the mlpiper command-line tool is available and can be used to execute the above pipeline and the components it describes. Run the example above with:

    mlpiper run -f ~/<pipeline description file> -r <path to mlpiper repo>/reflex-algos/components/Python/ -d <deployment dir>
    

    Use the --force option to overwrite the deployment directory.
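
    For example, if the pipeline description above was saved as ~/simple_pipeline.json and the mlpiper repo was cloned into the home directory (both paths are illustrative), the invocation would look like:

    mlpiper run -f ~/simple_pipeline.json -r ~/mlpiper/reflex-algos/components/Python/ -d /tmp/pipeline-deploy --force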

How to construct a component

Steps

  • Create a directory, the name of which corresponds to the component's name (e.g., string_source)

  • Create a component.json file (JSON format) inside this directory and make sure to fill in all of the following fields:

      {
          "engineType": "Generic",
          "language": "Python",
          "userStandalone": false,
          "name": "<Component name (e.g., string_source)>",
          "label": "<A label that is displayed in the UI>",
          "version": "<Component's version (e.g., 1.0.0)>",
          "group": "<One of the valid groups (e.g., Connectors)>",
          "program": "<The Python component main script (e.g., string_source.py)>",
          "componentClass": "<The component class name (e.g., StringSource)>",
          "useMLStats": <true|false - whether the component uses MLStats>,
          "inputInfo": [
              {
                  "description": "<Description>",
                  "label": "<Label name>",
                  "defaultComponent": "",
                  "type": "<A type used to verify matching connected legs>",
                  "group": "<data|model|prediction|statistics|other>"
              },
              {...}
          ],
          "outputInfo": [
              <Same as inputInfo above>
          ],
          "arguments": [
              {
                  "key": "<Unique argument key name>",
                  "type": "int|long|float|str|bool",
                  "label": "<A label that is displayed in the UI>",
                  "description": "<Description>",
                  "optional": <true|false>
              }
          ]
      }
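
    For example, a filled-in component.json for the string-source component might look like this (the field values are illustrative and may differ from the actual component in the mlpiper repo):

      {
          "engineType": "Generic",
          "language": "Python",
          "userStandalone": false,
          "name": "string-source",
          "label": "String Source",
          "version": "1.0.0",
          "group": "Connectors",
          "program": "string_source.py",
          "componentClass": "StringSource",
          "useMLStats": true,
          "inputInfo": [],
          "outputInfo": [
              {
                  "description": "The generated string value",
                  "label": "string",
                  "defaultComponent": "",
                  "type": "str",
                  "group": "data"
              }
          ],
          "arguments": [
              {
                  "key": "value",
                  "type": "str",
                  "label": "Value",
                  "description": "The string value to emit",
                  "optional": true
              }
          ]
      }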
    
  • Create the main component script, which contains the component's class. This class should inherit from the ConnectableComponent base class, imported from parallelm.components. The class must implement the _materialize method, with this prototype: def _materialize(self, parent_data_objs, user_data). Here is a complete, self-contained example:

      from parallelm.components import ConnectableComponent
      from parallelm.mlops import mlops


      class StringSource(ConnectableComponent):
          def __init__(self, engine):
              super(StringSource, self).__init__(engine)

          def _materialize(self, parent_data_objs, user_data):
              self._logger.info("Inside string source component")

              # Read the 'value' argument from the pipeline description,
              # falling back to a default if it was not provided.
              str_value = self._params.get('value', "default-string-value")

              # Report statistics via the mlops module.
              mlops.set_stat("Specific stat title", 1.0)
              mlops.set_stat("Specific stat title", 2.0)

              # The returned list becomes the input of the next component.
              return [str_value]
    

    Notes:

    • A component can use the self._logger object to print logs.
    • A component may access pipeline parameters via the self._params dictionary.
    • The _materialize function should return a list of objects, or None. The returned value is used as the input to the next component in the pipeline chain.
  • Place the component's main program (*.py) inside a directory along with its JSON description file and any other desired files.
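
To complement the string-source example above, here is what a matching string-sink component might look like. This is a minimal sketch: the class name StringSink and the expected-value argument mirror the pipeline example in this README, but the actual implementation in the mlpiper repo may differ.

      from parallelm.components import ConnectableComponent


      class StringSink(ConnectableComponent):
          def __init__(self, engine):
              super(StringSink, self).__init__(engine)

          def _materialize(self, parent_data_objs, user_data):
              # parent_data_objs holds the outputs of the parent components;
              # here it is the single string returned by string-source.
              str_value = parent_data_objs[0]
              expected = self._params.get('expected-value')

              if str_value != expected:
                  raise Exception("Unexpected input! got: '{}', expected: '{}'"
                                  .format(str_value, expected))

              self._logger.info("String sink received the expected value")
              # A sink is the last component in the chain, so nothing is returned.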

How to construct a pipeline

Steps

  • Open any text editor and copy the following pipeline description:

      {
          "name": "Simple MCenter runner test",
          "engineType": "Generic",
          "pipe": [
              {
                  "name": "Source String",
                  "id": 1,
                  "type": "string-source",
                  "parents": [],
                  "arguments": {
                      "value": "Hello World: testing string source and sink"
                  }
              },
              {
                  "name": "Sink String",
                  "id": 2,
                  "type": "string-sink",
                  "parents": [{"parent": 1, "output": 0}],
                  "arguments": {
                      "expected-value": "Hello World: testing string source and sink"
                  }
              }
          ]
      }
    

    Notes:

    • It is assumed that you've already constructed two components named string-source and string-sink
    • The output of the string-source component (the value returned from its _materialize function) becomes the input of the string-sink component (the parent_data_objs argument of its _materialize function)
  • Save the file under any desired name
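
Note that the parents field is what wires components together: each entry names a parent component's id and which of that parent's outputs to consume. For instance, a hypothetical component that merges the outputs of two upstream components might declare its parents like this (the ids and the my-merger type are illustrative):

      {
          "name": "Merger",
          "id": 3,
          "type": "my-merger",
          "parents": [
              {"parent": 1, "output": 0},
              {"parent": 2, "output": 0}
          ],
          "arguments": {}
      }

The parent outputs are delivered to the component's _materialize function through the parent_data_objs argument.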

How to test

Once the ml-comp Python package is installed, the mlpiper command-line tool is available and can be used to execute the above pipeline and the components it describes.

There are three main commands, used as follows:

  • deploy - Deploys a pipeline along with provided components into a given directory. Once deployed, it can be executed directly from the given directory.

  • run - Deploys and then executes the pipeline.

  • run-deployment - Executes an already-deployed pipeline.

Examples:

  • Prepare a deployment. The resulting directory can then be copied to a Docker container and run there:

    mlpiper deploy -f p1.json -r ~/dev/components -d /tmp/pp
    
  • Deploy & Run. Useful for development and debugging:

    mlpiper run -f p1.json -r ~/dev/components -d /tmp/pp
    

    Use the --force option to overwrite the deployment directory.

  • Run a deployment. Usually non-interactive and called by another script:

    mlpiper run-deployment --deployment-dir /tmp/pp
    

