Skip to main content

MLBlocks Primitives

Project description

“DAI-Lab” An open source project from Data to AI Lab at MIT.

PyPi Shield Travis CI Shield

MLPrimitives

MLBlocks Primitives

Overview

This repository contains JSON primitives to be used by the MLBlocks library, as well as the necessary Python code to make some of them fully compatible with the MLBlocks API requirements.

There is also a collection of custom primitives contributed directly to this library, which either combine third party tools or implement new functionalities from scratch.

Project Structure

The project is divided in three parts:

The mlblocks_primitives folder

The mlblocks_primitives folder contains the JSON annotations for the MLBlocks primitives.

This folder has a flat structure, without subfolders, and all the primitive JSONs are named after the Fully Qualified Name of the annotated primitive (function or class).

As a result of this, sorting the JSON files alphabetically shows them grouped by library, which makes browsing them and seeing what tools are implemented easy.

The mlprimitives package

The mlprimitives folder is where all the Python code can be found.

Several sub-modules exist inside it, for the different types of primitives implemented, including the mlprimitives.adapters module, which has a special role in the integration of third party tools that do not directly fit the MLBlocks requirements.

The tests folder

Here are the unit tests for the Python code, as well as some validation tests for the JSON annotations.

Primitive Types

Three types of primitives can be found in this repository:

Primitives that can be directly integrated to MLBlocks

The simplest type of primitives are the ones that can be directly integrated to MLBlocks using nothing else than a single JSON annotation file.

These JSON files can be found in the mlblocks_primitives folder, and integrate functions or classes that comply with the following requirements:

  • Tunable hyperparameters are simple values of the supported basic types: str, bool, int or float.
  • Creating the class instance or calling the fit or produce methods does not require building any complex structure before the call is made.
  • The fitting and predicting phase consist on a single method or function call each.

A good example of this type of primitives are most of the estimators from the scikit-learn library.

Primitives that need a Python adapter to be integrated to MLBlocks

The second simplest type of primitives are the ones that need some kind of adaptation process to be integrated to MLBlocks, but whose behaviour is not altered in any way by this process.

These primitives consist of some Python code which can be found in the mlprimitives.adapters module, as well as JSON annotations that point at the corresponding functions or classes, which can be found in the mlblocs_primitives folder.

The type of primitives that are integrated in this way are the ones that have some of these characteristics:

  • Need some additional steps after the instantiation in order to be prepared to run.
  • The tunable hyperparameters need some kind of transformation or instantiation before they can be passed to the primitive.
  • The primitive cannot be directly applied to the inputs or the outputs need to be manipulated in some way before they can be passed to any other primitive.

Some examples of these primitives are the Keras models, which need to be built in several steps and later on compiled before they can be used, or some image transformation primitives which need to be applied to the images one by one.

Custom primitives

The third type are custom primitives implemented specifically for this library.

These custom primitives may be using third party tools or implemented from scratch, but if they use third party tools they alter in some way their native behavior to add new functionalities to them.

This type of primitives consist of Python code from the mlprimitives module, as well as the corresponding JSON annotations, which can also be found in the mlblocks_primitives folder.

Contributing

This is a community driven project and all contributions are more than welcome, from simple feedback to the most complex coding contributions.

If you have anything that you want to ask, request or contribute, please check the contributing section in the documentation, and do not hesitate to open GitHub Issue, even if it is to ask a simple question.

History

0.1.4

New Primitives

  • mlprimitives.timeseries primitives for timeseries data preprocessing
  • mlprimitives.timeseres_error primitives for timeseries anomaly detection
  • keras.Sequential.LSTMTimeSeriesRegressor
  • sklearn.neighbors.KNeighbors Classifier and Regressor
  • several sklearn.decomposition primitives
  • several sklearn.ensemble primitives

Bug Fixes

  • Fix typo in mlprimitives.text.TextCleaner primitive
  • Fix bug in index handling in featuretools.dfs primitive
  • Fix bug in SingleLayerCNNImageClassifier annotation
  • Remove old vlaidation tags from JSON annotations

0.1.3

New Features

  • Fix and re-enable featuretools.dfs primitive.

0.1.2

New Features

  • Add pipeline specification language and Evaluation utilities.
  • Add pipelines for graph, text and tabular problems.
  • New primitives ClassEncoder and ClassDecoder
  • New primitives UniqueCounter and VocabularyCounter

Bug Fixes

  • Fix TrivialPredictor bug when working with numpy arrays
  • Change XGB default learning rate and number of estimators

0.1.1

New Features

  • Add more keras.applications primitives.
  • Add a Text Cleanup primitive.

Bug Fixes

  • Add keywords to keras.preprocessing primtives.
  • Fix the image_transform method.
  • Add epoch as a fixed hyperparameter for keras.Sequential primitives.

0.1.0

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlprimitives-0.1.4.tar.gz (46.8 kB view hashes)

Uploaded Source

Built Distribution

mlprimitives-0.1.4-py2.py3-none-any.whl (85.2 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page