
A project from Data to AI Lab at MIT.

GreenGuard Prediction Engineering

Prediction engineering methods for GreenGuard.

Overview

The GreenGuard Prediction Engineering library is a framework designed to assist in generating machine learning problems from wind farm operations data by analyzing past occurrences of events.

The main features of GPE are:

  • EntitySet creation: tools designed to represent wind farm data and the relationships between its tables. Functions are provided to create EntitySets for datasets with PI data and datasets with SCADA data.
  • Labeling Functions: a collection of functions, as well as tools to create custom versions of them, ready to be used to analyze operations data in search of past occurrences of specific types of events.
  • Prediction Engineering: a flexible framework designed to apply labeling functions to wind turbine operations data in a number of different ways to create labels for custom Machine Learning problems.
  • Feature Engineering: a guide to using Featuretools to apply automated feature engineering to wind farm data.

Install

Requirements

GPE has been developed and runs on Python 3.6 and 3.7.

Although it is not strictly required, the use of a virtualenv is highly recommended in order to avoid interfering with other software installed on the system where you run GPE.
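For example, a virtualenv can be created with Python's built-in venv module (the environment name gpe-venv below is just an illustration):

python3 -m venv gpe-venv
source gpe-venv/bin/activate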

Download and Install

GPE can be installed locally using pip with the following command:

pip install --extra-index-url https://pypi.dailab.ml/ gpe

This will pull and install the latest stable release from the DAI-Lab private PyPI instance.

If you want to install from source or contribute to the project please read the Contributing Guide.

Docker usage

Alternatively, GPE is prepared to be run inside a Docker container. Please check the docker documentation for details on how to run GPE using Docker.

Quickstart

In this short tutorial we will guide you through a series of steps that will help you get started with GPE.

1. Loading the data

The first step will be to use preprocessed data to create an EntitySet. Depending on the type of data, we will use either the gpe.create_pidata_entityset or the gpe.create_scada_entityset function.

NOTE: if you cloned the GPE repository, you will find some demo data inside the notebooks/data folder which has been preprocessed to fit the create_entityset data requirements.

import os
import pandas as pd
from gpe import create_scada_entityset

data_path = 'notebooks/data'

data = {
  'turbines': pd.read_csv(os.path.join(data_path, 'turbines.csv')),
  'alarms': pd.read_csv(os.path.join(data_path, 'alarms.csv')),
  'work_orders': pd.read_csv(os.path.join(data_path, 'work_orders.csv')),
  'stoppages': pd.read_csv(os.path.join(data_path, 'stoppages.csv')),
  'notifications': pd.read_csv(os.path.join(data_path, 'notifications.csv')),
  'scada': pd.read_csv(os.path.join(data_path, 'scada.csv'))
}

scada_es = create_scada_entityset(data)

This will load the turbines, alarms, stoppages, work orders, notifications, and SCADA data, and return it as an EntitySet.

Entityset: SCADA data
  DataFrames:
    turbines [Rows: 1, Columns: 10]
    alarms [Rows: 2, Columns: 9]
    work_orders [Rows: 2, Columns: 20]
    stoppages [Rows: 2, Columns: 16]
    notifications [Rows: 2, Columns: 15]
    scada [Rows: 2, Columns: 5]
  Relationships:
    alarms.COD_ELEMENT -> turbines.COD_ELEMENT
    stoppages.COD_ELEMENT -> turbines.COD_ELEMENT
    work_orders.COD_ELEMENT -> turbines.COD_ELEMENT
    scada.COD_ELEMENT -> turbines.COD_ELEMENT
    notifications.COD_ORDER -> work_orders.COD_ORDER
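At this point the EntitySet can be explored using the regular Featuretools API. A minimal sketch, assuming the object returned by create_scada_entityset is a standard Featuretools 1.x EntitySet (an assumption, not documented GPE behavior):

# Indexing a Featuretools 1.x EntitySet by table name returns the
# underlying DataFrame (assumes scada_es is a plain EntitySet)
scada_es['scada'].head()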

2. Selecting a Labeling Function

The second step will be to choose an adequate Labeling Function.

We can see the list of available labeling functions using the gpe.labeling.get_labeling_functions function.

from gpe import labeling

labeling.get_labeling_functions()

This will return a dictionary with the name and a short description of each available function.

{'brake_pad_presence': 'Calculates the total power loss over the data slice.',
 'converter_replacement_presence': 'Calculates the converter replacement presence.',
 'total_power_loss': 'Calculates the total power loss over the data slice.'}

In this case, we will choose the total_power_loss function, which calculates the total amount of power lost over a slice of time.

3. Generate Target Times

Once we have loaded the data and chosen the Labeling Function, we are ready to use the gpe.DataLabeler class to generate a Target Times table.

from gpe import DataLabeler

data_labeler = DataLabeler(labeling.labeling_functions.total_power_loss)
target_times, metadata = data_labeler.generate_label_times(scada_es)

This will return a compose.LabelTimes table containing the three columns required to start working on a Machine Learning problem: the turbine ID (COD_ELEMENT), the cutoff time (time), and the label.

   COD_ELEMENT       time    label
0            0 2022-01-01  45801.0
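Since the result is a compose LabelTimes table, composeml's standard post-processing methods should be available on it. A minimal sketch, assuming target_times behaves like a regular composeml.LabelTimes (the 10000.0 threshold is an arbitrary example value, not a GPE default):

# Summarize the label distribution and the labeling settings
target_times.describe()

# Turn the continuous power-loss label into a binary classification
# target: True where the loss exceeds the example threshold
binary_labels = target_times.threshold(10000.0)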

What's Next?

If you want to continue learning about GreenGuard Prediction Engineering and all its features, please have a look at the tutorials found inside the notebooks folder.
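In particular, the Feature Engineering step mentioned in the Overview picks up where the target times leave off, by feeding them into Featuretools as cutoff times. A minimal sketch, assuming a Featuretools 1.x API and that turbines is the target table (both are assumptions, not GPE specifics):

import featuretools as ft

# Compute one row of features per turbine, using only data recorded
# before each turbine's cutoff time in target_times
feature_matrix, feature_defs = ft.dfs(
    entityset=scada_es,
    target_dataframe_name='turbines',
    cutoff_time=target_times,
)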

History

0.2.3 - 2020-09-20

  • Update test environment and make test commands.

0.2.2 - 2020-02-12

  • Add github actions and perform tests over the readme example.

0.2.1 - 2020-02-12

  • Slight user API improvements.
  • Removal of unused code.
  • Improved documentation and tutorials.
  • Setup to run GPE on a Docker container.

0.2.0 - 2020-02-06

First full release

  • Data Preprocessing
  • Prediction Engineering Framework
  • First Labeling functions

0.1.0 - 2019-10-31

First Pre-Release
