
Fully automated end-to-end machine learning pipeline


Amplo - AutoML (for Machine Data)


Welcome to the Automated Machine Learning package Amplo. Amplo's AutoML is designed specifically for machine data and works very well with tabular time series data (especially unbalanced classification!).

Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's ML Developer Platform. With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started on Predictive Maintenance development.

Amplo's AutoML Pipeline covers the entire Machine Learning development cycle, including exploratory data analysis, data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking, version control, production-ready models and documentation.

1. Downloading Amplo

The easiest way is to install our Python package from PyPI:

pip install Amplo

2. Amplo AutoML Features

Exploratory Data Analysis

from Amplo.AutoML import DataExploring

Automated Exploratory Data Analysis. Covers binary classification and regression. It generates:

  • Missing Values Plot
  • Line Plots of all features
  • Box plots of all features
  • Co-linearity Plot
  • SHAP Values
  • Random Forest Feature Importance
  • Predictive Power Score

Additionally, for regression:

  • Seasonality Plots
  • Differentiated Variance Plot
  • Auto Correlation Function Plot
  • Partial Auto Correlation Function Plot
  • Cross Correlation Function Plot
  • Scatter Plots
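The numbers behind a few of these plots can be sketched with pandas alone. This is a minimal illustration, not Amplo's implementation: the column names, the synthetic signals and the 0.95 correlation threshold are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical machine data: two nearly collinear sensors and one with gaps.
rng = np.random.default_rng(0)
t = np.arange(200)
df = pd.DataFrame({
    "temp": np.sin(2 * np.pi * t / 50) + 0.05 * rng.normal(size=t.size),
    "pressure": rng.normal(size=t.size),
})
df["temp_copy"] = 2.0 * df["temp"] + 0.01 * rng.normal(size=t.size)
df.loc[df.index % 10 == 0, "pressure"] = np.nan  # every 10th reading is missing

# Fraction of missing values per column (the data behind a missing-values plot).
missing_fraction = df.isna().mean()

# Absolute Pearson correlation; pairs above the (assumed) threshold are flagged.
corr = df.corr().abs()
collinear_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > 0.95
]

# Autocorrelation of one signal at a few lags (the data behind an ACF plot).
acf = {lag: df["temp"].autocorr(lag=lag) for lag in (1, 25, 50)}
```

For a signal with a period of 50 samples, the autocorrelation is strongly positive at lag 50 and strongly negative at lag 25, which is the kind of pattern a seasonality or ACF plot makes visible.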

Data Processing

from Amplo.AutoML import DataProcessing

Automated Data Cleaning. Handles the following items:

  • Cleans Column Names
  • Duplicate Columns and Rows
  • Data Types
  • Missing Values
  • Outliers
  • Constant Columns
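To make the items above concrete, here is a rough sketch of one such cleaning pass written with plain pandas. It does not use or reproduce the DataProcessing class; the function name `clean`, the IQR outlier rule and the linear interpolation of gaps are assumptions chosen for illustration.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass (hypothetical helper, not Amplo's code)."""
    out = df.copy()
    # Column names: lower-case, runs of other characters become underscores.
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    # Duplicate columns (same cleaned name) and duplicate rows.
    out = out.loc[:, ~out.columns.duplicated()]
    out = out.drop_duplicates().reset_index(drop=True)
    # Data types: coerce to numeric; unparsable entries become NaN.
    out = out.apply(pd.to_numeric, errors="coerce")
    # Outliers: mask values outside 1.5 * IQR of the quartiles (assumed rule).
    q1, q3 = out.quantile(0.25), out.quantile(0.75)
    iqr = q3 - q1
    out = out.where((out >= q1 - 1.5 * iqr) & (out <= q3 + 1.5 * iqr))
    # Missing values: linear interpolation, also filling the edges.
    out = out.interpolate(limit_direction="both")
    # Constant columns carry no information and are dropped.
    return out.loc[:, out.nunique() > 1]

raw = pd.DataFrame({
    "Motor Temp (C)": ["20.0", "21.0", "bad", "22.0", "500.0", "21.5", "21.5"],
    "Status": [1, 1, 1, 1, 1, 1, 1],
    "RPM": [1000, 1010, 1020, 1030, 1040, 1050, 1050],
})
cleaned = clean(raw)
```

In this toy input, the repeated last row is dropped, the unparsable entry and the 500.0 spike are replaced by interpolated values, and the constant Status column is removed.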

Feature Processing

from Amplo.AutoML import FeatureProcessing

Automatically extracts and selects features, and removes collinear features. Included Feature Extraction algorithms:

  • Multiplicative Features
  • Dividing Features
  • Additive Features
  • Subtractive Features
  • Trigonometric Features
  • K-Means Features
  • Lagged Features
  • Differencing Features
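A few of these feature types can be sketched directly in pandas. The column names and the naming scheme for the derived features below are hypothetical, and the snippet only shows what each feature type computes, not how FeatureProcessing chooses which ones to keep.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "current": [1.0, 2.0, 3.0, 4.0, 5.0],
    "voltage": [10.0, 10.0, 11.0, 12.0, 12.0],
})

features = pd.DataFrame(index=df.index)
# Multiplicative and dividing features combine two columns element-wise.
features["current__x__voltage"] = df["current"] * df["voltage"]
features["voltage__d__current"] = df["voltage"] / df["current"]
# Trigonometric feature of a single column.
features["sin__current"] = np.sin(df["current"])
# Lagged feature: the value one time step earlier (first row has no history).
features["current__lag__1"] = df["current"].shift(1)
# Differencing feature: change since the previous time step.
features["voltage__diff"] = df["voltage"].diff()
```

Lagged and differencing features leave a NaN in the first row, so they are usually followed by dropping or filling the leading rows.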

Included Feature Selection algorithms:

  • Random Forest Feature Importance (Threshold and Increment)
  • Predictive Power Score
  • Boruta
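As a rough illustration of importance-based selection, the sketch below ranks features with scikit-learn's RandomForestRegressor and keeps the smallest set whose cumulative importance reaches a threshold. This is one plausible reading of a threshold rule, not necessarily the one Amplo uses; the synthetic data, the feature names and the 0.85 threshold are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: f0 dominates, f1 matters a little, f2 and f3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)
names = ["f0", "f1", "f2", "f3"]

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # non-negative, sums to 1

# Add features in order of importance until the cumulative sum hits the threshold.
threshold = 0.85
order = np.argsort(importances)[::-1]
cumulative = np.cumsum(importances[order])
n_keep = int(np.searchsorted(cumulative, threshold)) + 1
selected = [names[i] for i in order[:n_keep]]
```

Impurity-based importances are biased toward high-cardinality features, which is one reason to combine them with other criteria such as those listed above.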

Sequencing

from Amplo.AutoML import Sequence

For time series regression problems, it is often useful to include multiple previous samples instead of just the latest. This class sequences the data based on the time steps you want included in the input and output. This is also very useful when working with tensors, as it can return a tensor that fits directly into a Recurrent Neural Network.
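The idea of sequencing can be sketched with NumPy as follows. The helper `make_sequences` and its `back` and `forward` parameters are hypothetical and use contiguous windows only; the Sequence class may accept other ways of specifying time steps.

```python
import numpy as np

def make_sequences(data, target, back, forward):
    """Return X of shape (samples, back, features) and y of shape (samples, forward)."""
    data = np.asarray(data, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(data) - back - forward + 1
    if n_samples <= 0:
        raise ValueError("not enough rows for the requested window sizes")
    # Each input window holds `back` consecutive rows of all features.
    X = np.stack([data[i:i + back] for i in range(n_samples)])
    # Each output window holds the `forward` target values that follow it.
    y = np.stack([target[i + back:i + back + forward] for i in range(n_samples)])
    return X, y

data = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
target = np.arange(10, dtype=float)
X, y = make_sequences(data, target, back=3, forward=2)
```

The resulting X has the (samples, time steps, features) layout that recurrent layers in most deep learning frameworks expect.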

Modelling

from Amplo.AutoML import Modelling

Runs various regression or classification models. Includes:

  • Scikit's Linear Model
  • Scikit's Random Forest
  • Scikit's Bagging
  • Scikit's GradientBoosting
  • Scikit's HistGradientBoosting
  • DMLC's XGBoost
  • CatBoost's CatBoost
  • Microsoft's LightGBM
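Running several models and comparing them can be sketched with scikit-learn's cross-validation utilities. The snippet below covers only two of the listed model families on synthetic data, and it does not show how the Modelling class is configured; the data and the candidate dictionary are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression problem with a linear ground truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Score every candidate on the same folds so the comparison is fair.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

Because the target here is a linear function of the inputs, the linear model scores best; on real machine data the ranking depends on the problem, which is why several families are tried.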

Grid Search

from Amplo.GridSearch import *

Contains three hyperparameter optimizers: a basic GridSearch, an implementation of scikit-learn's halving random search (HalvingRandomSearchCV) and an implementation of Optuna's Tree-structured Parzen Estimator. We generally advise using Optuna.
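The simplest of the three strategies, an exhaustive grid search, can be sketched in pure Python. The function `grid_search` and the toy objective are hypothetical stand-ins: in practice the objective would be a cross-validated model score, and the classes in Amplo.GridSearch have their own interfaces.

```python
from itertools import product

def grid_search(objective, param_grid):
    """Evaluate every combination in param_grid; return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for a validation score (higher is better).
def toy_score(learning_rate, max_depth):
    return -(learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 4) ** 2

grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [2, 4, 8]}
best_params, best_score = grid_search(toy_score, grid)
```

The cost of an exhaustive search grows with the product of the grid sizes. Halving search and the Tree-structured Parzen Estimator instead spend the evaluation budget on the more promising configurations.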

Automatic Documentation

from Amplo.AutoML import Documenting

Contains a documenter for classification (binary and multiclass problems), as well as for regression. Creates a PDF report for a Pipeline, including metrics, data processing steps, and everything else needed to recreate the result.
