Fully automated end-to-end machine learning pipeline

Amplo - AutoML (for Machine Data)

Welcome to the Automated Machine Learning package amplo. Amplo's AutoML is designed specifically for machine data and works very well with tabular time series data (especially unbalanced classification!).

Though this is a standalone Python package, Amplo's AutoML is also available on Amplo's Smart Maintenance Platform. With a graphical user interface and various data connectors, it is the ideal place for service engineers to get started with predictive maintenance.

Amplo's AutoML Pipeline contains the entire Machine Learning development cycle, including exploratory data analysis, data cleaning, feature extraction, feature selection, model selection, hyperparameter optimization, stacking, version control, production-ready models and documentation. It comes with additional tools such as interval analysers, drift detectors, data quality checks, etc.

1. Downloading Amplo

The easiest way is to install our Python package from PyPI:

pip install amplo

2. Usage

Usage is very simple with Amplo's AutoML Pipeline.

from amplo import Pipeline
from sklearn.datasets import make_classification, make_regression

# Classification: fit the pipeline, then predict class probabilities
x, y = make_classification()
pipeline = Pipeline()
pipeline.fit(x, y)
yp = pipeline.predict_proba(x)

# Regression: fit a fresh pipeline, then predict continuous targets
x, y = make_regression()
pipeline = Pipeline()
pipeline.fit(x, y)
yp = pipeline.predict(x)

3. Amplo AutoML Features

Interval Analyser

from amplo.automl import IntervalAnalyser

Interval Analyser for log file classification. When log files have to be classified and there is not enough data for time series methods (such as LSTMs, ROCKET, WEASEL or BOSS), one needs to fall back on classical machine learning models, which work better with fewer samples. This raises the problem of which samples to classify. Classifying every sample and accumulating the results may greatly degrade classification performance. Therefore, we introduce this interval analyser. Using an approximate K-Nearest Neighbors algorithm, it estimates the strength of correlation for every sample inside a log, which allows for better interval selection for classical machine learning models.
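
The idea of scoring every sample by its neighbourhood can be sketched as follows. This is not Amplo's implementation: it uses an exact brute-force neighbour search in NumPy instead of an approximate K-Nearest Neighbors index, and the function name `neighbour_label_agreement` is hypothetical. The score is the fraction of a sample's nearest neighbours that carry the same label.

```python
import numpy as np


def neighbour_label_agreement(x, y, k=5):
    """For every sample, return the fraction of its k nearest neighbours
    (Euclidean distance, excluding the sample itself) that share its label."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances, shape (n, n)
    diff = x[:, None, :] - x[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # A sample must not count as its own neighbour
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest samples per row
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y[nearest] == y[:, None]).mean(axis=1)


# Toy data: two well separated clusters, plus one class-0 sample
# (the last row) that sits in the middle of the class-1 cluster
x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [5.05, 5.05]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0])
strength = neighbour_label_agreement(x, y, k=3)
```

Samples with a low score look like other classes and are weak candidates for training; samples with a high score are characteristic of their own class.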

To use this interval analyser, make sure that your logs are located in a folder of their class, with one parent folder with all classes, e.g.:

+-- Parent Folder
|   +-- Class_1
|       +-- Log_1.*
|       +-- Log_2.*
|   +-- Class_2
|       +-- Log_3.*
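
A minimal sketch for checking that a folder follows this layout, using only the standard library. The helper `collect_logs` and the `.csv` extension are assumptions for illustration; they are not part of Amplo.

```python
import tempfile
from pathlib import Path


def collect_logs(parent):
    """Map each class folder name to the sorted log file names inside it."""
    parent = Path(parent)
    return {
        class_dir.name: sorted(f.name for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(parent.iterdir())
        if class_dir.is_dir()
    }


# Build a throwaway copy of the layout above and read it back
layout = [("Class_1", "Log_1.csv"), ("Class_1", "Log_2.csv"), ("Class_2", "Log_3.csv")]
with tempfile.TemporaryDirectory() as tmp:
    for class_name, log_name in layout:
        folder = Path(tmp) / class_name
        folder.mkdir(exist_ok=True)
        (folder / log_name).write_text("timestamp,value\n")
    logs = collect_logs(tmp)
```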

Data Processing

from amplo.automl import DataProcessor

Automated Data Cleaning:

  • Infers & converts data types (integer, floats, categorical, datetime)
  • Reformats column names
  • Removes duplicate columns and rows
  • Handles missing values by:
    • Removing columns
    • Removing rows
    • Interpolating
    • Filling with zeros
  • Removes outliers using:
    • Clipping
    • Z-score
    • Quantiles
  • Removes constant columns
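
To make these steps concrete, here is a rough pandas sketch of a few of them. It does not use `DataProcessor` and makes no claim about how Amplo implements or orders the steps; the column names, the 5% and 95% quantile bounds and the choice of interpolation are all assumptions.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Sensor A": [1.0, 2.0, np.nan, 4.0, 4.0, 100.0],
    "Sensor B": [10.0, 20.0, 30.0, 40.0, 40.0, 50.0],
    "Constant": [7, 7, 7, 7, 7, 7],
})

clean = raw.copy()
# Reformat column names: lower case, underscores instead of spaces
clean.columns = [c.strip().lower().replace(" ", "_") for c in clean.columns]
# Remove duplicate rows (the fifth row repeats the fourth)
clean = clean.drop_duplicates().reset_index(drop=True)
# Handle missing values by linear interpolation
clean = clean.interpolate(method="linear")
# Remove constant columns
clean = clean.loc[:, clean.nunique() > 1]
# Treat outliers by clipping each column to its 5% and 95% quantiles
lower, upper = clean.quantile(0.05), clean.quantile(0.95)
clean = clean.clip(lower=lower, upper=upper, axis=1)
```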

Feature Processing

from amplo.automl import FeatureProcessor

Automatically extracts and selects features, and removes collinear features. Included feature extraction algorithms:

  • Multiplicative Features
  • Dividing Features
  • Additive Features
  • Subtractive Features
  • Trigonometric Features
  • K-Means Features
  • Lagged Features
  • Differencing Features
  • Inverse Features
  • Datetime Features
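
A few of the extraction types above, written out by hand in pandas as an illustration. The column names and the naming scheme for the derived features are hypothetical and do not reflect what `FeatureProcessor` produces.

```python
import pandas as pd

df = pd.DataFrame({
    "pressure": [1.0, 2.0, 4.0, 8.0],
    "flow": [2.0, 2.0, 4.0, 4.0],
})

features = pd.DataFrame(index=df.index)
# Multiplicative and dividing features combine two columns
features["pressure__x__flow"] = df["pressure"] * df["flow"]
features["pressure__d__flow"] = df["pressure"] / df["flow"]
# Lagged feature: the value one time step earlier
features["pressure__lag_1"] = df["pressure"].shift(1)
# Differencing feature: change since the previous time step
features["pressure__diff_1"] = df["pressure"].diff(1)
# Inverse feature
features["flow__inv"] = 1.0 / df["flow"]
```

Lagged and differencing features leave missing values in the first rows, which have to be dropped or filled before modelling.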

Included Feature Selection algorithms:

  • Random Forest Feature Importance (Threshold and Increment)
  • Predictive Power Score
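
Threshold-based selection on random forest importance could look roughly like the sketch below, written directly against scikit-learn. It assumes that "threshold" means keeping the most important features until their cumulative importance reaches a cut-off; the 0.85 value is an arbitrary choice for illustration, not a documented Amplo default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

x, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(x, y)

# Rank features from most to least important
importance = forest.feature_importances_
order = importance.argsort()[::-1]
cumulative = importance[order].cumsum()

# Keep the smallest set of top features whose importance sums to the threshold
threshold = 0.85  # assumed value
n_keep = int((cumulative < threshold).sum()) + 1
selected = sorted(order[:n_keep].tolist())
x_selected = x[:, selected]
```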

Sequencing

from amplo.automl import Sequencer

For time series regression problems, it is often useful to include multiple previous samples instead of just the latest. This class sequences the data based on which time steps you want included in the input and output. This is also very useful when working with tensors, as a tensor can be returned which fits directly into a Recurrent Neural Network.
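
The sequencing idea can be sketched in NumPy as below. The function `make_sequences` and its `back` and `forward` parameters are hypothetical names chosen for this example, not the `Sequencer` interface.

```python
import numpy as np


def make_sequences(x, y, back=3, forward=1):
    """Stack `back` consecutive input steps and pair each window with the
    target `forward` steps after the window's last input step."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x) - back - forward + 1
    x_seq = np.stack([x[i:i + back] for i in range(n)])
    y_seq = np.array([y[i + back - 1 + forward] for i in range(n)])
    return x_seq, y_seq


x = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
y = np.arange(10, dtype=float)
x_seq, y_seq = make_sequences(x, y, back=3, forward=1)
```

The result `x_seq` has shape (samples, time steps, features), which is the layout most recurrent layers expect.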

Modelling

from amplo.automl import Modeller

Runs various regression or classification models. Includes:

  • Scikit's Linear Model
  • Scikit's Random Forest
  • Scikit's Bagging
  • Scikit's GradientBoosting
  • Scikit's HistGradientBoosting
  • DMLC's XGBoost
  • Yandex's CatBoost
  • Microsoft's LightGBM
  • Stacking Models

Grid Search

from amplo.grid_search import OptunaGridSearch

Contains a hyperparameter optimizer with extended predefined model parameters:

  • Optuna's Tree-Parzen-Estimator
