
A Python AutoML tool for fast exploration and experimentation of supervised machine learning pipelines.


ATOM

Automated Tool for Optimized Modelling

Author: tvdboom
Email: m.524687@gmail.com

Project status: Active | Python: 3.6, 3.7, 3.8 | License: MIT

Description

Automated Tool for Optimized Modelling (ATOM) is a Python package designed for fast exploration and experimentation of supervised machine learning tasks. With just a few lines of code, you can perform basic data cleaning and feature selection, and compare the performance of multiple models on a given dataset. ATOM aims to give quick insight into which algorithms perform best for the task at hand and an indication of the feasibility of the ML solution. The package supports binary classification, multiclass classification, and regression tasks.

NOTE: A data scientist with domain knowledge can outperform ATOM by applying use-case-specific feature engineering or data cleaning steps!

Possible steps taken by the ATOM pipeline:

  1. Data Cleaning
    • Handle missing values
    • Encode categorical features
    • Balance the dataset
    • Remove outliers
  2. Perform feature selection
    • Remove features with too high collinearity
    • Remove features with too low variance
    • Select best features according to a chosen strategy
  3. Fit all selected models (either directly or via successive halving)
    • Select hyperparameters using a Bayesian Optimization approach
    • Perform bagging to assess the robustness of the model
  4. Analyze the results using the provided plotting functions!
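The two feature-removal criteria in step 2 (too low variance, too high collinearity) can be illustrated with a small, dependency-free sketch. This is not ATOM's implementation: the helpers `pearson` and `filter_features`, the thresholds `min_variance` and `max_correlation`, and the dict-of-columns input format are all hypothetical choices made for illustration.

```python
from statistics import pvariance
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant columns
    return cov / (var_a * var_b) ** 0.5


def filter_features(columns: Dict[str, List[float]],
                    min_variance: float = 1e-3,
                    max_correlation: float = 0.98) -> List[str]:
    """Return the names of the features to keep (hypothetical thresholds)."""
    # Criterion 1: drop features whose variance does not exceed min_variance.
    kept = [name for name, values in columns.items()
            if pvariance(values) > min_variance]
    # Criterion 2: greedily drop a feature that is too correlated
    # (in absolute value) with any feature already selected.
    selected: List[str] = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[other])) < max_correlation
               for other in selected):
            selected.append(name)
    return selected
```

For example, `filter_features({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [5, 5, 5, 5]})` keeps only `"a"`: `c` is constant and `b` is perfectly correlated with `a`. Because the collinearity pass is greedy, which member of a correlated pair survives depends on column order.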




Installation

Install ATOM easily using pip.

NOTE: Since the name atom was already taken, the package is called atom-ml on PyPI!
	pip install atom-ml

Usage

Initialize the ATOMClassifier or ATOMRegressor class with the data you want to use:

from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

# Load the features and target as separate objects
X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, log='auto', n_jobs=2, verbose=2)

ATOM has multiple data cleaning methods to help you prepare the data for modelling:

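# Impute missing values: KNN for numerical columns, most frequent value for categorical ones
atom.impute(strat_num='knn', strat_cat='most_frequent', min_frac_rows=0.7)
# Encode categorical features
atom.encode(max_onehot=10, frac_to_other=0.05)
# Remove outliers
atom.outliers(max_sigma=4)
# Balance the classes
atom.balance(oversample=0.8, n_neighbors=15)
# Select features with a univariate chi2 test
atom.feature_selection(strategy='univariate', solver='chi2', n_features=0.9)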
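One common reading of a `max_sigma` threshold is a z-score filter: a row is dropped when any numerical feature lies more than `max_sigma` standard deviations from that feature's mean. The sketch below assumes this interpretation; it is not taken from ATOM's source, and `remove_outliers` is a hypothetical helper working on rows of plain floats.

```python
from statistics import mean, pstdev
from typing import List


def remove_outliers(rows: List[List[float]],
                    max_sigma: float = 4.0) -> List[List[float]]:
    """Drop every row in which some feature has |z-score| > max_sigma (hypothetical helper)."""
    columns = list(zip(*rows))  # transpose rows into feature columns
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    kept = []
    for row in rows:
        # Constant columns (std == 0) can never flag an outlier.
        is_outlier = any(
            std > 0 and abs(value - mu) / std > max_sigma
            for value, mu, std in zip(row, means, stds)
        )
        if not is_outlier:
            kept.append(row)
    return kept
```

Note that with few rows a single extreme value also inflates the standard deviation, so large thresholds such as 4 only flag points in reasonably sized datasets.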

Run the pipeline with different models:

atom.pipeline(models=['LR', 'LDA', 'XGB', 'lSVM'],
              metric='f1',
              max_iter=10,
              max_time=1000,
              init_points=3,
              cv=4,
              bagging=10)  
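Bagging, as used here to assess robustness, comes down to refitting a model on several bootstrap samples of the training set and reporting the spread of the resulting scores. The sketch below shows that idea only; the helper `bagging_scores` and the `fit_and_score(train_indices)` callback, which is assumed to train a model on the given rows and return a test score such as F1, are hypothetical and not ATOM's API.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def bagging_scores(n_samples: int,
                   fit_and_score: Callable[[Sequence[int]], float],
                   n_bags: int = 10,
                   seed: int = 1) -> Tuple[float, float]:
    """Return (mean, std) of the scores over n_bags bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    scores: List[float] = []
    for _ in range(n_bags):
        # Draw n_samples training indices with replacement (a bootstrap sample).
        indices = [rng.randrange(n_samples) for _ in range(n_samples)]
        scores.append(fit_and_score(indices))
    return mean(scores), pstdev(scores)
```

A small standard deviation relative to the mean suggests the model's score does not depend much on which training rows it happened to see.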

Make plots and analyze results:

atom.plot_bagging(filename='bagging_results.png')  
atom.lSVM.plot_probabilities(figsize=(9, 6))  
atom.lda.plot_confusion_matrix(normalize=True)
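With `normalize=True`, a confusion matrix is usually divided row by row by the number of true samples in each class, so every row sums to 1 and the diagonal shows per-class recall. The sketch below computes such a matrix without plotting; it assumes this row-wise convention (the one scikit-learn uses for `normalize='true'`), integer class labels starting at 0, and a hypothetical helper name `normalized_confusion_matrix`.

```python
from typing import List, Sequence


def normalized_confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                                n_classes: int) -> List[List[float]]:
    """Rows are true classes, columns are predicted classes; each non-empty row sums to 1."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Leave rows of classes absent from y_true at zero instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For example, `normalized_confusion_matrix([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0], 2)` gives `[[0.75, 0.25], [0.5, 0.5]]`.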

Documentation

For further information about ATOM, please see the project documentation.

Download files

Files for atom-ml, version 3.2.0: atom-ml-3.2.0.tar.gz (source distribution, 51.0 kB)
