A Python AutoML tool for fast exploration and experimentation of supervised machine learning pipelines.
Project description
Automated Tool for Optimized Modelling
Author: tvdboom
Email: m.524687@gmail.com
Description
Automated Tool for Optimized Modelling (ATOM) is a Python package designed for fast exploration and experimentation of supervised machine learning tasks. With just a few lines of code, you can perform basic data cleaning steps, select features and compare the performance of multiple models on a given dataset. ATOM is meant to provide quick insights into which algorithms perform best for the task at hand and to indicate the feasibility of the ML solution. The package supports binary classification, multiclass classification, and regression tasks.
NOTE: A data scientist with domain knowledge can outperform ATOM by applying use-case-specific feature engineering or data cleaning steps!
Possible steps taken by the ATOM pipeline:
- Data Cleaning
    - Handle missing values
    - Encode categorical features
    - Balance the dataset
    - Remove outliers
- Perform feature selection
    - Remove features with too high collinearity
    - Remove features with too low variance
    - Select best features according to a chosen strategy
- Fit all selected models (either direct or via successive halving)
    - Select hyperparameters using a Bayesian Optimization approach
    - Perform bagging to assess the robustness of the model
- Analyze the results using the provided plotting functions!
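The successive halving option mentioned in the list can be illustrated with a small standalone sketch. This is not ATOM's implementation: `successive_halving`, `toy_score` and the learning-curve numbers are hypothetical names and values chosen for illustration. The idea shown is the general technique: score all candidates on a small training budget, keep the better half, double the budget, and repeat until one candidate remains.

```python
import math


def successive_halving(candidates, score_fn, min_samples, max_samples):
    """Return the last surviving candidate and the per-round rankings.

    candidates: list of candidate identifiers (hypothetical).
    score_fn(candidate, n_samples) -> float, higher is better (hypothetical).
    """
    survivors = list(candidates)
    n_samples = min_samples
    history = []
    while len(survivors) > 1:
        # Rank every surviving candidate on the current training budget.
        ranked = sorted(survivors, key=lambda c: score_fn(c, n_samples), reverse=True)
        history.append((n_samples, ranked))
        # Keep the better half (rounded up) and double the budget, up to the cap.
        survivors = ranked[: math.ceil(len(ranked) / 2)]
        n_samples = min(n_samples * 2, max_samples)
    return survivors[0], history


def toy_score(name, n_samples):
    # Made-up learning curves: the score approaches an asymptote as data grows.
    asymptote = {"LR": 0.80, "LDA": 0.78, "XGB": 0.90, "lSVM": 0.82}[name]
    return asymptote * n_samples / (n_samples + 100)


winner, history = successive_halving(
    ["LR", "LDA", "XGB", "lSVM"], toy_score, min_samples=100, max_samples=1000
)
print(winner)
for n_samples, ranked in history:
    print(n_samples, ranked)
```

In a real setting `score_fn` would train the model on `n_samples` rows and return a validation metric, so weak candidates are discarded before the expensive full-size fits.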
Installation
Install ATOM easily using pip.
NOTE: Since atom was already taken, the name of the package on PyPI is atom-ml!
pip install atom-ml
Usage
Call the ATOMClassifier or ATOMRegressor class and provide the data you want to use:
from atom import ATOMClassifier
atom = ATOMClassifier(X, y, log='auto', n_jobs=2, verbose=2)
ATOM has multiple data cleaning methods to help you prepare the data for modelling:
atom.impute(strat_num='knn', strat_cat='most_frequent', max_frac_rows=0.1)
atom.encode(max_onehot=10, frac_to_other=0.05)
atom.outliers(max_sigma=4)
atom.balance(oversample=0.8, n_neighbors=15)
atom.feature_selection(strategy='univariate', solver='chi2', max_features=0.9)
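To make the `max_sigma=4` argument of the outliers step above more concrete, here is a minimal sketch of one common interpretation: drop every row that has a value more than `max_sigma` standard deviations away from its column mean. This is an assumption about what the parameter means, not ATOM's actual code, and `drop_outlier_rows` is a hypothetical helper written with the standard library only.

```python
from statistics import mean, pstdev


def drop_outlier_rows(rows, max_sigma=4):
    """Drop rows with any value more than max_sigma std devs from its column mean."""
    columns = list(zip(*rows))
    # Per-column mean and (population) standard deviation.
    stats = [(mean(col), pstdev(col)) for col in columns]
    kept = []
    for row in rows:
        is_outlier = any(
            sd > 0 and abs(value - mu) > max_sigma * sd
            for value, (mu, sd) in zip(row, stats)
        )
        if not is_outlier:
            kept.append(row)
    return kept


# 20 ordinary rows plus one row with an extreme value in the second column.
rows = [[float(i % 5), 1.0] for i in range(20)] + [[2.0, 500.0]]
cleaned = drop_outlier_rows(rows, max_sigma=4)
print(len(rows), len(cleaned))
```

Note that a single extreme point among n rows can be at most sqrt(n - 1) population standard deviations from the mean, so a threshold of 4 only ever removes rows when the dataset has at least 18 of them.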
Run the pipeline with different models:
atom.pipeline(models=['LR', 'LDA', 'XGB', 'lSVM'],
              metric='f1',
              max_iter=10,
              max_time=1000,
              init_points=3,
              cv=4,
              bagging=10)
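The `bagging=10` argument above refers to assessing robustness by refitting on resampled data. The sketch below shows that general idea with the standard library only: fit the same model on several bootstrap samples of the training set and look at the spread of the test scores. It is an illustration under stated assumptions, not ATOM's implementation; `bagging_scores`, `fit_mean`, `neg_mae` and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev


def bagging_scores(train, test, fit, score, n_rounds=10, seed=0):
    """Fit on n_rounds bootstrap samples of train and score each fit on test."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        # Bootstrap: sample the training set with replacement, same size.
        sample = [rng.choice(train) for _ in train]
        model = fit(sample)
        scores.append(score(model, test))
    return scores


def fit_mean(sample):
    # Toy "model": always predict the mean target of the sample.
    return mean(y for _, y in sample)


def neg_mae(model, data):
    # Negative mean absolute error, so that higher is better.
    return -mean(abs(y - model) for _, y in data)


# Toy (x, y) data: even x for training, odd x for testing.
train = [(x, 2.0 * x + (x % 3)) for x in range(0, 60, 2)]
test = [(x, 2.0 * x + (x % 3)) for x in range(1, 60, 2)]

scores = bagging_scores(train, test, fit_mean, neg_mae, n_rounds=10)
print(f"mean={mean(scores):.3f} std={pstdev(scores):.3f}")
```

A small standard deviation across the rounds suggests the model's score does not depend much on which training rows it happened to see; a large one suggests the result is fragile.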
Make plots and analyze results:
atom.plot_bagging(filename='bagging_results.png')
atom.lSVM.plot_probabilities()
atom.lda.plot_confusion_matrix()
Documentation
For further information about ATOM, please see the project documentation.