atom-ml·PyPI

A Python AutoML tool for fast exploration and experimentation of supervised machine learning pipelines.

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

ATOM

Automated Tool for Optimized Modelling

Author: tvdboom
Email: m.524687@gmail.com

Description

There is no magic formula in data science that can tell us which type of machine learning algorithm will perform best for a specific use-case. Different models are better suited for different types of data and different problems. At best, you can follow some rough guides on which model to try on your data, but these are often more confusing than helpful. Best practices tell us to start with a simple model (e.g. linear regression) and build up to more complicated models (e.g. logistic regression -> random forest -> multilayer perceptron) if you are not satisfied with the results. Unfortunately, different models require different data cleaning steps, different types and amounts of features, a new set of hyperparameters to tune, etc. Refactoring the code for this purpose can be quite boring and time-consuming. Because of this, many data scientists end up using only the model best known to them and fine-tuning that particular model without ever trying different ones. This can result in poor performance (because the model is simply not the right one for the task) or in poor time management (because you could have achieved similar performance with a simpler or faster model).

ATOM is here to help us solve these issues. With just a few lines of code, you can perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset. ATOM should be able to provide quick insights on which algorithms perform best for the task at hand and provide an indication of the feasibility of the ML solution.

It is important to realize that ATOM is not here to replace all the work a data scientist has to do before getting their model into production. ATOM doesn't spit out production-ready models just by tuning some parameters in its API. After it has helped you determine the right model, you will most likely need to fine-tune that model with use-case-specific features and data cleaning steps to achieve maximum performance.

So, this sounds a bit like AutoML; how is ATOM different from auto-sklearn or TPOT? Well, ATOM does AutoML in the sense that it helps you find the best model for a specific task, but contrary to the aforementioned packages, it does not actively search for the best model. It simply runs all the models you specify and lets you pick the one that you think suits you best. AutoML packages are often black boxes: if you provide data, they will magically return a working model. Although they work great, they often produce complicated pipelines with low explainability, which are hard to sell to the business. This is where ATOM excels. Every step of the pipeline is accounted for, and using the provided plotting methods, it's easy to demonstrate why one model is better or worse than another.
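To make the contrast concrete, here is the kind of loop ATOM saves you from writing by hand: train several candidate models with the same cross-validation scheme and metric, then rank them so a human can choose. This is a minimal sketch in plain scikit-learn, not ATOM's API; the three candidates (`LR`, `LDA` and a decision tree) and the 5-fold setup are illustrative choices.

```python
# Plain scikit-learn sketch of "run every candidate and compare".
# This is NOT ATOM's implementation; it only illustrates the workflow it automates.
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate models, from simple to more complex (illustrative selection).
candidates = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LDA": LinearDiscriminantAnalysis(),
    "Tree": DecisionTreeClassifier(random_state=0),
}

# Score every candidate with the same 5-fold cross-validation and metric.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

# Print the models from best to worst so a human can pick one.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean f1 = {score:.3f}")
```

Every model sees identical folds and the same metric, which is what makes the ranking a fair comparison.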

Example steps taken by ATOM's pipeline:

Data Cleaning
- Handle missing values
- Encode categorical features
- Remove outliers
- Balance the dataset
Feature engineering
- Create new non-linear features
- Remove multi-collinear features
- Remove features with too low variance
- Select the most promising features based on a statistical test
Train and validate multiple models
- Select hyperparameters using a Bayesian Optimization approach
- Train and test the models on the provided data
- Perform bagging to assess the robustness of the output
Analyze the results
- Get the model scores on various metrics
- Make plots to compare the model performances
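The feature engineering steps above can be approximated in plain scikit-learn and NumPy. This is a sketch under assumptions, not ATOM's implementation: the variance threshold, the 0.95 correlation cut-off, the greedy keep-first rule in the hypothetical `drop_collinear` helper, and `k=10` are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Remove features with too low variance. threshold=0.0 only drops constant
# columns; a real threshold should take the feature scale into account.
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)


def drop_collinear(X, max_corr=0.95):
    """Hypothetical helper (not part of ATOM or scikit-learn).

    Walks through the columns in order and keeps a column only if its absolute
    Pearson correlation with every column kept so far is at most max_corr.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= max_corr for k in keep):
            keep.append(j)
    return X[:, keep], keep


# Remove multi-collinear features.
X_nocol, kept = drop_collinear(X_var, max_corr=0.95)

# Select the 10 features with the highest ANOVA F-statistic w.r.t. the target.
X_best = SelectKBest(score_func=f_classif, k=10).fit_transform(X_nocol, y)
print(X_var.shape, X_nocol.shape, X_best.shape)
```

The greedy rule keeps the first column of each highly correlated group; other tie-breaking rules (e.g. keeping the column most correlated with the target) are equally valid.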


Installation

NOTE: Since the name `atom` was already taken, install the package under the name `atom-ml`!

Install ATOM's newest release easily via pip:

	$ pip install -U atom-ml

or via conda:

	$ conda install -c conda-forge atom-ml

Usage

Initialize the ATOMClassifier or ATOMRegressor class with the data you want to use:

from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y, logger='auto', n_jobs=2, verbose=2)

ATOM has multiple data cleaning methods to help you prepare the data for modelling:

atom.impute(strat_num='knn', strat_cat='most_frequent', min_frac_rows=0.1)
atom.encode(strategy='Target', max_onehot=8, frac_to_other=0.05)
atom.feature_selection(strategy='PCA', n_features=12)
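For readers who want a feel for what these calls do, here is a plain scikit-learn sketch of comparable steps on the breast cancer data. It is not ATOM's code. It assumes `min_frac_rows=0.1` means "drop rows with fewer than 10% non-missing values" and that `strategy='PCA'` projects onto principal components. Because the breast cancer data has no missing values or categorical columns, the sketch injects random gaps and skips the encoding step.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

X_raw, y = load_breast_cancer(return_X_y=True)

# The dataset is complete, so knock out ~5% of the values at random.
rng = np.random.default_rng(0)
X_missing = X_raw.copy()
X_missing[rng.random(X_missing.shape) < 0.05] = np.nan

# Assumed meaning of min_frac_rows=0.1: drop rows in which fewer than 10%
# of the values are present.
frac_present = 1.0 - np.isnan(X_missing).mean(axis=1)
row_mask = frac_present >= 0.1
X_rows, y_kept = X_missing[row_mask], y[row_mask]

# Fill the remaining gaps from the 5 nearest neighbours (cf. strat_num='knn').
X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_rows)

# PCA is scale-sensitive, so standardize before projecting onto 12 components
# (cf. strategy='PCA', n_features=12).
X_scaled = StandardScaler().fit_transform(X_imputed)
X_clean = PCA(n_components=12).fit_transform(X_scaled)
print(X_clean.shape)
```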

Run the pipeline with the models you want to compare:

atom.run(models=['LR', 'LDA', 'XGB', 'lSVM'],
         metric='f1',
         n_calls=25,
         n_initial_points=10,
         bagging=4)
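The `bagging=4` argument is there to assess how robust each model's score is. A common way to do this, and the assumption behind this sketch, is to refit the model on bootstrap resamples of the training set and score every refit on the same test set; the spread of the scores then indicates how stable the model is. The `bagging_scores` helper, the 80/20 split and the seed are hypothetical, and this is plain scikit-learn rather than ATOM's internals.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)


def bagging_scores(estimator, X_train, y_train, X_test, y_test, n_bags=4, seed=0):
    """Hypothetical helper: fit n_bags bootstrap copies, score each on X_test."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    scores = []
    for _ in range(n_bags):
        # Bootstrap sample: n row indices drawn with replacement.
        idx = rng.integers(0, n, size=n)
        model = clone(estimator).fit(X_train[idx], y_train[idx])
        # Every refit is evaluated on the same, untouched test set.
        scores.append(f1_score(y_test, model.predict(X_test)))
    return np.array(scores)


scores = bagging_scores(LinearDiscriminantAnalysis(), X_train, y_train, X_test, y_test)
print(f"f1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

A small standard deviation suggests the score does not hinge on which training rows happened to be sampled.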

Make plots to analyze the results:

atom.plot_bagging(figsize=(9, 6), filename='bagging_results.png')
atom.LDA.plot_confusion_matrix(normalize=True, filename='cm.png')
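A normalized confusion matrix like the one plotted by `plot_confusion_matrix(normalize=True)` can also be computed directly with scikit-learn's `confusion_matrix`. This sketch assumes row-wise normalization (each row divided by the number of samples of that true class), which may differ from how ATOM normalizes; the LDA model and the split are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

model = LinearDiscriminantAnalysis().fit(X_train, y_train)

# normalize="true" divides each row by the number of samples in that true class,
# so every row sums to 1 and the diagonal holds the per-class recall.
cm = confusion_matrix(y_test, model.predict(X_test), normalize="true")
print(np.round(cm, 3))
```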

Documentation

For further information about ATOM, please see the project's documentation.


Release history

- 6.1.0 (Jul 5, 2024)
- 6.0.1 (Mar 7, 2024)
- 6.0.0 (Mar 7, 2024)
- 5.2.0 (Jun 14, 2023)
- 5.1.2 (May 7, 2023)
- 5.1.1 (Mar 16, 2023)
- 5.1.0 (Mar 4, 2023)
- 5.0.1 (Nov 29, 2022)
- 5.0.0 (Nov 28, 2022)
- 4.14.1 (Jul 18, 2022)
- 4.14.0 (Jul 17, 2022)
- 4.13.1 (Apr 5, 2022)
- 4.13.0 (Apr 4, 2022)
- 4.12.0 (Feb 24, 2022)
- 4.11.0 (Jan 30, 2022)
- 4.10.0 (Dec 17, 2021)
- 4.9.1 (Oct 30, 2021)
- 4.9.0 (Oct 27, 2021)
- 4.8.0 (Sep 29, 2021)
- 4.7.3 (Sep 11, 2021)
- 4.7.2 (Sep 11, 2021)
- 4.7.1 (Sep 11, 2021)
- 4.7.0 (Sep 10, 2021)
- 4.6.0 (Jun 28, 2021)
- 4.5.0 (May 31, 2021)
- 4.4.0 (Mar 29, 2021)
- 4.3.0 (Mar 2, 2021)
- 4.2.1 (Dec 29, 2020)
- 4.2.0 (Dec 28, 2020)
- 4.1.0 (Oct 16, 2020)
- 4.0.1 (Sep 29, 2020)
- 4.0.0 (Sep 28, 2020), this version
- 3.3.0 (Apr 24, 2020)
- 3.2.0 (Mar 30, 2020)
- 3.1.0 (Mar 8, 2020)
- 3.0.2 (Feb 17, 2020)
- 3.0.1 (Feb 15, 2020)
- 3.0.0 (Feb 13, 2020)
- 2.4.0 (Jan 26, 2020)
- 2.3.0 (Dec 13, 2019)
- 2.2.0 (Dec 3, 2019)
- 2.1.2 (Nov 27, 2019)
- 2.1.1 (Nov 22, 2019)
- 2.1.0 (Nov 8, 2019)
- 2.0.3 (Nov 1, 2019)

Download files


Source Distribution

atom-ml-4.0.0.tar.gz (84.4 kB), uploaded Sep 28, 2020

File details

Details for the file atom-ml-4.0.0.tar.gz.

File metadata

Download URL: atom-ml-4.0.0.tar.gz
Upload date: Sep 28, 2020
Size: 84.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.5.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.3

File hashes

Hashes for atom-ml-4.0.0.tar.gz
Algorithm	Hash digest
SHA256	`5b60aa3b6fa154dcef14ddcacfca74af0895f5210c15ce88f3ed4105e52a5674`
MD5	`a5c3418795f451eb39df5835c5e238c9`
BLAKE2b-256	`72ff36a1441ba9898b2391473673d5cebdd56f89e400085ae03853a2d9ff6348`

