Introduction to Baseline Optimal

Between the raw data and the optimal results in machine learning projects, there is an exhausting, iterative process. We go back and forth to experiment with various combinations of feature engineering, processing methods, and models along with their hyperparameters. At the end of the day, we hope our efforts pay off.

🤞 God bless data scientists. 🤞

Manual experimentation is good practice. However, it often produces messy, repetitive code, and it can take a long while to find out that an attempt doesn't work. Sometimes we also overcomplicate data transformation and processing to get promising but unnecessary gains in metric scores.

Given these problems, the baseline_optimal package automates the workflow by employing Optuna's Bayesian optimization, significantly reducing the need for manual experimentation. You provide the raw data, and the modules do the heavy lifting.


Installation

You can install the baseline_optimal package and its dependencies using pip:

pip install baseline_optimal

After installation, you can import the package in Python:

import baseline_optimal

Documentation

Access the entire documentation through GitHub Pages.

The available baseline_optimal modules are listed below, with links to their documentation and examples.

| Module | Task | Documentation | Example |
| --- | --- | --- | --- |
| baseline_optimal.class_task | classification | Link | Link |

The supported machine learning algorithms and the hyperparameters tuned for each are listed below.

| Algorithm | Source | Hyperparameters |
| --- | --- | --- |
| DecisionTreeClassifier | sklearn.tree | max_features, max_depth, min_samples_split |
| RandomForestClassifier | sklearn.ensemble | n_estimators, max_features, max_depth, min_samples_split |
| AdaBoostClassifier | sklearn.ensemble | n_estimators, learning_rate |
| XGBClassifier | xgboost | n_estimators, learning_rate, max_depth |
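
The hyperparameters listed above map naturally onto an Optuna search space. The sketch below shows one way such a space could be written for the three scikit-learn algorithms; it is not the package's implementation. The function name `suggest_classifier`, the parameter names passed to the trial, and every numeric range are hypothetical. XGBClassifier is left out only to keep the example free of the xgboost dependency; it would be one more branch of the same shape.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def suggest_classifier(trial):
    """Build a classifier from values suggested by an Optuna-style trial.

    `trial` needs the `suggest_categorical`, `suggest_int`, and
    `suggest_float` methods that `optuna.Trial` provides.
    All ranges are illustrative assumptions.
    """
    name = trial.suggest_categorical(
        "algorithm", ["decision_tree", "random_forest", "adaboost"]
    )
    if name == "adaboost":
        return AdaBoostClassifier(
            n_estimators=trial.suggest_int("ada_n_estimators", 10, 200),
            learning_rate=trial.suggest_float(
                "ada_learning_rate", 0.01, 1.0, log=True
            ),
        )
    # The two tree-based learners share these three hyperparameters.
    tree_params = {
        "max_features": trial.suggest_categorical(
            "max_features", ["sqrt", "log2", None]
        ),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
    }
    if name == "decision_tree":
        return DecisionTreeClassifier(**tree_params)
    return RandomForestClassifier(
        n_estimators=trial.suggest_int("rf_n_estimators", 10, 200),
        **tree_params,
    )
```

Inside an Optuna objective function you would call `suggest_classifier(trial)` and return a cross-validated score for the resulting model, so that the choice of algorithm is tuned together with its hyperparameters.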

Why "Baseline" Optimal

The current version supports feature selection, missing value imputation, scaling, and encoding as data transformation and processing steps. Pipeline performance is evaluated across the choices made for these components and across multiple machine learning algorithms. With the help of Optuna, the package searches for the best-performing workflow for the raw data you provide.
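
For readers who have not assembled these steps before, the sketch below chains imputation, scaling, encoding, and feature selection with plain scikit-learn. It is not the package's internal pipeline: the tiny dataset, the column names, the median and most-frequent imputation strategies, the use of SelectKBest, and the LogisticRegression model are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny hypothetical dataset with missing values in both column types.
X = pd.DataFrame(
    {
        "age": [25, 32, np.nan, 51, 46, 38, 29, 60],
        "income": [40.0, 52.0, 61.0, np.nan, 75.0, 58.0, 43.0, 90.0],
        "city": ["A", "B", "A", np.nan, "B", "A", "B", "A"],
    }
)
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)
# Categorical columns: fill gaps with the mode, then one-hot encode.
categorical = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocess = ColumnTransformer(
    [("num", numeric, ["age", "income"]), ("cat", categorical, ["city"])]
)

# Keep the 3 features with the highest ANOVA F-score, then fit a model.
pipeline = Pipeline(
    [
        ("preprocess", preprocess),
        ("select", SelectKBest(f_classif, k=3)),
        ("model", LogisticRegression()),
    ]
)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
print(predictions)
```

Every choice hard-coded here (imputation strategy, scaler, number of selected features) is the kind of decision that an automated search can treat as one more tunable parameter.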

The results are "baseline" optimal because the workflow attempts only the most basic methods: no feature engineering, no dimensionality reduction, and so on. It aims to answer the lazy question, "If I do nothing, how far can I get?" If this package gives you satisfying results, congratulations! If not, you at least know where the baseline is, and you can try to beat it using your domain knowledge.

🤞 Good luck. 🤞

