Introduction to Baseline Optimal
Between the raw data and the optimal results in machine learning projects, there is an exhausting, iterative process. We go back and forth to experiment with various combinations of feature engineering, processing methods, and models along with their hyperparameters. At the end of the day, we hope our efforts pay off.
🤞 God bless data scientists. 🤞
Manual experimentation is good practice. However, it often produces messy, repetitive code, and it can take a long time to find out that an attempt doesn't work. Sometimes we also overcomplicate data transformation and processing to chase promising but unnecessary metric scores.
Given these problems, the baseline_optimal package automates the workflow by employing Optuna's Bayesian optimization, significantly reducing the need for manual experimentation. You provide the raw data, and the modules do the heavy lifting.
Installation
You can install the baseline_optimal package and its dependencies using pip:
pip install baseline_optimal
After installation, you can import the package in Python:
import baseline_optimal
Documentation
Access the entire documentation through GitHub Pages.
Check out the baseline_optimal modules available, along with their documentation and examples.
Check out the machine learning algorithms supported and the hyperparameters considered.
Algorithm | Source | Hyperparameters |
---|---|---|
DecisionTreeClassifier | sklearn.tree | max_features, max_depth, min_samples_split |
RandomForestClassifier | sklearn.ensemble | n_estimators, max_features, max_depth, min_samples_split |
AdaBoostClassifier | sklearn.ensemble | n_estimators, learning_rate |
XGBClassifier | xgboost | n_estimators, learning_rate, max_depth |
Why "Baseline" Optimal
The current version supports feature selection, missing value imputation, scaling, and encoding as data transformation and processing steps. Pipeline performance is evaluated across the choices for these components together with multiple machine learning algorithms. With the help of Optuna, the package searches these choices and returns the best workflow it finds for the raw data you provide.
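The four processing steps named above can be sketched as a single scikit-learn pipeline whose components are chosen by a few parameters. This is an illustration under assumptions, not the package's implementation: `build_pipeline`, its arguments, the toy data, and the column names are hypothetical, and a fixed list of two candidates stands in for the values an Optuna trial would suggest.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy raw data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat": rng.choice(["x", "y", "z"], size=n),
})
y = ((df["num_a"] + (df["cat"] == "x")) > 0.5).astype(int)
# Knock out some values so the imputation step has work to do.
df.loc[rng.choice(n, size=20, replace=False), "num_a"] = np.nan


def build_pipeline(impute_strategy, scaler_name, k_best):
    """Assemble imputation, scaling, encoding, feature selection, and a model."""
    scaler = StandardScaler() if scaler_name == "standard" else MinMaxScaler()
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy=impute_strategy)),
        ("scale", scaler),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat"]),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("select", SelectKBest(f_classif, k=k_best)),
        ("model", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ])


# Two hand-picked candidates; an optimizer would propose these instead.
candidates = [
    {"impute_strategy": "mean", "scaler_name": "standard", "k_best": 3},
    {"impute_strategy": "median", "scaler_name": "minmax", "k_best": 4},
]
scores = {}
for i, params in enumerate(candidates):
    pipe = build_pipeline(**params)
    scores[i] = cross_val_score(pipe, df, y, cv=3, scoring="accuracy").mean()

best = max(scores, key=scores.get)
print(candidates[best], round(scores[best], 3))
```

Putting every step inside one `Pipeline` means the imputer, scaler, encoder, and selector are refit on each training fold, so the cross-validated score is not inflated by information from the held-out fold.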
The results are "baseline" optimal because the workflow attempts only the most basic methods: no feature engineering, no dimensionality reduction, and so on. It aims to answer the lazy question, "If I do nothing, how far can I get?" If you get satisfying results from this package, congratulations! If not, you at least know where the baseline is, and you can try to beat it using your domain knowledge.
🤞 Good luck. 🤞
Source Distribution
File details
Details for the file baseline_optimal-0.0.7.tar.gz.
File metadata
- Download URL: baseline_optimal-0.0.7.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 047bb823b77551e162c0076a8e3681e8bdb621648e5584dfec6186d65c792cc0 |
MD5 | 5bff00a1daf6eb44cfb0214a4f1f1563 |
BLAKE2b-256 | 5870026a5d69115b77603f06cb6d3e711f4f8bd3fc92099565a691a17517535b |