AutoML for tabular datasets.
Autotraino :truck:
:warning: Warning: alpha version, everything breaks. :warning:
`autotraino` is a small wrapper library for AutoML on tabular datasets.
```python
from autotraino.gluon import AutogluonTrainer
from datasets import load_dataset

# load the "income" configuration of the Adult dataset as a pandas DataFrame
train = load_dataset("mstz/adult", "income")["train"].to_pandas()

# train the model with a 100-second time budget
trainer = AutogluonTrainer()
trainer = trainer.fit(train, target_feature="over_threshold", time_limit=100)
```
We can control basic training parameters, such as where to store the resulting models (the `save_path` parameter of the trainer constructor) or the time budget assigned to the trainer (the `time_limit` parameter of `fit`, expressed in seconds).
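To make the role of `time_limit` concrete, here is a minimal sketch of how a fixed time budget can bound training: candidate models are trained in order until the budget runs out. The function `fit_within_budget` and its arguments are hypothetical and only illustrate the idea; they are not autotraino's or AutoGluon's actual scheduling logic.

```python
import time
from typing import Callable, Dict, List, Tuple


def fit_within_budget(
    candidates: List[Tuple[str, Callable[[], object]]], time_limit: float
) -> Dict[str, object]:
    """Train candidates in order until `time_limit` seconds have elapsed.

    Hypothetical illustration of a time budget, not autotraino's scheduler.
    """
    deadline = time.monotonic() + time_limit
    trained: Dict[str, object] = {}
    for name, train_fn in candidates:
        if time.monotonic() >= deadline:
            break  # budget exhausted: skip the remaining candidates
        trained[name] = train_fn()  # each train_fn returns a fitted model
    return trained


# Toy training functions standing in for real model fits.
models = fit_within_budget(
    [("LightGBM", lambda: "lgbm-model"), ("RandomForest", lambda: "rf-model")],
    time_limit=100,
)
print(list(models))  # ['LightGBM', 'RandomForest']
```

In this simplified version the deadline is only checked between models, so a model started just before the deadline can still overrun it; a real scheduler would also need to pass the remaining time to each model.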
Once trained, we can access the individual models:
```python
# names of the trained models
print(trainer.names)
# a single model, retrieved by name
print(trainer["LightGBM"])
```
We can also predict directly from the trainer itself:
```python
# drop the target column to obtain the input features
train_x = train.drop(columns="over_threshold")
# predict with a subset of the trained models
predictions = trainer.predict(train_x, with_models=["LightGBM", "RandomForest"])
```
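The name-based access shown above can be pictured as a dictionary of models keyed by name. The sketch below is a hypothetical stand-in, not autotraino's trainer class: `ModelRegistry`, the toy models, and the per-model dictionary returned by `predict` are all assumptions made for illustration, since the return format of `trainer.predict(..., with_models=...)` is not documented here.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Hypothetical model type: maps a batch of rows to one prediction per row.
Model = Callable[[Sequence[Dict[str, Any]]], List[Any]]


class ModelRegistry:
    """Hypothetical stand-in for name-based model access (not autotraino's API)."""

    def __init__(self, models: Dict[str, Model]) -> None:
        self._models = dict(models)

    @property
    def names(self) -> List[str]:
        # Model names in insertion order, mirroring `trainer.names`.
        return list(self._models)

    def __getitem__(self, name: str) -> Model:
        # Retrieve a single model by name, mirroring `trainer["LightGBM"]`.
        return self._models[name]

    def predict(
        self, rows: Sequence[Dict[str, Any]], with_models: Optional[List[str]] = None
    ) -> Dict[str, List[Any]]:
        # Use every model unless a subset is requested; return predictions per model.
        selected = self.names if with_models is None else with_models
        return {name: self[name](rows) for name in selected}


# Toy "models" standing in for trained ones.
registry = ModelRegistry({
    "LightGBM": lambda rows: [r["age"] > 40 for r in rows],
    "RandomForest": lambda rows: [r["hours"] > 45 for r in rows],
    "KNN": lambda rows: [False for _ in rows],
})
rows = [{"age": 50, "hours": 40}, {"age": 30, "hours": 50}]
print(registry.names)  # ['LightGBM', 'RandomForest', 'KNN']
print(registry.predict(rows, with_models=["LightGBM", "RandomForest"]))
# {'LightGBM': [True, False], 'RandomForest': [False, True]}
```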
Quickstart
You can install `autotraino` via TestPyPI:
```shell
# optional: create a dedicated virtual environment (requires virtualenvwrapper)
mkvirtualenv -p python3.10 autotraino
pip install --extra-index-url https://test.pypi.org/simple/ autotraino
```
Datasets
`autotraino` works on `pandas.DataFrame`s.
You can find a large collection of tabular datasets on a Hugging Face page I'm curating at huggingface.co/mstz.
The datasets are sourced from UCI, Kaggle, and OpenML.
Most of them still need updating, especially their dataset cards.
What model families to train?
`autotraino` is currently based on AutoGluon and trains the following model families:
- Boosting models
  - LightGBM
  - CatBoost
- Bagging models
  - Random Forest
  - Extra Trees
- Neural networks
  - FastAI
  - NNTorch
- Classical models
  - k-NN
  - Logistic Regression
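When calling `predict` with `with_models`, it can help to select models by family rather than by name. A minimal sketch, assuming a hand-written mapping: only `"LightGBM"` and `"RandomForest"` appear in the example above, and the other model names (and the `models_in` helper) are hypothetical placeholders for whatever names the trainer reports in `trainer.names`.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from family to model names. Only "LightGBM" and
# "RandomForest" appear in the autotraino example; the other names are assumptions.
MODEL_FAMILIES: Dict[str, List[str]] = {
    "boosting": ["LightGBM", "CatBoost"],
    "bagging": ["RandomForest", "ExtraTrees"],
    "neural": ["FastAI", "NNTorch"],
    "classical": ["KNN", "LogisticRegression"],
}


def models_in(families: Iterable[str]) -> List[str]:
    """Expand family names into a flat list of model names, preserving order."""
    selected: List[str] = []
    for family in families:
        selected.extend(MODEL_FAMILIES[family])
    return selected


print(models_in(["boosting", "bagging"]))
# ['LightGBM', 'CatBoost', 'RandomForest', 'ExtraTrees']
```

The resulting list could then be passed as `with_models`, provided the names match those the trainer actually uses.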
Preprocessing
`autotraino` automatically detects feature types and performs the necessary preprocessing for each model.
To ease the process, consider setting the appropriate `dtypes` in the input `pandas.DataFrame`.
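To see why explicit dtypes help, consider a simplified type-detection heuristic working on raw string values: a column is numeric if every value parses as a number, categorical if it has few distinct values, and free text otherwise. This `infer_feature_type` function is a hypothetical illustration, not the detection logic used by autotraino or AutoGluon.

```python
from typing import Iterable, Optional


def _is_number(value: str) -> bool:
    # True if the string parses as a float (e.g. "39", "0.5", "1e3").
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_feature_type(values: Iterable[Optional[str]], max_categories: int = 20) -> str:
    """Guess a column's feature type from its raw string values.

    Simplified, hypothetical heuristic. Missing values (None or "") are ignored.
    """
    observed = [v for v in values if v not in (None, "")]
    if observed and all(_is_number(v) for v in observed):
        return "numeric"
    if len(set(observed)) <= max_categories:
        return "categorical"
    return "text"


print(infer_feature_type(["39", "50", "", "38"]))  # numeric
print(infer_feature_type(["Private", "Self-emp", None, "Private"]))  # categorical
```

A heuristic like this would label a column of numeric codes (for example, postal codes) as numeric even when it is really categorical; setting the dtype explicitly removes that ambiguity.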
In the works
Future developments include:
- Fitting arbitrary functions (Ray Tune)
- Fitting multi-output models
Download files
Source Distribution
Built Distribution
File details
Details for the file autotraino-0.1.0.tar.gz.
File metadata
- Download URL: autotraino-0.1.0.tar.gz
- Upload date:
- Size: 37.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | c40946e813a07ac5f516dfeb848f77b2fb24e283f29de1c2419ee01a9a2008d1
MD5 | 60daf4dca3a5e21915ee9964c2a0952d
BLAKE2b-256 | 281074754a18d4262a2f015e77cc6a0d386cbadd6a84a6b9407904a85c6bf5d9
File details
Details for the file autotraino-0.1.0-py3-none-any.whl.
File metadata
- Download URL: autotraino-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | fabede2444b57cca07879952ac2f1ab6ad06dc0adb35f39de5c80e1b98b6c0eb
MD5 | 75913970ae2843b257221860a66f9cc4
BLAKE2b-256 | 00a69368afe89baf720250bd539fade605c7d1b2b9393c2d94a189d5b3774bf9
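The digests above let you check that a downloaded file is intact. A minimal sketch using Python's `hashlib`; the SHA256 values are copied from the tables above, while `file_digest` and `verify_download` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the 0.1.0 release files.
EXPECTED_SHA256 = {
    "autotraino-0.1.0.tar.gz": "c40946e813a07ac5f516dfeb848f77b2fb24e283f29de1c2419ee01a9a2008d1",
    "autotraino-0.1.0-py3-none-any.whl": "fabede2444b57cca07879952ac2f1ab6ad06dc0adb35f39de5c80e1b98b6c0eb",
}


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the digest published for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and file_digest(path) == expected
```

For example, `verify_download(Path("autotraino-0.1.0.tar.gz"))` returns `True` only for an unmodified copy of the source distribution. pip can also enforce digests on its own through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).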