AutoML for Tabular datasets.
Autotraino :truck:
:warning: Warning: alpha version, everything breaks. :warning:
autotraino is a small wrapper library for AutoML on tabular datasets.
from autotraino.gluon import AutogluonTrainer
from datasets import load_dataset
train = load_dataset("mstz/adult", "income")["train"].to_pandas()
# train the model
trainer = AutogluonTrainer()
trainer = trainer.fit(train, target_feature="over_threshold", time_limit=100)
When fitting, you can control basic parameters such as where the resulting models are stored
(the save_path parameter of the trainer constructor) and the time budget given to the trainer
(the time_limit parameter, expressed in seconds).
Once trained, the individual models can be accessed by name:
# trained models
print(trainer.names)
print(trainer["LightGBM"])
and predict directly from the Trainer itself:
train_x = train.copy().drop("over_threshold", axis="columns")
predictions = trainer.predict(train_x, with_models=["LightGBM", "RandomForest"])
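The example above predicts on the same rows used for fitting. A minimal sketch of holding out rows with pandas before training follows; the toy frame, the `hours_per_week` column, and the `accuracy` helper are illustrative assumptions and not part of autotraino.

```python
import pandas as pd

# Toy data standing in for a real training frame; all values are made up.
data = pd.DataFrame(
    {
        "age": [25, 38, 47, 52, 29, 61, 33, 44, 56, 40],
        "hours_per_week": [40, 50, 38, 60, 20, 35, 45, 40, 55, 30],
        "over_threshold": [0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
    }
)

# Hold out 20% of the rows; random_state makes the split reproducible.
test = data.sample(frac=0.2, random_state=0)
train = data.drop(index=test.index)

# Features only, mirroring the drop() call used before predict().
test_x = test.drop("over_threshold", axis="columns")
test_y = test["over_threshold"]


def accuracy(y_true: pd.Series, y_pred: pd.Series) -> float:
    """Hypothetical helper: fraction of rows where the prediction equals the label."""
    return float((y_true.to_numpy() == y_pred.to_numpy()).mean())


# Hand-written labels and predictions, just to exercise the helper.
demo_score = accuracy(pd.Series([0, 1, 1]), pd.Series([0, 1, 0]))
```

The `train` frame would then go to `trainer.fit` and `test_x` to `trainer.predict`. This README does not specify the shape of the returned predictions, so the comparison with `test_y` has to be adapted to whatever the trainer returns.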
Quickstart
You can install autotraino from TestPyPI:
# optional
mkvirtualenv -p python3.10 autotraino
pip install --extra-index-url https://test.pypi.org/simple/ autotraino
Datasets
autotraino operates on pandas.DataFrame objects.
A large collection of tabular datasets is available on Hugging Face, in a repository I'm curating at huggingface.co/mstz.
Datasets are sourced from UCI, Kaggle, and OpenML.
Most are still to be updated (especially dataset cards).
What model families to train?
autotraino is currently based on AutoGluon and trains the following models:
- Boosting models
  - LightGBM
  - CatBoost
- Bagging models
  - Random Forest
  - ExtraTree Classifier
- Neural Network
  - FastAI
  - NNTorch
- Classical AI models
  - k-NN
  - Logistic Regression
Preprocessing
autotraino automatically detects feature types and performs the necessary feature preprocessing per model.
To ease the process, consider setting the appropriate dtypes in the input pandas.DataFrame.
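As a minimal sketch of setting dtypes before training, the snippet below converts a toy frame with pandas. The column names and values are made up for illustration, and how the backend treats each dtype is not specified in this README.

```python
import pandas as pd

# Toy frame standing in for a real dataset; columns and values are illustrative.
df = pd.DataFrame(
    {
        "age": ["39", "50", "38"],                      # numbers read in as strings
        "workclass": ["private", "state", "private"],   # low-cardinality text
        "over_threshold": [0, 1, 0],                    # binary target stored as ints
    }
)

typed = df.copy()
# Parse numeric strings into a proper numeric column.
typed["age"] = pd.to_numeric(typed["age"])
# Mark categorical text explicitly instead of leaving it as generic strings.
typed["workclass"] = typed["workclass"].astype("category")
# Store the binary target as booleans.
typed["over_threshold"] = typed["over_threshold"].astype(bool)

print(typed.dtypes)
```

The resulting `typed` frame is what would be passed to the trainer in place of `df`.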
In the works
Future developments include:
- Fitting arbitrary functions (Ray Tune)
- Fitting multi-output models
File details
Details for the file autotraino-0.1.0.tar.gz.
File metadata
- Download URL: autotraino-0.1.0.tar.gz
- Upload date:
- Size: 37.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c40946e813a07ac5f516dfeb848f77b2fb24e283f29de1c2419ee01a9a2008d1 |
| MD5 | 60daf4dca3a5e21915ee9964c2a0952d |
| BLAKE2b-256 | 281074754a18d4262a2f015e77cc6a0d386cbadd6a84a6b9407904a85c6bf5d9 |
File details
Details for the file autotraino-0.1.0-py3-none-any.whl.
File metadata
- Download URL: autotraino-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fabede2444b57cca07879952ac2f1ab6ad06dc0adb35f39de5c80e1b98b6c0eb |
| MD5 | 75913970ae2843b257221860a66f9cc4 |
| BLAKE2b-256 | 00a69368afe89baf720250bd539fade605c7d1b2b9393c2d94a189d5b3774bf9 |
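A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `matches` are hypothetical, and the local path in the usage comment assumes the file was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed for autotraino-0.1.0.tar.gz.
EXPECTED_SDIST = "c40946e813a07ac5f516dfeb848f77b2fb24e283f29de1c2419ee01a9a2008d1"

# Usage, assuming the sdist sits in the current directory:
# matches(Path("autotraino-0.1.0.tar.gz"), EXPECTED_SDIST)
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 37.9 kB archive but makes the helper reusable for larger downloads.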