AutoML for Tabular datasets.

Autotraino :truck:

:warning: Warning: alpha version, everything breaks. :warning:

autotraino is a small wrapper library for AutoML on tabular datasets.

from autotraino.gluon import AutogluonTrainer 
from datasets import load_dataset

train = load_dataset("mstz/adult", "income")["train"].to_pandas() 

# train the model
trainer = AutogluonTrainer()
trainer = trainer.fit(train, target_feature="over_threshold", time_limit=100)

When fitting, we can control basic parameters such as where the resulting models are stored (the save_path parameter of the trainer constructor) and the time budget assigned to the trainer (the time_limit parameter, expressed in seconds).
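As a minimal sketch of how these two parameters fit together, the helper below converts a human-readable budget into the seconds that time_limit expects and passes save_path to the constructor. The names budget_seconds and fit_with_budget are hypothetical and not part of autotraino; only save_path, time_limit, and target_feature come from the text above.

```python
from datetime import timedelta


def budget_seconds(budget: timedelta) -> int:
    """Convert a time budget to whole seconds, the unit time_limit is expressed in."""
    return int(budget.total_seconds())


def fit_with_budget(train, target: str, budget: timedelta, save_path: str):
    """Hypothetical helper: fit a trainer under a time budget, storing models in save_path."""
    # Imported lazily so budget_seconds can be used without autotraino installed.
    from autotraino.gluon import AutogluonTrainer

    # save_path goes to the constructor, time_limit to fit, as described above.
    trainer = AutogluonTrainer(save_path=save_path)
    return trainer.fit(train, target_feature=target, time_limit=budget_seconds(budget))


print(budget_seconds(timedelta(minutes=5)))  # 300
```

With the adult dataset loaded as above, a call might look like fit_with_budget(train, "over_threshold", timedelta(minutes=5), "models/adult"), where "models/adult" is an arbitrary example path.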

Once trained, we can access the individual models:

# trained models
print(trainer.names)

print(trainer["LightGBM"])

and predict directly from the Trainer itself:

train_x = train.copy().drop("over_threshold", axis="columns")
predictions = trainer.predict(train_x, with_models=["LightGBM", "RandomForest"])
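The call above predicts on the same rows used for fitting, which works as a smoke test but says little about how the models generalize. A minimal sketch of holding out rows with plain pandas before fitting follows. The function holdout_split and the toy frame are hypothetical illustrations, not part of autotraino.

```python
import pandas as pd


def holdout_split(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Split df into (fit_part, eval_part). Assumes df has a unique index."""
    # Sample the evaluation rows, then keep every other row for fitting.
    eval_part = df.sample(frac=test_fraction, random_state=seed)
    fit_part = df.drop(index=eval_part.index)
    return fit_part, eval_part


# Toy stand-in for the adult dataset; the column values are made up.
toy = pd.DataFrame({"age": range(10), "over_threshold": [0, 1] * 5})
fit_part, eval_part = holdout_split(toy)

# Drop the target before predicting, as in the snippet above.
eval_x = eval_part.drop("over_threshold", axis="columns")
```

fit_part would then go to trainer.fit and eval_x to trainer.predict, in place of train and train_x.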

Quickstart

You can install autotraino via test-pypi:

# optional
mkvirtualenv -p python3.10 autotraino

pip install --extra-index-url https://test.pypi.org/simple/ autotraino

Datasets

autotraino works on pandas.DataFrames. You can find a large collection of tabular datasets on Hugging Face, which I'm curating at huggingface.co/mstz. Datasets are sourced from UCI, Kaggle, and OpenML. Most still need updating (especially the dataset cards).

What model families to train?

autotraino is currently based on AutoGluon and trains the following models:

  • Boosting models
    • LightGBM
    • CatBoost
  • Bagging models
    • Random Forest
    • Extra Trees
  • Neural networks
    • FastAI
    • NNTorch
  • Classical AI models
    • k-NN
    • Logistic Regression

Preprocessing

autotraino automatically detects feature types and performs the necessary feature preprocessing per model. To ease the process, consider setting the appropriate dtypes in the input pandas.DataFrame.
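Setting dtypes explicitly can be sketched with plain pandas as below. The frame and its column names are illustrative, loosely modeled on the adult dataset. How autotraino's type detection reacts to each dtype is not documented here, so treat the choice of target dtypes as an assumption.

```python
import pandas as pd

# Illustrative raw frame: numbers stored as strings, categories as plain objects.
raw = pd.DataFrame(
    {
        "age": ["39", "50", "38"],
        "workclass": ["State-gov", "Private", "Private"],
        "over_threshold": [0, 1, 0],
    }
)

# Give each column the dtype that matches its meaning.
typed = raw.assign(
    age=pd.to_numeric(raw["age"]),                      # numeric feature
    workclass=raw["workclass"].astype("category"),      # categorical feature
    over_threshold=raw["over_threshold"].astype(bool),  # binary target
)

print(typed.dtypes)
```

The typed frame can then be passed to the trainer in place of the raw one.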

In the works

Future developments include:

  • Fitting arbitrary functions (Ray Tune)
  • Fitting multi-output models
