A framework to ensemble model bases and evaluate various models for tabular predictions.

tabular_ensemble

A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction tasks from the following well-established model bases:

  • autogluon
    • "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
  • pytorch_widedeep
    • "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
  • pytorch_tabular
    • "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".

You can implement your own models, data processing pipelines, and datasets within this flexible and well-tested framework and compare them consistently with baseline models. This is even easier when your own model is based on pytorch.

Supported features for all model bases:

  • Data processing
    • Data splitting (training/validation/testing sets)
    • Data imputation
    • Data filtering
    • Data scaling
    • Data augmentation
    • Feature augmentation
    • Feature selection
    • etc.
  • Multi-modal data
  • Loading UCI datasets
  • Data/result analysis
    • Leaderboard
    • Box plot
    • Pair plot
    • Pearson correlation
    • Partial dependence plot (with bootstrapping)
    • Feature importance (Permutation and SHAP)
    • etc.
  • Building models upon other trained models
  • pytorch_lightning-based training for pytorch models
  • Gaussian-process-based Bayesian hyperparameter optimization
  • Cross-validation (including continuing from a cross-validation checkpoint)
  • Saving, loading, and migrating models
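
As an illustration of the first data processing item above, here is a minimal sketch of a random training/validation/testing split. The function name `split_indices`, the 60/20/20 default ratio, and the fixed seed are assumptions made for this example only; it uses the Python standard library and is not the package's own splitting code.

```python
import random


def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle range(n_samples) and cut it into train/val/test index lists."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers summing to 1")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, so no sample is dropped
    return train, val, test
```

The three index lists are disjoint and together cover every sample, so they can be used to select rows from a table loaded by any means.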

The package stands on the shoulders of giants.

Installation/Usage

Full documentation is available here. For a quick start:

  1. tabular_ensemble can be installed from PyPI by running the following command:
pip install tabensemb[torch]

Please use pip install tabensemb instead if you already have torch>=1.12.0 installed. Use pip install tabensemb[test] if you want to run unit tests.

  2. (Optional) Run unit tests after installing tabensemb[test]:
cd test
pytest .
  3. Place your .csv or .xlsx file in a data subfolder (e.g., data/sample.csv), and generate a configuration file in a configs subfolder (e.g., configs/sample.py) with the following content:
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
  4. Run the experiment with the configuration and the data:
python main.py --base sample --epoch 10

where --base refers to the configuration file, and additional arguments (such as --epoch here) refer to those in config/default.py.
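
To try the quick start without a dataset at hand, the following minimal sketch writes a synthetic data/sample.csv and a matching configs/sample.py. The column names and the configuration content are copied from the example above; the helper name `write_sample_project`, the row count, the category labels, and the way the target is generated are illustrative assumptions. The script uses only the Python standard library and does not call any tabensemb API.

```python
import csv
import random
from pathlib import Path

# Column names copied from the example configuration.
CONTINUOUS = ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"]
CATEGORICAL = ["cat_0", "cat_1", "cat_2"]
LABEL = ["target"]

# Same content as the example configuration file.
CONFIG_TEXT = '''cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
'''


def write_sample_project(root=".", n_rows=100, seed=0):
    """Write data/sample.csv and configs/sample.py under `root` (hypothetical helper)."""
    rng = random.Random(seed)
    root = Path(root)
    data_dir = root / "data"
    config_dir = root / "configs"
    data_dir.mkdir(parents=True, exist_ok=True)
    config_dir.mkdir(parents=True, exist_ok=True)

    csv_path = data_dir / "sample.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(CONTINUOUS + CATEGORICAL + LABEL)  # header row
        for _ in range(n_rows):
            cont = [round(rng.gauss(0.0, 1.0), 4) for _ in CONTINUOUS]
            cat = [rng.choice(["a", "b", "c"]) for _ in CATEGORICAL]
            # Synthetic target: sum of the continuous features plus small noise.
            target = round(sum(cont) + rng.gauss(0.0, 0.1), 4)
            writer.writerow(cont + cat + [target])

    config_path = config_dir / "sample.py"
    config_path.write_text(CONFIG_TEXT)
    return csv_path, config_path
```

Calling `write_sample_project()` from the project root creates both files in the locations that step 3 describes, after which the command in step 4 can be run against them.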

See the documentation pages for details.

Citation

If you use this repository, please cite us as:

(To be updated after release on arXiv or publication.)
