A framework to ensemble model bases and evaluate various models for tabular predictions.

tabular_ensemble

A framework to evaluate various models for tabular regression and classification tasks. The package integrates 26 machine learning models (including deep learning models) for tabular prediction tasks from the following well-established model bases:

  • autogluon
    • "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
  • pytorch_widedeep
    • "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
  • pytorch_tabular
    • "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".

You can implement your own models, data processing pipelines, and datasets within this flexible, well-tested framework and compare them consistently against the baseline models. This is especially easy when your model is built on pytorch.
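To illustrate what a consistent comparison means, here is a minimal pure-Python sketch that is independent of tabensemb: every model is fitted on one fixed training split and scored on the same test split with the same metric (RMSE), and the scores are sorted into a leaderboard. The toy data, the two toy models, and names such as fit_mean and fit_linear are hypothetical and are not part of the tabensemb API.

```python
import math
import random
from statistics import mean

# Hypothetical toy data (not from tabensemb): y = 2x + 1 + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# One fixed split, shared by every model, keeps the comparison consistent.
order = list(range(len(xs)))
rng.shuffle(order)
train_idx, test_idx = order[:160], order[160:]
x_train, y_train = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
x_test, y_test = [xs[i] for i in test_idx], [ys[i] for i in test_idx]


def rmse(y_true, y_pred):
    """Root-mean-square error, the single metric applied to every model."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def fit_mean(x_tr, y_tr):
    """Baseline model: always predict the training-set mean."""
    m = mean(y_tr)
    return lambda x: m


def fit_linear(x_tr, y_tr):
    """One-feature ordinary least squares."""
    mx, my = mean(x_tr), mean(y_tr)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_tr, y_tr))
    sxx = sum((x - mx) ** 2 for x in x_tr)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


models = {"mean_baseline": fit_mean, "linear": fit_linear}

# Fit on the same training rows, score on the same test rows, rank by RMSE.
leaderboard = []
for name, fit in models.items():
    predict = fit(x_train, y_train)
    leaderboard.append((name, rmse(y_test, [predict(x) for x in x_test])))
leaderboard.sort(key=lambda row: row[1])

for name, score in leaderboard:
    print(f"{name:<15} RMSE = {score:.3f}")
```

Adding a new model only means adding another entry to models; the split and the metric stay fixed, which is what makes the resulting leaderboard comparable.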

Supported features for all model bases:

  • Data processing
    • Data splitting (training/validation/testing sets)
    • Data imputation
    • Data filtering
    • Data scaling
    • Data augmentation
    • Feature augmentation
    • Feature selection
    • etc.
  • Multi-modal data
  • Loading UCI datasets
  • Data/result analysis
    • Leaderboard
    • Box plot
    • Pair plot
    • Pearson correlation
    • Partial dependence plot (with bootstrapping)
    • Feature importance (Permutation and SHAP)
    • etc.
  • Building models upon other trained models
  • pytorch_lightning-based training for pytorch models
  • Gaussian-process-based Bayesian hyperparameter optimization
  • Cross-validation (including continuing from a cross-validation checkpoint)
  • Saving, loading, and migrating models
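The first data processing steps listed above (splitting, imputation, scaling) interact in one important way: imputation and scaling statistics should be learned from the training rows only and then reused for the validation and test rows, so that no information leaks from the evaluation sets. The following is a minimal pure-Python sketch of that idea for a single continuous feature; it is not tabensemb's implementation, and the helpers split_indices, fit_stats, and transform are hypothetical names.

```python
import random
from statistics import mean, pstdev


def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices once, then cut them into train/validation/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test, n_val = int(n * test_frac), int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def fit_stats(train_column):
    """Learn imputation and scaling statistics from observed training values only."""
    observed = [v for v in train_column if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # avoid division by zero for constant columns
    return mu, sigma


def transform(column, mu, sigma):
    """Impute missing entries with the training mean, then standardize."""
    return [((mu if v is None else v) - mu) / sigma for v in column]


# Hypothetical continuous feature with two missing entries (None).
values = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0, 9.0, 10.0]
train_idx, val_idx, test_idx = split_indices(len(values))

mu, sigma = fit_stats([values[i] for i in train_idx])
train_scaled = transform([values[i] for i in train_idx], mu, sigma)
val_scaled = transform([values[i] for i in val_idx], mu, sigma)
test_scaled = transform([values[i] for i in test_idx], mu, sigma)
```

Because the same mu and sigma are applied to all three sets, the validation and test features are transformed exactly as a model would see them at prediction time.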

The package stands on the shoulders of giants:

Installation/Usage

Full documentation is available on the project's documentation site. For a quick start:

  1. tabular_ensemble can be installed from PyPI (as tabensemb) with the following command:
pip install tabensemb[torch]

Please use pip install tabensemb instead if you already have torch>=1.12.0 installed. Use pip install tabensemb[test] if you want to run unit tests.

To install from source instead, run the following from the root of a cloned copy of the repository:

pip install -e .[torch]
  2. (Optional) Run the unit tests after installing tabensemb[test]:
cd test
pytest .
  3. Place your .csv or .xlsx file in a data subfolder (e.g., data/sample.csv), and create a configuration file in a configs subfolder (e.g., configs/sample.py) with the following content:
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
  4. Run the experiment with this configuration and data:
python main.py --base sample --epoch 10

where --base names the configuration file (sample refers to configs/sample.py), and additional arguments such as --epoch correspond to the entries in config/default.py.
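Since the feature and label names in cfg are expected to match the column headers of the data file, a quick pre-flight check can catch a typo before a long run starts. The sketch below uses only the standard library and handles .csv files only (an .xlsx file would need a spreadsheet reader). The helpers missing_columns and read_header are hypothetical and not part of tabensemb, and the in-memory demo header deliberately leaves out cat_2.

```python
import csv
import io

# Same structure as the cfg dictionary in configs/sample.py above.
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}


def missing_columns(cfg, header):
    """Return configured feature/label names that do not appear in the header."""
    wanted = (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )
    present = set(header)
    return [name for name in wanted if name not in present]


def read_header(path):
    """Read only the first row (the column names) of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


# Demo with an in-memory stand-in for data/sample.csv (hypothetical contents).
demo_csv = io.StringIO(
    "cont_0,cont_1,cont_2,cont_3,cont_4,cat_0,cat_1,target\n"
    "0.1,0.2,0.3,0.4,0.5,a,b,1.0\n"
)
header = next(csv.reader(demo_csv))
print(missing_columns(cfg, header))  # ['cat_2']
```

Against a real file, call missing_columns(cfg, read_header("data/sample.csv")); an empty list means every configured name was found.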

See the documentation pages for details.

Citation

If you use this repository, please cite us as:

(To be updated once the paper is released on arXiv or published.)

Download files

Source Distribution

tabensemb-0.3.tar.gz (152.7 kB)

Built Distribution

tabensemb-0.3-py3-none-any.whl (142.8 kB)

File details

Details for the file tabensemb-0.3.tar.gz.

File metadata

  • Download URL: tabensemb-0.3.tar.gz
  • Upload date:
  • Size: 152.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for tabensemb-0.3.tar.gz

  • SHA256: 43451b7dd05814cd2e66a9c096cb4fcfd5cad4d4dd0641507fb5482c739b4a71
  • MD5: eb0ec7e1ee31401af51a840752502ad7
  • BLAKE2b-256: 523795a68e69f1ee69efd354212b90e8d685f91648cac2f4aa8548afedc72b96

Provenance

The following attestation bundles were made for tabensemb-0.3.tar.gz:

Publisher: publish.yml on Luwen-Zhang/tabular_ensemble

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tabensemb-0.3-py3-none-any.whl.

File metadata

  • Download URL: tabensemb-0.3-py3-none-any.whl
  • Upload date:
  • Size: 142.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for tabensemb-0.3-py3-none-any.whl

  • SHA256: 1b14d2518284028fce70f20b95f7e8526af540951df65946e763314d5e5479a2
  • MD5: 527e2453eab137c6fd5db4414b39faf4
  • BLAKE2b-256: cc4a3281efae6c4837da2549c4f7fbbd5039e31cba478fe14dd0a5ce621ef711

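To check a downloaded file against the SHA256 digests listed above, hashlib from the Python standard library is sufficient. This is a minimal sketch; sha256_of and verify are hypothetical helper names. pip can also enforce this itself in hash-checking mode, by adding --hash=sha256:<digest> to a pinned requirement such as tabensemb==0.3 and running pip install --require-hashes -r requirements.txt (in that mode every dependency needs a hash as well).

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the tabensemb 0.3 release files.
EXPECTED_SHA256 = {
    "tabensemb-0.3.tar.gz": "43451b7dd05814cd2e66a9c096cb4fcfd5cad4d4dd0641507fb5482c739b4a71",
    "tabensemb-0.3-py3-none-any.whl": "1b14d2518284028fce70f20b95f7e8526af540951df65946e763314d5e5479a2",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

After downloading, verify("tabensemb-0.3-py3-none-any.whl") should return True; False means the file differs from the published release and should not be installed.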

Provenance

The following attestation bundles were made for tabensemb-0.3-py3-none-any.whl:

Publisher: publish.yml on Luwen-Zhang/tabular_ensemble

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.