A framework to ensemble model bases and evaluate various models for tabular predictions.

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

Project description

tabular_ensemble

A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction tasks from the following well-established model bases:

autogluon
- "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
pytorch_widedeep
- "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
pytorch_tabular
- "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".

You can implement your own models, data processing pipelines, and datasets within this flexible, well-tested framework and compare them consistently with baseline models. This is even easier when your model is based on PyTorch.

Supported features for all model bases:

Data processing
- Data splitting (training/validation/testing sets)
- Data imputation
- Data filtering
- Data scaling
- Data augmentation
- Feature augmentation
- Feature selection
- etc.
Multi-modal data
Loading UCI datasets
Data/result analysis
- Leaderboard
- Box plot
- Pair plot
- Pearson correlation
- Partial dependency plot (with bootstrapping)
- Feature importance (Permutation and SHAP)
- etc.
Building models upon other trained models
pytorch_lightning-based training for pytorch models
Gaussian-process-based Bayesian hyperparameter optimization
Cross-validation (including continuing from a cross-validation checkpoint)
Saving, loading, and migrating models
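
To make the first data processing item concrete, here is a minimal, standard-library sketch of a training/validation/testing split. It is not tabensemb's splitter: the function name split_indices, the 60/20/20 ratios, and the seed are all assumptions chosen for illustration.

```python
import random


def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Hypothetical helper for illustration; the ratios are an assumption.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = round(n_samples * ratios[0])
    n_val = round(n_samples * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three sets identical across runs, which is what makes comparisons between models consistent.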

The package stands on the shoulders of giants:

scikit-learn
PyTorch
PyTorch Lightning
etc. (See requirements.txt)

Installation/Usage

Full documentation is available here. For a quick start:

tabular_ensemble can be installed from PyPI by running the following command:

pip install tabensemb[torch]

Please use pip install tabensemb instead if you already have torch>=1.12.0 installed. Use pip install tabensemb[test] if you want to run unit tests.
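
If you are unsure which of the two commands applies, the following sketch checks the installed torch version and prints the matching command. It is only an illustration: parse_version and suggested_install_command are hypothetical helper names, and the version parsing is deliberately simple (it reads the leading digits of the first three components).

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a tuple such as (2, 1, 0)."""
    numbers = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad '1.12' to (1, 12, 0)
        numbers.append(0)
    return tuple(numbers)


def suggested_install_command(min_torch=(1, 12, 0)):
    """Return the pip command that matches the current environment."""
    try:
        installed = parse_version(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return "pip install tabensemb[torch]"  # torch is not installed
    if installed >= min_torch:
        return "pip install tabensemb"  # a recent enough torch is present
    return "pip install tabensemb[torch]"


print(suggested_install_command())
```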

(Optional) Run unit tests after installing tabensemb[test]:

cd test
pytest .

Place your .csv or .xlsx file in a data subfolder (e.g., data/sample.csv), and create a configuration file in a configs subfolder (e.g., configs/sample.py) with the following content:

cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
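
Before running an experiment, it can help to confirm that every column named in the configuration exists in the data file. The sketch below does this with the standard library only. It assumes the file has a header row whose names match the configuration; missing_columns is a hypothetical helper, not part of tabensemb, and the in-memory string stands in for the first line of data/sample.csv.

```python
import csv
import io


def missing_columns(cfg, header):
    """Return the configured column names that are absent from the header."""
    wanted = (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )
    present = set(header)
    return [name for name in wanted if name not in present]


cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}

# Stand-in for the header line of data/sample.csv (assumed layout).
sample_csv = io.StringIO(
    "cont_0,cont_1,cont_2,cont_3,cont_4,cat_0,cat_1,cat_2,target\n"
)
header = next(csv.reader(sample_csv))
print(missing_columns(cfg, header))  # prints []
```

For a real file, replace the StringIO object with open("data/sample.csv", newline="").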

Run the experiment with this configuration and data:

python main.py --base sample --epoch 10

where --base refers to the configuration file, and additional arguments (such as --epoch here) refer to those in config/default.py.
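
One way to picture this argument handling is sketched below with argparse. It is not the package's actual main.py: the DEFAULTS dictionary is a hypothetical stand-in for config/default.py (only epoch appears in this README; batch_size and both values are invented for illustration), and the sketch assumes that command-line values take precedence over the defaults.

```python
import argparse

# Hypothetical stand-in for the defaults; the real keys live in config/default.py.
DEFAULTS = {"epoch": 300, "batch_size": 1024}


def parse_overrides(argv):
    """Return the base configuration name and the defaults merged with overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--base", required=True, help="name of the configuration file")
    for key, value in DEFAULTS.items():
        # One optional flag per default key, parsed with the default's type.
        parser.add_argument(f"--{key}", type=type(value), default=None)
    args = vars(parser.parse_args(argv))
    base = args.pop("base")
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in args.items() if v is not None})
    return base, merged


base, settings = parse_overrides(["--base", "sample", "--epoch", "10"])
print(base, settings)  # prints: sample {'epoch': 10, 'batch_size': 1024}
```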

See the documentation pages for details.

Citation

If you use this repository, please cite us as:

(Will be updated after released on arXiv or published)

Project details

Maintainers

xuelingluo

Release history

This version

0.2

Sep 25, 2024

Download files

Source Distribution

tabensemb-0.2.tar.gz (152.3 kB)

Uploaded Sep 25, 2024 Source

Built Distribution

tabensemb-0.2-py3-none-any.whl (142.6 kB)

Uploaded Sep 25, 2024 Python 3

Hashes for tabensemb-0.2.tar.gz

Algorithm	Hash digest
SHA256	`cb3e8ef978d7e85a52b2e2648b0a5a038c845aba8d81b1980c19aa51f22c8aa1`
MD5	`1ea705ee476a09f8bde6e169b47a625f`
BLAKE2b-256	`99e545ae22c8261ebbdecd131fca59ba339e23d09ad861d9d2481c6109ddca1a`

Hashes for tabensemb-0.2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f22d92f5836f1f0c6398f915175d7c3c96ba40c07bce65066455865784cc284d`
MD5	`8f523f0e34cd58ed3ffee97ef0d1c789`
BLAKE2b-256	`69ba3fa3bb97e99f0135ab7b7703e29fcdede1f44a4dee028eb591a62109046d`
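
The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's hashlib; sha256_of and matches_published_digest are hypothetical helper names, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with the value listed on the page."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("tabensemb-0.2.tar.gz", "cb3e8ef978d7e85a52b2e2648b0a5a038c845aba8d81b1980c19aa51f22c8aa1") should return True for an intact copy of the source distribution.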