A framework to ensemble model bases and evaluate various models for tabular predictions.
tabular_ensemble
A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction tasks from the following well-established model bases:
- autogluon: "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
- pytorch_widedeep: "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
- pytorch_tabular: "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".
You can implement your own models, data processing pipelines, and datasets within this flexible,
well-tested framework and compare them consistently with baseline models. This is even easier when
your own model is based on PyTorch.
Supported features for all model bases:
- Data processing
  - Data splitting (training/validation/testing sets)
  - Data imputation
  - Data filtering
  - Data scaling
  - Data augmentation
  - Feature augmentation
  - Feature selection
  - etc.
- Multi-modal data
- Loading UCI datasets
- Data/result analysis
  - Leaderboard
  - Box plot
  - Pair plot
  - Pearson correlation
  - Partial dependency plot (with bootstrapping)
  - Feature importance (Permutation and SHAP)
  - etc.
- Building models upon other trained models
- pytorch_lightning-based training for pytorch models
- Gaussian-process-based Bayesian hyperparameter optimization
- Cross-validation (including continuing from a cross-validation checkpoint)
- Saving, loading, and migrating models
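To make two of the listed data processing steps concrete, here is a minimal standard-library sketch of a random training/validation/testing split followed by standard scaling fitted on the training set only. It is an illustration of the general technique, not the package's implementation; the function names and the 60/20/20 ratio are assumptions chosen for this example.

```python
import random
import statistics


def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle row indices and cut them into train/val/test parts.

    The ratios are hypothetical defaults for this sketch.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_standard_scaler(values):
    """Return (mean, std) of the given values; std falls back to 1.0 if zero."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return mean, std


def scale(values, mean, std):
    """Apply a previously fitted standard scaling."""
    return [(v - mean) / std for v in values]


# Toy feature column with ten rows.
values = [float(v) for v in range(10)]
train_idx, val_idx, test_idx = split_indices(len(values))

# Fit on the training rows only, then reuse the statistics for validation rows.
mean, std = fit_standard_scaler([values[i] for i in train_idx])
val_scaled = scale([values[i] for i in val_idx], mean, std)

print(len(train_idx), len(val_idx), len(test_idx))  # prints: 6 2 2
```

Fitting the scaler on the training rows alone keeps validation and testing statistics out of preprocessing, which is what makes comparisons between models consistent.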
The package stands on the shoulders of giants:
- scikit-learn
- PyTorch
- PyTorch Lightning
- etc. (See requirements.txt)
Installation/Usage
Full documentation is available here. For a quick start:
tabular_ensemble can be installed from PyPI by running the following command:
pip install tabensemb[torch]
Please use pip install tabensemb instead if you already have torch>=1.12.0 installed. Use pip install tabensemb[test] if you want to run unit tests.
To install from source, run the following command in the repository root:
pip install -e .[torch]
- (Optional) Run unit tests after installing tabensemb[test]:
cd test
pytest .
- Place your .csv or .xlsx file in a data subfolder (e.g., data/sample.csv), and generate a configuration file in a configs subfolder (e.g., configs/sample.py), containing the following content:
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
- Run the experiment with this configuration and data:
python main.py --base sample --epoch 10
where --base refers to the configuration file, and additional arguments (such as --epoch here) refer to those in config/default.py.
See the documentation pages for details.
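To try the steps above without a real dataset, the following standard-library sketch writes a synthetic data/sample.csv and a matching configs/sample.py, then checks that every column named in the configuration appears in the CSV header. The helper names (make_sample_project, missing_columns), the synthetic values, and the assumption that the CSV carries a header row with exactly these column names are illustrative and not part of the package.

```python
import csv
import random
import runpy
import tempfile
from pathlib import Path

CONT = [f"cont_{i}" for i in range(5)]
CAT = [f"cat_{i}" for i in range(3)]
LABEL = ["target"]

# Same content as the configuration shown in the steps above.
CONFIG_TEXT = '''cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
'''


def make_sample_project(root, n_rows=100, seed=0):
    """Write data/sample.csv and configs/sample.py under `root` (hypothetical helper)."""
    rng = random.Random(seed)
    root = Path(root)
    (root / "data").mkdir(parents=True, exist_ok=True)
    (root / "configs").mkdir(parents=True, exist_ok=True)

    csv_path = root / "data" / "sample.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(CONT + CAT + LABEL)
        for _ in range(n_rows):
            cont = [round(rng.gauss(0, 1), 4) for _ in CONT]
            cat = [rng.choice(["a", "b", "c"]) for _ in CAT]
            # Made-up regression target: sum of continuous features plus noise.
            target = round(sum(cont) + rng.gauss(0, 0.1), 4)
            writer.writerow(cont + cat + [target])

    cfg_path = root / "configs" / "sample.py"
    cfg_path.write_text(CONFIG_TEXT)
    return csv_path, cfg_path


def missing_columns(csv_path, cfg_path):
    """Return the config-declared column names that are absent from the CSV header."""
    cfg = runpy.run_path(str(cfg_path))["cfg"]
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    wanted = (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )
    return [name for name in wanted if name not in header]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path, cfg_path = make_sample_project(tmp)
    print(missing_columns(csv_path, cfg_path))  # prints: []
```

To create the files in a working directory instead, call make_sample_project(".") and then run the command from the last step.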
Citation
If you use this repository, please cite us as:
(To be updated after release on arXiv or publication)