A standard framework for using Deep Learning for tabular data

PyTorch Tabular

PyTorch Tabular provides a unified interface to deep learning architectures for tabular data. It provides a high-level API and uses PyTorch Lightning to scale training on GPU or CPU, with automatic logging.

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

  • Low Resistance Usability
  • Easy Customization
  • Scalable and Easier to Deploy

It has been built on the shoulders of giants like PyTorch (obviously) and PyTorch Lightning.

Installation

Although the installation includes PyTorch, the recommended way is to first install PyTorch from the official PyTorch website, picking the right CUDA version for your machine.

Once you have PyTorch installed, use:

pip install -U "pytorch_tabular[extra]"

to install the complete library with extra dependencies (Weights & Biases and Plotly).

And:

pip install -U "pytorch_tabular"

for the bare essentials.

The sources for pytorch_tabular can be downloaded from the GitHub repo.

You can clone the public repository:

git clone https://github.com/manujosephv/pytorch_tabular

Once you have a copy of the source, you can install it with:

cd pytorch_tabular && pip install ".[extra]"
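
To confirm that the installation worked, you can ask Python for the installed version. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyTorch Tabular.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "pytorch_tabular" is the distribution name used in the pip commands above.
found = installed_version("pytorch_tabular")
if found:
    print(f"pytorch_tabular version: {found}")
else:
    print("pytorch_tabular is not installed in this environment")
```

If the second message is printed, check that pip installed into the same environment that runs your Python interpreter.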

Documentation

For complete documentation with tutorials, visit ReadTheDocs.

Available Models

  • FeedForward Network with Category Embedding is a simple FF network, but with embedding layers for the categorical columns.
  • Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data is a model presented at ICLR 2020 which, according to the authors, has beaten well-tuned Gradient Boosting models on many datasets.
  • TabNet: Attentive Interpretable Tabular Learning is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision making to model the output.
  • Mixture Density Networks is a regression model which uses Gaussian components to approximate the target function and provide a probabilistic prediction out of the box.
  • AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks is a model which tries to learn interactions between the features in an automated way, create a better representation, and then use this representation in the downstream task.
  • TabTransformer is an adaptation of the Transformer model for Tabular Data which creates contextual representations for categorical features.
  • FT Transformer from Revisiting Deep Learning Models for Tabular Data
  • Gated Additive Tree Ensemble is a novel high-performance, parameter and computationally efficient deep learning architecture for tabular data. GATE uses a gating mechanism, inspired by GRU, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention to predict our desired output.
  • Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) is a pared-down version of GATE which is more efficient and performs better than GATE. GANDALF makes GFLUs the main learning unit, also introducing some speed-ups in the process. With very few hyperparameters to tune, this becomes an easy model to use and tune.
  • DANETs: Deep Abstract Networks for Tabular Data Classification and Regression is a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks.
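
As an illustration of what a "probabilistic prediction" from Gaussian components means for the Mixture Density Networks entry above, here is a minimal sketch in plain Python. It is not the library's implementation: the mixture weights, means, and standard deviations are hypothetical values standing in for what a trained network would output for one input row.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, stds):
    """Density of a Gaussian mixture: weighted sum of component densities."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


def mixture_mean_and_variance(weights, means, stds):
    """Point prediction (mean) and uncertainty (variance) of the mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, stds))
    return mean, second_moment - mean * mean


# Hypothetical network outputs for a single row (weights must sum to 1).
weights = [0.7, 0.3]
means = [1.0, 4.0]
stds = [0.5, 1.0]

mean, variance = mixture_mean_and_variance(weights, means, stds)

# Negative log-likelihood of an observed target; this is the kind of quantity
# a mixture density model minimizes during training.
nll = -math.log(mixture_pdf(2.0, weights, means, stds))
print(f"mean={mean:.3f} variance={variance:.3f} nll={nll:.3f}")
```

The mean gives a single-number prediction, while the variance and the full density describe how confident the model is about it.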

Semi-Supervised Learning

  • Denoising AutoEncoder is an autoencoder which learns robust feature representations, to compensate for any noise in the dataset.
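
The idea behind a denoising autoencoder can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: masking noise (setting features to a fixed value) is one common corruption scheme and is assumed here, and the helper names `corrupt_row` and `reconstruction_error` are hypothetical, not part of PyTorch Tabular.

```python
import random


def corrupt_row(row, noise_prob, rng, mask_value=0.0):
    """Replace each feature with mask_value with probability noise_prob."""
    return [mask_value if rng.random() < noise_prob else x for x in row]


def reconstruction_error(original, reconstructed):
    """Mean squared error between the clean row and a reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


rng = random.Random(0)  # seeded so the corruption is reproducible
clean = [0.2, 1.5, -0.7, 3.1]
noisy = corrupt_row(clean, noise_prob=0.5, rng=rng)

# During training the model sees `noisy` as input, but the loss is computed
# against `clean`, which pushes it to learn representations robust to noise.
# Here the noisy row itself stands in for a (poor) reconstruction.
print(noisy, reconstruction_error(clean, noisy))
```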

Implement Custom Models

To implement new models, see the How to implement new models tutorial. It covers basic as well as advanced architectures.

Usage

from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
    ExperimentConfig,
)

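# num_col_names and cat_col_names are lists of column names that you define
# for your own DataFrame; train, val and test (used below) are pandas DataFrames.
data_config = DataConfig(
    target=["target"],  # target should always be a list.
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)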
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between layers
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_model("examples/basic")
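
To get a feel for the size of the network described by `layers="1024-512-512"`, the following sketch parses the hyphen-separated string and counts the weights and biases of the corresponding fully connected stack. It is an illustration only, not the library's internal parsing: the helper names and the input width of 64 are hypothetical, and embeddings, normalization layers, and the output head are ignored.

```python
def parse_layers(layers):
    """Split a hyphen-separated layer spec such as "1024-512-512" into integers."""
    return [int(width) for width in layers.split("-")]


def dense_parameter_count(input_dim, layer_widths):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    previous = input_dim
    for width in layer_widths:
        total += previous * width + width  # weight matrix plus bias vector
        previous = width
    return total


widths = parse_layers("1024-512-512")
# 64 is a hypothetical input width (continuous features plus embedding outputs).
params = dense_parameter_count(64, widths)
print(widths, params)
```

Most of the parameters sit in the wide first two layers, so shrinking them is usually the quickest way to get a smaller model.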

Future Roadmap (Contributions are Welcome)

  1. Integrate Optuna Hyperparameter Tuning
  2. Migrate Datamodule to Polars or NVTabular for faster data loading and to handle larger than RAM datasets.
  3. Add GaussRank as Feature Transformation
  4. Have a scikit-learn compatible API
  5. Enable support for multi-label classification
  6. Keep adding more architectures

Contributors

  • Manu Joseph (manujosephv)
  • Jirka Borovec (Borda)
  • Jinu Sunil (wsad1)
  • Programador Artificial (ProgramadorArtificial)
  • Soren Macbeth (sorenmacbeth)
  • Aroj Hada (ArozHada)
  • Chris Fonnesbeck (fonnesbeck)
  • Snehil Chatterjee (snehilchatterjee)
  • jxtrbtk
  • Abhishar Sinha (abhisharsinha)
  • Andreas (ndrsfel)
  • Charitarth Chugh (charitarthchugh)
  • Earlee (EeyoreLee)
  • JulianRein
  • Kushashwa Ravi Shrimali (krshrimali)
  • Luca Actis Grosso (Actis92)
  • Sterling G. Baird (sgbaird)
  • Teck Meng (furyhawk)
  • Yinyu Nie (yinyunie)
  • YonyBresler
  • Liu Zhen (HernandoR)
  • enifeder
  • taimo (taimo3810)

Citation

If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:

@misc{joseph2021pytorch,
      title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
      author={Manu Joseph},
      year={2021},
      eprint={2104.13638},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
  • Zenodo Software Citation
@software{manu_joseph_2023_7554473,
  author       = {Manu Joseph and
                  Jinu Sunil and
                  Jiri Borovec and
                  Chris Fonnesbeck and
                  jxtrbtk and
                  Andreas and
                  JulianRein and
                  Kushashwa Ravi Shrimali and
                  Luca Actis Grosso and
                  Sterling G. Baird and
                  Yinyu Nie},
  title        = {manujosephv/pytorch\_tabular: v1.0.1},
  month        = jan,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v1.0.1},
  doi          = {10.5281/zenodo.7554473},
  url          = {https://doi.org/10.5281/zenodo.7554473}
}
