Production-ready PyTorch implementations of Transformers for tabular data (numerical/categorical, multi-output, multi-label)

TabTransformer & FTTransformer for Tabular Data

Overview

This repository provides faithful PyTorch implementations of two transformer architectures for tabular data:

  • TabTransformer
  • FTTransformer

Both models handle numerical and categorical tabular data and support batched training and inference, GPU acceleration, logging, error handling, and multi-output regression or multi-label classification. The code is modular and intended for both research and production use.


Features

  • TabTransformer: Contextual embeddings for categorical features using transformer encoder blocks.
  • FTTransformer: Feature tokenization (categorical + numerical), transformer encoder with GLU, and CLS token for prediction.
  • Multi-output support: Works for regression (multi-target), multi-label, and standard classification.
  • Batch and GPU support: Batched training and inference on CPU or GPU.
  • Robust logging: All major operations are logged using loguru.
  • Error handling: All critical code paths are wrapped in try/except blocks for easy debugging.
  • Training scripts: Use train_tabtransformer.py and train_fttransformer.py for easy model training with all parameters configurable from the command line.

Installation

1. From PyPI (recommended for users)

pip install tabular-transformers

2. From Source (for development)

# Clone the repository
$ git clone <your-repo-url>
$ cd TabTransformer

# Install dependencies
$ pip install -r requirements.txt

Usage: Training Scripts

1. Data Format

  • Input: You must provide your training data as three files:
    • Categorical features: CSV or numpy file, shape (num_samples, num_categorical_columns), integer-encoded (0 ... n_classes-1 for each column)
    • Continuous features: CSV or numpy file, shape (num_samples, num_continuous_columns), float32
    • Labels: CSV or numpy file, shape (num_samples,) for single-output, or (num_samples, num_outputs) for multi-output (float32 for regression/multi-label, int64 for classification)
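
If your raw categorical columns hold strings, they need to be integer-encoded first. The following is a minimal sketch of one way to do that with NumPy; the `raw` array and its values are made up for illustration, and this is not code from the repository.

```python
import numpy as np

# Hypothetical raw categorical table: 4 samples, 2 string-valued columns.
raw = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "medium"],
    ["green", "small"],
])

encoded_cols = []
categories = []  # number of unique values per column, as passed to --categories
for j in range(raw.shape[1]):
    # np.unique returns the sorted unique values and, for each row,
    # the index of its value in that sorted array (a code in 0 ... n-1).
    uniques, codes = np.unique(raw[:, j], return_inverse=True)
    encoded_cols.append(codes.astype(np.int64))
    categories.append(len(uniques))

# Shape (num_samples, num_categorical_columns), dtype int64.
x_categ = np.stack(encoded_cols, axis=1)
print(x_categ.shape, x_categ.dtype, categories)
```

The `categories` list built here is what you would pass to `--categories`. Keep the `uniques` arrays around if you need to encode validation or test data with the same mapping.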

Example:

train_categ.npy      # shape: (N, num_categorical_columns), dtype=int64
train_cont.npy       # shape: (N, num_continuous_columns), dtype=float32
train_labels.npy     # shape: (N,) or (N, num_outputs), dtype=int64 or float32

  • How to save your data:
import numpy as np
np.save('train_categ.npy', x_categ)   # integer-encoded categorical
np.save('train_cont.npy', x_cont)     # float32 continuous
np.save('train_labels.npy', y)        # int64 for classification, float32 for regression/multi-label
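
To try the training scripts without real data, you can generate a synthetic dataset in the documented format. The sketch below is an illustration only: the sample count, random seed, and temporary output directory are assumptions, while the column counts match the example commands in this README.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

num_samples = 256
categories = (10, 5, 6, 5, 8)   # matches --categories 10 5 6 5 8
num_continuous = 10             # matches --num_continuous 10
num_classes = 2                 # matches --num_classes 2

# One integer-encoded column per entry in `categories`, values in 0 ... n-1.
x_categ = np.stack(
    [rng.integers(0, n, size=num_samples) for n in categories], axis=1
).astype(np.int64)
x_cont = rng.standard_normal((num_samples, num_continuous)).astype(np.float32)
y = rng.integers(0, num_classes, size=num_samples).astype(np.int64)

# Check the documented constraints before writing anything to disk.
assert x_categ.shape == (num_samples, len(categories))
assert all(x_categ[:, j].max() < n for j, n in enumerate(categories))
assert x_cont.dtype == np.float32 and y.shape == (num_samples,)

out_dir = tempfile.mkdtemp()  # use your own data directory in practice
np.save(os.path.join(out_dir, "train_categ.npy"), x_categ)
np.save(os.path.join(out_dir, "train_cont.npy"), x_cont)
np.save(os.path.join(out_dir, "train_labels.npy"), y)
```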

2. Training TabTransformer

python examples/train_tabtransformer.py \
    --categ_path train_categ.npy \
    --cont_path train_cont.npy \
    --labels_path train_labels.npy \
    --categories 10 5 6 5 8 \
    --num_continuous 10 \
    --num_classes 2 \
    --dim 32 \
    --depth 6 \
    --heads 8 \
    --attn_dropout 0.1 \
    --ff_dropout 0.1 \
    --mlp_hidden_mults 4 2 \
    --mlp_act relu \
    --epochs 10 \
    --batch_size 64 \
    --lr 1e-3

3. Training FTTransformer

python examples/train_fttransformer.py \
    --categ_path train_categ.npy \
    --cont_path train_cont.npy \
    --labels_path train_labels.npy \
    --categories 10 5 6 5 8 \
    --num_continuous 10 \
    --num_classes 2 \
    --dim 32 \
    --depth 6 \
    --heads 8 \
    --attn_dropout 0.1 \
    --ff_dropout 0.1 \
    --epochs 10 \
    --batch_size 64 \
    --lr 1e-3

All Parameters:

  • --categ_path: Path to categorical features file (npy or csv)
  • --cont_path: Path to continuous features file (npy or csv)
  • --labels_path: Path to labels file (npy or csv)
  • --categories: Number of unique values in each categorical column, one integer per column (e.g., 10 5 6 5 8)
  • --num_continuous: Number of continuous columns
  • --num_classes: Number of output classes or outputs (for regression/multi-label, set to output dimension)
  • --dim: Embedding dimension (default: 32)
  • --depth: Number of transformer layers (default: 6)
  • --heads: Number of attention heads (default: 8)
  • --attn_dropout: Attention dropout (default: 0.1)
  • --ff_dropout: Feedforward dropout (default: 0.1)
  • --mlp_hidden_mults: Multipliers for MLP hidden layers (default: 4 2, TabTransformer only)
  • --mlp_act: Activation function for MLP (relu or selu, default: relu, TabTransformer only)
  • --epochs: Number of training epochs
  • --batch_size: Batch size
  • --lr: Learning rate
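
For reference, the flags above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the parser used by the actual scripts: the `build_parser` helper is invented here, the defaults are the ones stated in the list, and flags without a documented default are marked required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    p = argparse.ArgumentParser(description="Train a tabular transformer (sketch)")
    p.add_argument("--categ_path", required=True)
    p.add_argument("--cont_path", required=True)
    p.add_argument("--labels_path", required=True)
    # One integer per categorical column, e.g. --categories 10 5 6 5 8
    p.add_argument("--categories", type=int, nargs="+", required=True)
    p.add_argument("--num_continuous", type=int, required=True)
    p.add_argument("--num_classes", type=int, required=True)
    # Defaults below are the ones stated in the parameter list.
    p.add_argument("--dim", type=int, default=32)
    p.add_argument("--depth", type=int, default=6)
    p.add_argument("--heads", type=int, default=8)
    p.add_argument("--attn_dropout", type=float, default=0.1)
    p.add_argument("--ff_dropout", type=float, default=0.1)
    p.add_argument("--mlp_hidden_mults", type=int, nargs="+", default=[4, 2])
    p.add_argument("--mlp_act", choices=["relu", "selu"], default="relu")
    # No defaults are documented for these, so the sketch requires them.
    p.add_argument("--epochs", type=int, required=True)
    p.add_argument("--batch_size", type=int, required=True)
    p.add_argument("--lr", type=float, required=True)
    return p


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--categ_path", "train_categ.npy",
    "--cont_path", "train_cont.npy",
    "--labels_path", "train_labels.npy",
    "--categories", "10", "5", "6", "5", "8",
    "--num_continuous", "10",
    "--num_classes", "2",
    "--epochs", "10",
    "--batch_size", "64",
    "--lr", "1e-3",
])
print(args.categories, args.dim, args.lr)
```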

Multi-Output & Multi-Label Support

1. Multi-Output Regression

  • Labels shape: (num_samples, num_outputs) (e.g., (N, 5) for 5 regression targets)
  • Set --num_classes 5 (or your output dimension)
  • Change loss in script to:
    criterion = torch.nn.MSELoss()
    yb = torch.tensor(y[idx:idx+args.batch_size], dtype=torch.float32, device=device)
    
  • Labels dtype: float32
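
The loss and target lines above fit into a mini-batch loop along these lines. This is a sketch under assumptions: a `torch.nn.Linear` layer stands in for the real model, the data is random, and the loop structure is inferred from the `y[idx:idx+args.batch_size]` slicing rather than copied from the script.

```python
import numpy as np
import torch

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

num_samples, num_features, num_outputs, batch_size = 100, 10, 5, 32
x = np.random.default_rng(0).standard_normal((num_samples, num_features)).astype(np.float32)
y = np.random.default_rng(1).standard_normal((num_samples, num_outputs)).astype(np.float32)

# Stand-in for TabTransformer/FTTransformer: any module mapping a batch to (batch, num_outputs).
model = torch.nn.Linear(num_features, num_outputs).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch_losses = []
for idx in range(0, num_samples, batch_size):
    xb = torch.tensor(x[idx:idx + batch_size], dtype=torch.float32, device=device)
    yb = torch.tensor(y[idx:idx + batch_size], dtype=torch.float32, device=device)
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)  # prediction and target are both (batch, num_outputs)
    loss.backward()
    optimizer.step()
    batch_losses.append(loss.item())
```

MSELoss averages over every element by default, so all targets contribute equally; rescale the targets beforehand if they live on very different scales.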

2. Multi-Label Classification

  • Labels shape: (num_samples, num_outputs) (e.g., (N, 5) for 5 binary labels)
  • Set --num_classes 5
  • Change loss in script to:
    criterion = torch.nn.BCEWithLogitsLoss()
    yb = torch.tensor(y[idx:idx+args.batch_size], dtype=torch.float32, device=device)
    
  • Labels dtype: float32 (with values 0 or 1)
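
A short sketch of how the multi-label loss and the corresponding predictions fit together. The logits here are random stand-ins for the model output, and the 0.5 threshold is a common default chosen for illustration, not a setting of this package.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 5)  # stand-in for model output, shape (batch, num_outputs)
targets = torch.tensor(
    [[1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 0, 0, 0]],
    dtype=torch.float32,
)

criterion = torch.nn.BCEWithLogitsLoss()  # applies the sigmoid internally
loss = criterion(logits, targets)

# At inference time, apply the sigmoid yourself and threshold each label independently.
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).long()
```

Because BCEWithLogitsLoss includes the sigmoid, the model should output raw logits during training, with no final activation.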

3. Multi-Class, Multi-Output (rare)

  • Each output is a separate multi-class problem. Use a custom loss (e.g., sum of CrossEntropyLoss for each output column).
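
One possible form of such a custom loss is sketched below. It assumes the model emits a single logit vector whose length is the sum of the class counts, which is then split per output; `classes_per_output` and `multi_output_ce` are hypothetical names introduced for this example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
classes_per_output = (3, 4)  # hypothetical: output 0 has 3 classes, output 1 has 4
batch = 8

# Stand-in for model output with --num_classes set to sum(classes_per_output) = 7.
logits = torch.randn(batch, sum(classes_per_output))
# Integer class labels, one column per output: shape (batch, num_outputs), dtype int64.
targets = torch.stack(
    [torch.randint(0, n, (batch,)) for n in classes_per_output], dim=1
)


def multi_output_ce(logits, targets, classes_per_output):
    """Sum of per-output cross-entropy losses over slices of the logit vector."""
    total = logits.new_zeros(())
    # torch.split cuts dimension 1 into chunks of the given sizes.
    chunks = torch.split(logits, list(classes_per_output), dim=1)
    for j, chunk in enumerate(chunks):
        total = total + F.cross_entropy(chunk, targets[:, j])
    return total


loss = multi_output_ce(logits, targets, classes_per_output)
```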

API Documentation

TabTransformer

import torch.nn as nn

from tabtransformer.model import TabTransformer

model = TabTransformer(
    categories=(10, 5, 6, 5, 8),   # tuple: number of unique values per categorical column
    num_continuous=10,             # number of continuous features
    dim=32,                        # embedding dimension
    dim_out=2,                     # output dimension (e.g., num classes or outputs)
    depth=6,                       # number of transformer layers
    heads=8,                       # number of attention heads
    attn_dropout=0.1,              # attention dropout
    ff_dropout=0.1,                # feedforward dropout
    mlp_hidden_mults=(4, 2),       # MLP hidden layer multipliers
    mlp_act=nn.ReLU()              # activation function
)

# Forward pass
out = model(x_categ, x_cont)
# x_categ: (batch, num_categ), torch.LongTensor
# x_cont: (batch, num_cont), torch.FloatTensor

FTTransformer

from fttransformer.model import FTTransformer

model = FTTransformer(
    categories=(10, 5, 6, 5, 8),   # tuple: number of unique values per categorical column
    num_continuous=10,             # number of continuous features
    dim=32,                        # embedding dimension
    dim_out=2,                     # output dimension (e.g., num classes or outputs)
    depth=6,                       # number of transformer layers
    heads=8,                       # number of attention heads
    attn_dropout=0.1,              # attention dropout
    ff_dropout=0.1                 # feedforward dropout
)

# Forward pass
out = model(x_categ, x_cont)
# x_categ: (batch, num_categ), torch.LongTensor
# x_cont: (batch, num_cont), torch.FloatTensor
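
Both models take the same two inputs. The sketch below builds dummy tensors with the shapes and dtypes listed in the comments above and runs a few sanity checks; the batch size and random values are arbitrary, and the model call is left commented out so the snippet runs without the package installed.

```python
import torch

torch.manual_seed(0)
categories = (10, 5, 6, 5, 8)
num_continuous = 10
batch = 4

# Column j must hold integer codes in 0 ... categories[j] - 1.
x_categ = torch.stack(
    [torch.randint(0, n, (batch,)) for n in categories], dim=1
)  # (batch, num_categ), dtype torch.int64
x_cont = torch.randn(batch, num_continuous)  # (batch, num_cont), dtype torch.float32

# Sanity checks worth running before the forward pass.
upper = torch.tensor(categories)  # broadcasts against (batch, num_categ)
assert x_categ.dtype == torch.long
assert bool((x_categ >= 0).all()) and bool((x_categ < upper).all())
assert x_cont.shape == (batch, num_continuous)

# out = model(x_categ, x_cont)
```

An out-of-range category code typically surfaces as an index error inside an embedding lookup, so checking the range up front tends to give a clearer failure.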

Logging & Error Handling

  • All major operations are logged using loguru.
  • Errors are caught and logged with stack traces for easy debugging.
  • You can control log level and output by editing utils/logger.py.

Project Structure

TabTransformer/
├── tabtransformer/         # TabTransformer model
│   └── model.py
├── fttransformer/          # FTTransformer model
│   └── model.py
├── utils/                  # Utilities (logging, batching, device)
│   ├── logger.py
│   ├── device.py
│   └── batch.py
├── examples/               # Training scripts
│   ├── train_tabtransformer.py
│   └── train_fttransformer.py
├── requirements.txt
└── README.md

Publishing to PyPI (for maintainers)

  1. Update setup.py and pyproject.toml

    • Set the package name (e.g., tabular-transformers), version, author, description, etc.
    • Make sure all dependencies are listed.
  2. Build the package

      python setup.py sdist bdist_wheel

  3. Upload to PyPI

    • Install twine:
      pip install twine

    • Upload:
      twine upload dist/*

    • For test uploads, use --repository testpypi.
  4. Users can now install via

      pip install tabular-transformers
