
Production-ready PyTorch implementations of Transformers for tabular data (numerical/categorical, multi-output, multi-label)

Reason this release was yanked:

It has some errors and bugs

Project description

TabTransformer & FTTransformer for Tabular Data

Overview

This repository provides PyTorch implementations of two transformer architectures for tabular data: TabTransformer and FTTransformer.

Both models are designed for numerical and categorical tabular data, support batch training/inference, GPU acceleration, robust logging, error handling, and multi-output regression or multi-label classification. The code is modular, readable, and ready for research or production.


Features

  • TabTransformer: Contextual embeddings for categorical features using transformer encoder blocks.
  • FTTransformer: Feature tokenization (categorical + numerical), transformer encoder with GLU, and CLS token for prediction.
  • Multi-output support: Works for regression (multi-target), multi-label, and standard classification.
  • Batch and GPU support: Efficient, scalable, and production-ready.
  • Robust logging: All major operations are logged using loguru.
  • Error handling: All critical code paths are wrapped in try/except blocks for easy debugging.
  • Training scripts: Use train_tabtransformer.py and train_fttransformer.py for easy model training with all parameters configurable from the command line.

Installation

1. From PyPI (recommended for users)

pip install tabtransformer-pytorch  # (replace with actual PyPI name)

2. From Source (for development)

# Clone the repository
$ git clone <your-repo-url>
$ cd TabTransformer

# Install dependencies
$ pip install -r requirements.txt

Usage: Training Scripts

1. Data Format

  • Input: You must provide your training data as three files:
    • Categorical features: CSV or numpy file, shape (num_samples, num_categorical_columns), integer-encoded (0 ... n_classes-1 for each column)
    • Continuous features: CSV or numpy file, shape (num_samples, num_continuous_columns), float32
    • Labels: CSV or numpy file, shape (num_samples,) for single-output, or (num_samples, num_outputs) for multi-output (float32 for regression/multi-label, int64 for classification)

Example:

train_categ.npy      # shape: (N, num_categorical_columns), dtype=int64
train_cont.npy       # shape: (N, num_continuous_columns), dtype=float32
train_labels.npy     # shape: (N,) or (N, num_outputs), dtype=int64 or float32
  • How to save your data:
import numpy as np
np.save('train_categ.npy', x_categ)   # integer-encoded categorical
np.save('train_cont.npy', x_cont)     # float32 continuous
np.save('train_labels.npy', y)        # int64 for classification, float32 for regression/multi-label
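
The categorical file must already hold integer codes in the range 0 ... n_classes-1 per column. If the raw data holds strings, one way to get there is sketched below with np.unique. The helper name encode_columns and the toy arrays are hypothetical and not part of this package.

```python
import numpy as np

def encode_columns(raw):
    """Integer-encode each column of a 2-D array of raw category values.

    Returns (codes, cardinalities): codes has dtype int64 with values in
    0 .. n_classes-1 per column; cardinalities lists n_classes per column.
    """
    raw = np.asarray(raw)
    codes = np.empty(raw.shape, dtype=np.int64)
    cardinalities = []
    for j in range(raw.shape[1]):
        # np.unique sorts the distinct values; inverse maps each row to its code.
        uniques, inverse = np.unique(raw[:, j], return_inverse=True)
        codes[:, j] = inverse.reshape(-1)
        cardinalities.append(len(uniques))
    return codes, cardinalities

# Hypothetical toy data: 4 samples, 2 categorical and 3 continuous columns.
raw_categ = np.array([["red", "S"], ["blue", "M"], ["red", "L"], ["green", "M"]])
x_categ, cardinalities = encode_columns(raw_categ)
x_cont = np.random.rand(4, 3).astype(np.float32)
y = np.array([0, 1, 0, 1], dtype=np.int64)
```

The resulting x_categ, x_cont, and y can be written with np.save as shown above. The same mapping must be reused for validation and test data, so in practice the uniques arrays should be kept rather than recomputed.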

2. Training TabTransformer

python examples/train_tabtransformer.py \
    --categ_path train_categ.npy \
    --cont_path train_cont.npy \
    --labels_path train_labels.npy \
    --categories 10 5 6 5 8 \
    --num_continuous 10 \
    --num_classes 2 \
    --dim 32 \
    --depth 6 \
    --heads 8 \
    --attn_dropout 0.1 \
    --ff_dropout 0.1 \
    --mlp_hidden_mults 4 2 \
    --mlp_act relu \
    --epochs 10 \
    --batch_size 64 \
    --lr 1e-3

3. Training FTTransformer

python examples/train_fttransformer.py \
    --categ_path train_categ.npy \
    --cont_path train_cont.npy \
    --labels_path train_labels.npy \
    --categories 10 5 6 5 8 \
    --num_continuous 10 \
    --num_classes 2 \
    --dim 32 \
    --depth 6 \
    --heads 8 \
    --attn_dropout 0.1 \
    --ff_dropout 0.1 \
    --epochs 10 \
    --batch_size 64 \
    --lr 1e-3

All Parameters:

  • --categ_path: Path to categorical features file (npy or csv)
  • --cont_path: Path to continuous features file (npy or csv)
  • --labels_path: Path to labels file (npy or csv)
  • --categories: List of unique values per categorical column (e.g., 10 5 6 5 8)
  • --num_continuous: Number of continuous columns
  • --num_classes: Number of output classes or outputs (for regression/multi-label, set to output dimension)
  • --dim: Embedding dimension (default: 32)
  • --depth: Number of transformer layers (default: 6)
  • --heads: Number of attention heads (default: 8)
  • --attn_dropout: Attention dropout (default: 0.1)
  • --ff_dropout: Feedforward dropout (default: 0.1)
  • --mlp_hidden_mults: Multipliers for MLP hidden layers (default: 4 2, TabTransformer only)
  • --mlp_act: Activation function for MLP (relu or selu, default: relu, TabTransformer only)
  • --epochs: Number of training epochs
  • --batch_size: Batch size
  • --lr: Learning rate
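
The values for --categories and --num_continuous depend on the data, so it can help to derive them from the arrays instead of typing them by hand. The following is a minimal sketch; cli_args_from_arrays is a hypothetical helper, and it assumes every column uses codes 0 ... n_classes-1 with the highest code present in the array.

```python
import numpy as np

def cli_args_from_arrays(x_categ, x_cont):
    """Build the data-dependent flags from already-encoded arrays.

    Assumes each categorical column uses codes 0 .. n_classes-1, so the
    cardinality of a column is its maximum code plus one.
    """
    x_categ = np.asarray(x_categ)
    x_cont = np.asarray(x_cont)
    if x_categ.min() < 0:
        raise ValueError("categorical codes must be non-negative")
    categories = (x_categ.max(axis=0) + 1).tolist()
    return [
        "--categories", *[str(c) for c in categories],
        "--num_continuous", str(x_cont.shape[1]),
    ]

x_categ = np.array([[0, 4], [9, 2], [3, 0]], dtype=np.int64)
x_cont = np.zeros((3, 10), dtype=np.float32)
args = cli_args_from_arrays(x_categ, x_cont)
print(" ".join(args))  # --categories 10 5 --num_continuous 10
```

If the training split does not contain the highest code of a column, max + 1 underestimates the cardinality; in that case take the counts from the encoder instead.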

Multi-Output & Multi-Label Support

1. Multi-Output Regression

  • Labels shape: (num_samples, num_outputs) (e.g., (N, 5) for 5 regression targets)
  • Set --num_classes 5 (or your output dimension)
  • Change loss in script to:
    criterion = torch.nn.MSELoss()
    yb = torch.tensor(y[idx:idx+args.batch_size], dtype=torch.float32, device=device)
    
  • Labels dtype: float32
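
To show how these pieces fit into one optimization step, here is a minimal sketch. The nn.Linear module is only a stand-in for the real model (anything that returns a tensor of shape (batch, num_outputs) behaves the same way for the loss), and the random tensors are placeholder data.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_outputs = 5
# Stand-in for TabTransformer/FTTransformer: returns (batch, num_outputs).
model = nn.Linear(10, num_outputs)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 10)            # placeholder input features
y = torch.randn(64, num_outputs)   # float32 targets, shape (batch, num_outputs)

optimizer.zero_grad()
pred = model(x)                    # shape (64, 5)
loss = criterion(pred, y)          # scalar: mean over batch and outputs
loss.backward()
optimizer.step()
```

MSELoss averages over every element by default, so all targets contribute equally; targets on very different scales may need to be standardized first.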

2. Multi-Label Classification

  • Labels shape: (num_samples, num_outputs) (e.g., (N, 5) for 5 binary labels)
  • Set --num_classes 5
  • Change loss in script to:
    criterion = torch.nn.BCEWithLogitsLoss()
    yb = torch.tensor(y[idx:idx+args.batch_size], dtype=torch.float32, device=device)
    
  • Labels dtype: float32 (with values 0 or 1)
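
BCEWithLogitsLoss applies the sigmoid internally, so the model should output raw logits during training and the sigmoid is applied explicitly only at prediction time. A minimal sketch with hand-written logits (placeholder values, not model output):

```python
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()

# Hypothetical raw model outputs (logits) for 2 samples and 3 labels.
logits = torch.tensor([[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]])
targets = torch.tensor([[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # float32, values 0 or 1

loss = criterion(logits, targets)   # sigmoid is applied inside the loss
probs = torch.sigmoid(logits)       # per-label probabilities
preds = (probs > 0.5).long()        # independent 0/1 decision per label
```

The 0.5 threshold is a common default rather than something this package prescribes; it can be tuned per label on validation data.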

3. Multi-Class, Multi-Output (rare)

  • Each output is a separate multi-class problem. Use a custom loss (e.g., sum of CrossEntropyLoss for each output column).
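
One possible shape for such a custom loss is sketched below. It assumes the model emits one block of logits per output, concatenated along the last dimension; that layout, the helper name multi_output_ce, and the class counts are assumptions for illustration and are not provided by the training scripts.

```python
import torch
from torch import nn

def multi_output_ce(logits, targets, classes_per_output):
    """Sum CrossEntropyLoss over several multi-class outputs.

    logits:  (batch, sum(classes_per_output)), one block of logits per output
    targets: (batch, num_outputs), int64 class index per output column
    """
    ce = nn.CrossEntropyLoss()
    blocks = torch.split(logits, classes_per_output, dim=1)
    return sum(ce(block, targets[:, i]) for i, block in enumerate(blocks))

classes_per_output = [3, 4]   # hypothetical: output 0 has 3 classes, output 1 has 4
logits = torch.zeros(8, sum(classes_per_output), requires_grad=True)
targets = torch.stack(
    [torch.randint(0, 3, (8,)), torch.randint(0, 4, (8,))], dim=1
)
loss = multi_output_ce(logits, targets, classes_per_output)
loss.backward()
```

With this layout the model's output dimension would be the sum of the per-output class counts (7 in the example).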

API Documentation

TabTransformer

from torch import nn
from tabtransformer.model import TabTransformer

model = TabTransformer(
    categories=(10, 5, 6, 5, 8),   # tuple: unique values per categorical column
    num_continuous=10,             # number of continuous features
    dim=32,                       # embedding dimension
    dim_out=2,                    # output dimension (e.g., num classes or outputs)
    depth=6,                      # number of transformer layers
    heads=8,                      # number of attention heads
    attn_dropout=0.1,              # attention dropout
    ff_dropout=0.1,                # feedforward dropout
    mlp_hidden_mults=(4, 2),      # MLP hidden layer multipliers
    mlp_act=nn.ReLU()             # activation function
)

# Forward pass
out = model(x_categ, x_cont)
# x_categ: (batch, num_categ), torch.LongTensor
# x_cont: (batch, num_cont), torch.FloatTensor
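
For a quick smoke test of the forward pass, random inputs with the right shapes and dtypes are enough. The sketch below builds them; make_dummy_batch is a hypothetical helper, and each categorical column is sampled within its own cardinality so that embedding lookups stay in range.

```python
import torch

def make_dummy_batch(categories, num_continuous, batch_size=4):
    """Create random inputs with the shapes and dtypes described above."""
    # One integer column per categorical feature, bounded by its cardinality.
    x_categ = torch.stack(
        [torch.randint(0, n, (batch_size,)) for n in categories], dim=1
    )
    x_cont = torch.randn(batch_size, num_continuous)
    return x_categ, x_cont

categories = (10, 5, 6, 5, 8)
x_categ, x_cont = make_dummy_batch(categories, num_continuous=10)
```

Passing these to model(x_categ, x_cont) should presumably yield a tensor of shape (batch, dim_out), although that is inferred from the parameter description rather than verified against the package.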

FTTransformer

from fttransformer.model import FTTransformer

model = FTTransformer(
    categories=(10, 5, 6, 5, 8),   # tuple: unique values per categorical column
    num_continuous=10,             # number of continuous features
    dim=32,                       # embedding dimension
    dim_out=2,                    # output dimension (e.g., num classes or outputs)
    depth=6,                       # number of transformer layers
    heads=8,                       # number of attention heads
    attn_dropout=0.1,              # attention dropout
    ff_dropout=0.1                 # feedforward dropout
)

# Forward pass
out = model(x_categ, x_cont)
# x_categ: (batch, num_categ), torch.LongTensor
# x_cont: (batch, num_cont), torch.FloatTensor

Logging & Error Handling

  • All major operations are logged using loguru.
  • Errors are caught and logged with stack traces for easy debugging.
  • You can control log level and output by editing utils/logger.py.

Project Structure

TabTransformer/
├── tabtransformer/         # TabTransformer model
│   └── model.py
├── fttransformer/          # FTTransformer model
│   └── model.py
├── utils/                  # Utilities (logging, batching, device)
│   ├── logger.py
│   ├── device.py
│   └── batch.py
├── examples/               # Training scripts
│   ├── train_tabtransformer.py
│   └── train_fttransformer.py
├── requirements.txt
└── README.md

License

MIT License


Publishing to PyPI (for maintainers)

  1. Update setup.py and pyproject.toml

    • Set the package name (e.g., tabtransformer-pytorch), version, author, description, etc.
    • Make sure all dependencies are listed.
  2. Build the package

python setup.py sdist bdist_wheel
  3. Upload to PyPI

    • Install twine:
      pip install twine
      
    • Upload:
      twine upload dist/*
      
    • For test uploads, use --repository testpypi.
  4. Users can now install via

pip install tabtransformer-pytorch
  5. (Optional) Add badges, version, and PyPI link to this README.

This project is production-ready and suitable for open-source/public release.
