TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models

Built with PyTorch; requires Python 3.11+.

TabTune is a Python library designed to simplify the training and fine-tuning of modern foundation models on tabular data.

It provides a high-level, scikit-learn-compatible API that abstracts away data preprocessing and model-specific training loops, so you can focus on results.


🚀 Core Features

The library is built on four main components that work together seamlessly:

  • DataProcessor -- A smart, model-aware data preparation engine.
    Automatically handles imputation, scaling, and categorical encoding based on the requirements of the selected model (e.g., integer encoding for TabPFN, text embeddings for ContextTab).

  • TuningManager -- The computational core of the library.
    Manages the model adaptation process, applying the correct training strategy—whether it's zero-shot inference, episodic fine-tuning for ICL models, or full fine-tuning with optional PEFT (Parameter-Efficient Fine-Tuning).

  • TabularPipeline -- The main user-facing object.
    Exposes a simple .fit(), .predict(), .evaluate(), .save(), and .load() API that chains all components into an end-to-end workflow.

  • TabularLeaderboard -- A leaderboard utility for model comparison.
    Makes it easy to compare multiple models and strategies on the same dataset splits with automatic ranking and metric reporting.


🤔 Why TabTune?

Using diverse tabular foundation models often requires writing model-specific boilerplate for data preparation, training, and inference. TabTune solves this by providing:

  • Unified API: A single, consistent interface (.fit(), .predict(), .evaluate()) for multiple models like TabPFN, TabICL, Mitra, ContextTab, TabDPT, OrionMSP, and OrionBix.

  • Automated Preprocessing: The DataProcessor is model-aware, automatically applying the correct transformations without manual configuration.

  • Flexible Fine-Tuning Strategies:

    • Inference mode for zero-shot predictions
    • Meta-learning mode for episodic fine-tuning (recommended for ICL models)
    • Supervised Fine-Tuning (SFT) for task-optimized learning
    • PEFT mode for parameter-efficient adaptation using LoRA adapters
  • Easy Model Comparison: The TabularLeaderboard allows you to benchmark multiple models and strategies to quickly find the best performer.

  • Checkpoint Management: Automatic saving and loading of fine-tuned model weights with support for resuming training.


📊 Supported Models

TabTune has built-in support for a growing list of powerful tabular models, each with its own specialized preprocessing and tuning pipeline handled automatically.

| Model | Family / Paradigm | Key Innovation | Supported Strategies |
|---|---|---|---|
| TabPFN-v2 | PFN / ICL | Approximates Bayesian inference on synthetic data | Inference, Meta-Learning FT, SFT, PEFT* |
| TabICL | Scalable ICL | Two-stage column-then-row attention | Inference, Meta-Learning FT, SFT, PEFT |
| OrionMSP | Scalable ICL | Multi-Scale Sparse Attention for Tabular In-Context Learning | Inference, Meta-Learning FT, SFT, PEFT |
| OrionBix | Scalable ICL | Tabular BiAxial In-Context Learning with biaxial attention mechanism | Inference, Meta-Learning FT, SFT, PEFT |
| Mitra | Scalable ICL | 2D attention (row & column); mixed synthetic priors | Inference, Meta-Learning FT, SFT, PEFT |
| ContextTab | Semantics-Aware ICL | Modality-specific semantic embeddings | Inference, Full Fine-Tuning, PEFT* |
| TabDPT | Denoising Transformer | Denoising pre-training for feature representation | Inference, Meta-Learning FT, SFT, PEFT |

* Note: PEFT for ContextTab and TabPFN is experimental; the 'base-ft' strategy is fully supported.
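
The PEFT support noted above can be encoded as a small lookup when picking a strategy programmatically. This is a minimal sketch and not part of TabTune: the names EXPERIMENTAL_PEFT, KNOWN_MODELS, and choose_strategy are hypothetical, and the strategy strings are the tuning_strategy values used elsewhere in this README.

```python
# Hypothetical helper, NOT part of the TabTune API.
# It encodes which models have experimental PEFT support in this README.
EXPERIMENTAL_PEFT = {"TabPFN", "ContextTab"}
KNOWN_MODELS = {
    "TabPFN", "TabICL", "OrionMSP", "OrionBix", "Mitra", "ContextTab", "TabDPT",
}
KNOWN_STRATEGIES = {"inference", "finetune", "base-ft", "peft"}


def choose_strategy(model_name: str, requested: str) -> str:
    """Return the requested strategy, falling back to 'base-ft' when PEFT
    is only experimental for the given model."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(f"Unknown model: {model_name!r}")
    if requested not in KNOWN_STRATEGIES:
        raise ValueError(f"Unknown strategy: {requested!r}")
    if requested == "peft" and model_name in EXPERIMENTAL_PEFT:
        return "base-ft"
    return requested


print(choose_strategy("TabICL", "peft"))
print(choose_strategy("TabPFN", "peft"))
```

The returned string could then be passed as tuning_strategy when constructing a pipeline.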


⚙️ Installation

git clone https://github.com/Lexsi-Labs/TabTune.git
cd TabTune
pip install -r requirements.txt
pip install -e .

⚡ Quick Start: End-to-End Workflow

Here is a complete example of loading a dataset, fitting a TabPFN model in inference mode, saving the pipeline, and making predictions.

import pandas as pd
from sklearn.model_selection import train_test_split
import openml
from tabtune.TabularPipeline.pipeline import TabularPipeline

# 1. Load a dataset from OpenML
dataset = openml.datasets.get_dataset(42178)
X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 2. Configure and Initialize the Pipeline
pipeline = TabularPipeline(
    model_name="TabPFN",
    task_type="classification",
    tuning_strategy="inference",  # or 'finetune', 'base-ft', 'peft'
    tuning_params={"device": "cpu"}
)

# 3. Fit the pipeline on the raw training data
pipeline.fit(X_train, y_train)

# 4. Save the fitted pipeline
pipeline.save("fitted_pipeline.joblib")

# 5. Load the pipeline and make predictions on new data
loaded_pipeline = TabularPipeline.load("fitted_pipeline.joblib")
predictions = loaded_pipeline.predict(X_test)

# 6. Evaluate the pipeline
metrics = pipeline.evaluate(X_test, y_test)
print(metrics)

🎯 Tuning Strategies

TabTune provides multiple fine-tuning strategies to suit different use cases:

Inference Mode

Zero-shot predictions without any training. The model uses its pre-trained weights directly on your data.

pipeline = TabularPipeline(
    model_name="TabPFN",
    tuning_strategy="inference"
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

Base Fine-Tuning (base-ft)

Full parameter fine-tuning. Updates all model weights using task data.

  • Meta-Learning (default for ICL models): Episodic training that mimics the in-context learning paradigm
  • SFT (Supervised Fine-Tuning): Standard supervised training on batches

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="finetune",  # Defaults to 'base-ft'
    tuning_params={
        "epochs": 5,
        "learning_rate": 1e-5,
        "finetune_mode": "meta-learning"  # or "sft"
    }
)
pipeline.fit(X_train, y_train)
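
To make the episodic idea concrete, the sketch below samples disjoint support (context) and query index sets from a dataset, which is the general shape of an episode in meta-learning. It is an illustration only, not TabTune's sampler: sample_episode and the sizes used are assumptions.

```python
import random


def sample_episode(n_rows, support_size, query_size, rng):
    """Split row indices into disjoint support (context) and query sets."""
    if support_size + query_size > n_rows:
        raise ValueError("episode is larger than the dataset")
    picked = rng.sample(range(n_rows), support_size + query_size)
    return picked[:support_size], picked[support_size:]


rng = random.Random(0)  # fixed seed so the episodes are reproducible
for step in range(3):
    support_idx, query_idx = sample_episode(100, 32, 8, rng)
    # In an episodic update, the model would see the labeled support rows as
    # context and be trained on its loss over the query rows.
    print(step, len(support_idx), len(query_idx))
```

In contrast, SFT would iterate over ordinary mini-batches of labeled rows without the support/query split.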

PEFT Mode (Parameter-Efficient Fine-Tuning)

Adds LoRA (Low-Rank Adaptation) adapters and trains only those low-rank matrices while the pre-trained weights stay frozen, reducing memory and computation.

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="peft",
    tuning_params={
        "epochs": 10,
        "learning_rate": 5e-5,
        "peft_config": {
            "r": 8,
            "lora_alpha": 16,
            "lora_dropout": 0.05
        }
    }
)
pipeline.fit(X_train, y_train)

PEFT Support by Model:

  • Full Support: TabICL, OrionMSP, OrionBix, TabDPT, Mitra
  • ⚠️ Experimental: ContextTab and TabPFN (may cause prediction issues; use 'base-ft' instead)

📊 Evaluation Metrics

When calling .evaluate(), TabTune computes the following metrics:

  • Accuracy -- Fraction of correct predictions
  • Weighted F1 Score -- Harmonic mean of precision and recall, weighted by class support
  • ROC AUC Score -- Area under the Receiver Operating Characteristic curve (binary and multi-class supported)
  • Matthews Correlation Coefficient (MCC) -- Correlation between predicted and actual class labels
  • Precision & Recall -- Per-class performance metrics
  • Brier Score -- Mean squared error of probabilistic predictions

metrics = pipeline.evaluate(X_test, y_test)
print(metrics)
# Output: {'accuracy': 0.92, 'f1_score': 0.89, 'roc_auc_score': 0.95, ...}
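
For reference, three of these metrics can be computed by hand in the binary case. The sketch below uses only the standard library and the textbook definitions; binary_metrics and its dictionary keys are hypothetical names, and this is not how TabTune computes its report.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, MCC, and Brier score for binary labels in {0, 1}."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)
    accuracy = (tp + tn) / n
    # MCC is defined as 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Brier score: mean squared gap between predicted probability and label.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n
    return {"accuracy": accuracy, "mcc": mcc, "brier_score": brier}


print(binary_metrics([1, 0, 1, 1, 0, 0], [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]))
```

Lower Brier scores indicate better-calibrated probabilities, while accuracy and MCC depend only on the thresholded predictions.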

🏆 Model Comparison with TabularLeaderboard

The TabularLeaderboard makes it easy to compare multiple models and strategies on the same dataset.

from tabtune.TabularLeaderboard.leaderboard import TabularLeaderboard

# 1. Initialize the leaderboard with your data splits
leaderboard = TabularLeaderboard(X_train, X_test, y_train, y_test)

# 2. Add model configurations to compare
leaderboard.add_model(
    model_name='TabICL',
    tuning_strategy='inference',
    model_params={'n_estimators': 16}
)

leaderboard.add_model(
    model_name='TabICL',
    tuning_strategy='finetune',
    model_params={'n_estimators': 16},
    tuning_params={'epochs': 5, 'learning_rate': 1e-5, 'finetune_mode': 'meta-learning'}
)

leaderboard.add_model(
    model_name='TabPFN',
    tuning_strategy='inference'
)

# 3. Run the benchmark and display ranked results
leaderboard.run()

🛠️ API Reference

TabularPipeline Constructor

TabularPipeline(
    model_name: str,
    task_type: str = 'classification',
    tuning_strategy: str = 'inference',
    tuning_params: dict | None = None,
    processor_params: dict | None = None,
    model_params: dict | None = None,
    model_checkpoint_path: str | None = None,
    finetune_mode: str = 'meta-learning'
)

Key Parameters:

  • model_name (str): The name of the model to use (e.g., 'TabPFN', 'TabICL', 'ContextTab', 'Mitra', 'TabDPT', 'OrionMSP', 'OrionBix').

  • task_type (str): The type of task, either 'classification' or 'regression' (currently only classification is fully supported).

  • tuning_strategy (str): The strategy for model adaptation ('inference', 'finetune', 'base-ft', or 'peft').

  • tuning_params (dict, optional): Parameters for the TuningManager:

    • epochs (int): Number of training epochs
    • learning_rate (float): Learning rate for optimization
    • batch_size (int): Batch size for fine-tuning
    • device (str): 'cuda' or 'cpu'
    • save_checkpoint_path (str): Path to save fine-tuned weights
    • checkpoint_dir (str): Directory for automatic checkpoint saving
    • finetune_mode (str): 'meta-learning' or 'sft' (episodic vs. supervised)
    • peft_config (dict): Configuration for LoRA adapters
    • show_progress (bool): Whether to show progress bars
  • processor_params (dict, optional): Parameters for the DataProcessor:

    • imputation_strategy (str): 'mean', 'median', 'iterative', 'knn'
    • categorical_encoding (str): 'onehot', 'ordinal', 'target', 'hashing', 'binary'
    • scaling_strategy (str): 'standard', 'minmax', 'robust', 'power_transform'
    • resampling_strategy (str): 'smote', 'random_over', 'random_under', 'tomek', 'kmeans', 'knn'
    • feature_selection_strategy (str): 'variance', 'select_k_best_anova', 'select_k_best_chi2'
  • model_params (dict, optional): Model-specific parameters.

  • model_checkpoint_path (str, optional): Path to a .pt file containing pre-trained model weights.

  • finetune_mode (str, optional): Default fine-tuning mode. Can be overridden in tuning_params.
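
Because several of these parameters accept only a fixed set of strings, it can help to check a configuration before constructing the pipeline. The following is a minimal sketch, assuming the allowed values are exactly those listed above; check_config and ALLOWED are hypothetical names, not part of TabTune.

```python
# Hypothetical pre-flight check, NOT part of the TabTune API.
# Allowed values are copied from the parameter list in this README.
ALLOWED = {
    "tuning_strategy": {"inference", "finetune", "base-ft", "peft"},
    "finetune_mode": {"meta-learning", "sft"},
    "device": {"cuda", "cpu"},
}


def check_config(tuning_strategy, tuning_params=None):
    """Return a list of human-readable problems (empty if none found)."""
    tuning_params = tuning_params or {}
    problems = []
    if tuning_strategy not in ALLOWED["tuning_strategy"]:
        problems.append(f"unknown tuning_strategy: {tuning_strategy!r}")
    for key in ("finetune_mode", "device"):
        if key in tuning_params and tuning_params[key] not in ALLOWED[key]:
            problems.append(f"unknown {key}: {tuning_params[key]!r}")
    return problems


print(check_config("finetune", {"finetune_mode": "sft", "device": "cpu"}))
print(check_config("lora", {"finetune_mode": "episodic"}))
```

An empty list means the strings match the documented options; it does not guarantee that a given model supports the chosen strategy.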


💾 Checkpoint Management

Automatic Checkpoint Saving

Fine-tuned models are automatically saved during training:

tuning_params = {
    'save_checkpoint_path': './checkpoints/my_model.pt',
    'checkpoint_dir': './checkpoints'  # Used if save_checkpoint_path is None
}

Manual Checkpoint Loading

# Load pre-trained weights when initializing
pipeline = TabularPipeline(
    model_name="TabPFN",
    model_checkpoint_path="./checkpoints/pretrained.pt"
)

Pipeline Serialization

# Save entire pipeline
pipeline.save("my_pipeline.joblib")

# Load and use
loaded_pipeline = TabularPipeline.load("my_pipeline.joblib")
predictions = loaded_pipeline.predict(X_test)

🔧 PEFT/LoRA Configuration

LoRA (Low-Rank Adaptation) adapters can significantly reduce memory usage during fine-tuning.

peft_config = {
    'r': 8,                   # LoRA rank (lower = fewer parameters)
    'lora_alpha': 16,         # Scaling factor for LoRA updates
    'lora_dropout': 0.05,     # Dropout in LoRA modules
    'target_modules': None    # Auto-detect by model (optional override)
}

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="peft",
    tuning_params={
        'epochs': 10,
        'learning_rate': 5e-5,
        'peft_config': peft_config
    }
)

Memory Savings: PEFT typically reduces memory usage by 60-80% compared to full fine-tuning.
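
To see why a small rank r keeps the adapter light, consider the standard LoRA formulation: a weight of shape (d_out, d_in) gets two trainable factors of shapes (d_out, r) and (r, d_in), and the update is scaled by lora_alpha / r. The sketch below only does this arithmetic; the 512 x 512 layer size is an assumption chosen for illustration, and the result is a parameter count, not the memory figure quoted above.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter: B (d_out x r) + A (r x d_in)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update in the standard LoRA formulation."""
    return lora_alpha / r


d_in = d_out = 512  # assumed layer size, purely illustrative
full = d_in * d_out
adapter = lora_trainable_params(d_in, d_out, r=8)
print(f"full layer: {full} params, LoRA adapter: {adapter} params")
print(f"trainable fraction: {adapter / full:.4f}")
print(f"scaling with lora_alpha=16, r=8: {lora_scaling(16, 8)}")
```

Doubling r doubles the adapter size, which is why lower ranks mean fewer trainable parameters.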


📚 Example Notebooks

The nine example notebooks below cover the library's features in depth.

| No. | Name | Task Performed | Notebook |
|---|---|---|---|
| 1 | Unified API | Showcasing a unified API across multiple models | Open in Colab |
| 2 | Automated Model-Aware Preprocessing | The automated preprocessing system explained | Open in Colab |
| 3 | Fine-Tuning Strategies | TabTune's four fine-tuning strategies | Open in Colab |
| 4 | Model Comparison | Model comparison with TabularLeaderboard | Open in Colab |
| 5 | Checkpoint Management | Checkpoint management: saving and loading pipelines | Open in Colab |
| 6 | Advanced Usage | PEFT configuration and hybrid strategies | Open in Colab |
| 7 | Data Sampling | Data sampling and resampling strategies for inference | Open in Colab |
| 8 | Evaluation Metrics | The evaluation metrics reported by TabTune | Open in Colab |
| 9 | Benchmarking | Standard benchmarking techniques | Open in Colab |

🚀 Advanced Usage

Custom Preprocessing

Override default preprocessing for specific needs:

processor_params = {
    'imputation_strategy': 'iterative',
    'categorical_encoding': 'target',
    'scaling_strategy': 'robust',
    'resampling_strategy': 'smote'
}

pipeline = TabularPipeline(
    model_name="TabICL",
    processor_params=processor_params
)
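
As a rough intuition for two of the preprocessing options, the sketch below shows median imputation and robust scaling (centering on the median and dividing by the interquartile range) on a single column using only the standard library. It illustrates the general techniques, not TabTune's DataProcessor; median_impute and robust_scale are hypothetical names.

```python
import statistics


def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # A constant-spread column carries no scale information.
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]


column = [1.0, 2.0, None, 4.0, 100.0]  # one missing value, one outlier
filled = median_impute(column)
print(filled)
print(robust_scale(filled))
```

Because the median and IQR ignore extreme values, the outlier does not compress the remaining values the way standard or min-max scaling would.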

Hybrid Fine-Tuning

Combine episodic meta-learning with PEFT by setting finetune_mode alongside peft_config:

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="peft",
    tuning_params={
        'epochs': 20,
        'learning_rate': 1e-5,
        'finetune_mode': 'meta-learning',
        'peft_config': {
            'r': 16,
            'lora_alpha': 32,
            'lora_dropout': 0.1
        }
    }
)

📖 Documentation

For detailed documentation, API reference, model configurations, and usage examples, please visit: Documentation


🙏 Acknowledgments

TabTune is built upon the excellent work of the following projects and research teams:

  • OrionMSP - Multi-Scale Sparse Attention for Tabular In-Context Learning
  • OrionBix - Tabular BiAxial In-Context Learning
  • TabPFN - Prior-data Fitted Networks for tabular data
  • TabICL - Tabular In-Context Learning with scalable attention
  • Mitra (Tab2D) - 2D Attention mechanism (Tab2D) for tabular data, included within AutoGluon
  • ContextTab - Semantics-Aware In-Context Learning for Tabular Data
  • TabDPT - Denoising Pre-training Transformer for Tabular Data
  • AutoGluon - AutoML framework that inspired our unified API design

🐛 Troubleshooting

Out of Memory (OOM) Errors

  • Reduce batch_size in tuning_params
  • Use tuning_strategy='peft' for PEFT mode
  • Decrease n_ensembles or context_size for inference

PEFT Compatibility Issues

  • Some models have experimental PEFT support; use 'base-ft' strategy instead
  • Check logs for model-specific warnings

Device Mismatch

  • Ensure device parameter matches your hardware (cuda/cpu)
  • Use torch.cuda.is_available() to check GPU availability
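
The availability check can be wrapped in a small helper so the same script runs on machines with and without a GPU. This is a minimal sketch: pick_device is a hypothetical name, and the fallback to 'cpu' when PyTorch is not importable is an assumption made to keep the snippet self-contained.

```python
def pick_device() -> str:
    """Return 'cuda' when a CUDA GPU is usable, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; default to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The result can be passed as tuning_params={"device": device} when constructing the pipeline.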

🗃️ License

This project is released under the MIT License.
Please cite appropriately if used in academic or production projects.

Citation:

@misc{tanna2025tabtuneunifiedlibraryinference,
      title={TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models}, 
      author={Aditya Tanna and Pratinav Seth and Mohamed Bouadi and Utsav Avaiya and Vinay Kumar Sankarapu},
      year={2025},
      eprint={2511.02802},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.02802}, 
}

📫 Join Community / Contribute

  • Issues and discussions are welcome on the GitHub issue tracker and on Discord.
  • Please see the Contributing section for contribution standards, code reviews, and documentation tips.

Contact

Website: https://lexsi.ai
