TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models

TabTune is a Python library designed to simplify the training and fine-tuning of modern foundation models on tabular data.

It provides a high-level, scikit-learn-compatible API that abstracts away data preprocessing and model-specific training loops, so you can focus on results.

🚀 Core Features

The library is built on four main components that work together seamlessly:

DataProcessor -- A smart, model-aware data preparation engine.
Automatically handles imputation, scaling, and categorical encoding based on the requirements of the selected model (e.g., integer encoding for TabPFN, text embeddings for ContextTab).
TuningManager -- The computational core of the library.
Manages model adaptation and applies the appropriate training strategy: zero-shot inference, episodic fine-tuning for ICL models, or full fine-tuning with optional PEFT (Parameter-Efficient Fine-Tuning).
TabularPipeline -- The main user-facing object.
Exposes a simple .fit(), .predict(), .evaluate(), .save(), and .load() API that chains all components into an end-to-end workflow.
TabularLeaderboard -- A leaderboard utility for model comparison.
Makes it easy to compare multiple models and strategies on the same dataset splits with automatic ranking and metric reporting.
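
To make "model-aware" preprocessing more concrete, the sketch below shows integer (ordinal) encoding of a categorical column, the kind of transformation the DataProcessor is described as applying for TabPFN. It is a standalone illustration in plain Python: the function names and the choice to map unseen categories to -1 are assumptions made for this example, not TabTune's actual implementation.

```python
def fit_ordinal_mapping(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def ordinal_encode(values, mapping, unknown=-1):
    """Encode values with a fitted mapping; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Fit on training data only, then reuse the same mapping for new data.
train_colors = ["red", "green", "red", "blue"]
mapping = fit_ordinal_mapping(train_colors)

encoded_train = ordinal_encode(train_colors, mapping)
encoded_test = ordinal_encode(["blue", "purple"], mapping)

print(encoded_train)  # [0, 1, 0, 2]
print(encoded_test)   # [2, -1]
```

Fitting the mapping on the training split and reusing it at prediction time keeps the encoding consistent between .fit() and .predict().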

🤔 Why TabTune?

Using diverse tabular foundation models often requires writing model-specific boilerplate for data preparation, training, and inference. TabTune solves this by providing:

Unified API: A single, consistent interface (.fit(), .predict(), .evaluate()) for multiple models such as TabPFN, TabPFNv2.6, TabICL, TabICLv2, Mitra, ContextTab, TabDPT, OrionMSP, and OrionBix.
Automated Preprocessing: The DataProcessor is model-aware, automatically applying the correct transformations without manual configuration.
Flexible Fine-Tuning Strategies:
- Inference mode for zero-shot predictions
- Meta-learning mode for episodic fine-tuning (recommended for ICL models)
- Supervised Fine-Tuning (SFT) for task-optimized learning
- PEFT mode for parameter-efficient adaptation using LoRA adapters
Easy Model Comparison: The TabularLeaderboard allows you to benchmark multiple models and strategies to quickly find the best performer.
Checkpoint Management: Automatic saving and loading of fine-tuned model weights with support for resuming training.

🚀 What's New in this release

✅ TabPFNv2.6 Integration -- Full support for the latest TabPFN release, covering classification and regression (inference + finetune), with a dedicated native fine-tuning mode (finetune_mode='native') that leverages FinetunedTabPFNClassifier / FinetunedTabPFNRegressor.
✅ TabICLv2 Integration -- Full support for TabICLv2 for both classification (inference + finetune) and regression (inference + finetune), using episodic turn-by-turn fine-tuning.

📊 Supported Models

Model	Family / Paradigm	Key Innovation	Supported Strategies
TabPFN-v2	PFN / ICL	Approximates Bayesian inference on synthetic data	Inference, Meta-Learning FT, SFT, PEFT*, Regression, Regression FT
TabICL	Scalable ICL	Two-stage column-then-row attention	Inference, Meta-Learning FT, SFT, PEFT
OrionMSP v1.0	Scalable ICL	Multi-Scale Sparse Attention	Inference, Meta-Learning FT, SFT, PEFT
OrionMSP v1.5	Scalable ICL	Stabilized prototype refinement	Inference, Meta-Learning FT, SFT, PEFT
OrionBix	Scalable ICL	Tabular Bi-Axial In-Context Learning	Inference, Meta-Learning FT, SFT, PEFT
Mitra	Scalable ICL	2D attention (row & column)	Inference, Meta-Learning FT, SFT, PEFT, Regression, Regression-FT
ContextTab	Semantics-Aware ICL	Modality-specific semantic embeddings	Inference, Full Fine-Tuning, PEFT*, Regression, Regression-FT
TabDPT	Denoising Transformer	Denoising pre-training	Inference, Meta-Learning FT, SFT, Regression, Regression-FT
LimiX	Probabilistic / ICL	Likelihood-based mixture modeling; uncertainty-aware	Inference, Regression, Regression-FT
TabPFN-v2.6	PFN / ICL	Latest PriorLabs release with native finetuning API	Inference, Meta-Learning FT, SFT, Native FT, Regression, Regression FT
TabICLv2	Scalable ICL	Improved column-then-row attention	Inference, FT, Regression, Regression FT

* Note: PEFT for ContextTab and TabPFN is experimental; the inference strategy is fully supported.
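
When scripting experiments across several models, it can help to check a model/strategy pair against the table before building a pipeline. The sketch below transcribes a few rows of the table into a dictionary; the `SUPPORTED_STRATEGIES` name and the `supports` helper are hypothetical and not part of TabTune.

```python
# Strategy support transcribed from the table above (a subset of rows).
SUPPORTED_STRATEGIES = {
    "TabICL": {"Inference", "Meta-Learning FT", "SFT", "PEFT"},
    "Mitra": {"Inference", "Meta-Learning FT", "SFT", "PEFT",
              "Regression", "Regression-FT"},
    "TabDPT": {"Inference", "Meta-Learning FT", "SFT",
               "Regression", "Regression-FT"},
    "LimiX": {"Inference", "Regression", "Regression-FT"},
}


def supports(model, strategy):
    """Return True if the table lists `strategy` for `model`."""
    return strategy in SUPPORTED_STRATEGIES.get(model, set())


print(supports("TabICL", "PEFT"))  # True
print(supports("LimiX", "PEFT"))   # False
```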

⚙️ Installation

git clone https://github.com/Lexsi-Labs/TabTune.git
cd TabTune
pip install -r requirements.txt
pip install -e .

⚡ Quick Start: End-to-End Workflow

Here is a complete example of loading a dataset, fitting a TabPFN model in inference mode (set tuning_strategy to 'finetune' to fine-tune instead), saving the pipeline, and making predictions.

import pandas as pd
from sklearn.model_selection import train_test_split
import openml
from tabtune.TabularPipeline.pipeline import TabularPipeline

# 1. Load a dataset from OpenML
dataset = openml.datasets.get_dataset(42178)
X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 2. Configure and Initialize the Pipeline
pipeline = TabularPipeline(
    model_name="TabPFN",
    task_type="classification",
    tuning_strategy="inference",  # or 'finetune'
    tuning_params={"device": "cpu"}
)

# 3. Fit the pipeline on the raw training data
pipeline.fit(X_train, y_train)

# 4. Save the fitted pipeline
pipeline.save("fitted_pipeline.joblib")

# 5. Load the pipeline and make predictions on new data
loaded_pipeline = TabularPipeline.load("fitted_pipeline.joblib")
predictions = loaded_pipeline.predict(X_test)

# 6. Evaluate the pipeline
metrics = pipeline.evaluate(X_test, y_test)
print(metrics)

🎯 Tuning Strategies

TabTune provides multiple tuning strategies to suit different use cases:

Inference Mode

Zero-shot predictions without any training. The model uses its pre-trained weights directly on your data.

pipeline = TabularPipeline(
    model_name="TabPFN",
    tuning_strategy="inference"
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

Base Fine-Tuning (`base-ft`)

Full-parameter fine-tuning that updates all model weights using task data. Two modes are available:

- Meta-Learning (default for ICL models): episodic training that mimics the in-context learning paradigm
- SFT (Supervised Fine-Tuning): standard supervised training on batches

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="finetune",  # Defaults to 'base-ft'
    tuning_params={
        "epochs": 5,
        "learning_rate": 1e-5,
        "finetune_mode": "meta-learning"  # or "sft"
    }
)
pipeline.fit(X_train, y_train)
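
Episodic training repeatedly splits the training rows into a support (context) set and a query set, so that each step resembles how an ICL model is used at inference time. The sketch below shows one simple way to draw such episodes with a random, disjoint split. The `sample_episode` function and the chosen sizes are hypothetical; TabTune's internal sampler may work differently.

```python
import random


def sample_episode(n_rows, support_size, query_size, rng):
    """Return disjoint (support, query) row-index lists for one episode."""
    if support_size + query_size > n_rows:
        raise ValueError("support_size + query_size exceeds available rows")
    chosen = rng.sample(range(n_rows), support_size + query_size)
    return chosen[:support_size], chosen[support_size:]


rng = random.Random(0)  # fixed seed so the episodes are reproducible
episodes = [
    sample_episode(100, support_size=32, query_size=8, rng=rng)
    for _ in range(3)
]

for support_idx, query_idx in episodes:
    # In training, the model would condition on the labeled support rows
    # and the loss would be computed on the query rows.
    print(len(support_idx), len(query_idx))
```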

Native Fine-Tuning (TabPFNv2.6 only)

TabPFNv2.6 exposes PriorLabs' FinetunedTabPFNClassifier / FinetunedTabPFNRegressor directly, offering their native advanced fine-tuning pipeline.

# Classification
pipeline = TabularPipeline(
    model_name="TabPFNv26",
    task_type="classification",
    tuning_strategy="finetune",
    finetune_mode="native",         # uses FinetunedTabPFNClassifier
    tuning_params={
        "epochs": 30,
        "learning_rate": 1e-5,
        "early_stopping": True,
        "early_stopping_patience": 8,
    }
)
pipeline.fit(X_train, y_train)

# Regression
pipeline = TabularPipeline(
    model_name="TabPFNv26",
    task_type="regression",
    tuning_strategy="finetune",
    finetune_mode="native",         # uses FinetunedTabPFNRegressor
    tuning_params={
        "epochs": 30,
        "learning_rate": 1e-5,
        "early_stopping": True,
    }
)
pipeline.fit(X_train, y_train)

PEFT Mode (Parameter-Efficient Fine-Tuning)

Trains small LoRA (Low-Rank Adaptation) adapter matrices while keeping the pre-trained weights frozen, so only a small subset of parameters is updated. This reduces memory and computation.

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="peft",
    tuning_params={
        "epochs": 10,
        "learning_rate": 5e-5,
        "peft_config": {
            "r": 8,
            "lora_alpha": 16,
            "lora_dropout": 0.05
        }
    }
)
pipeline.fit(X_train, y_train)

PEFT Support by Model:

✅ Full Support: TabICL, OrionMSP, OrionBix, TabDPT, Mitra
⚠️ Experimental: ContextTab and TabPFN (may cause prediction issues; use tuning_strategy='finetune', i.e. base-ft, instead)

📊 Evaluation Metrics

When calling .evaluate(), TabTune computes the following metrics:

Accuracy -- Fraction of correct predictions
Weighted F1 Score -- Harmonic mean of precision and recall, weighted by class support
ROC AUC Score -- Area under the Receiver Operating Characteristic curve (binary and multi-class supported)
Matthews Correlation Coefficient (MCC) -- Correlation between predicted and actual class labels, ranging from -1 to 1
Precision & Recall -- Per-class performance metrics
Brier Score -- Mean squared error of probabilistic predictions

metrics = pipeline.evaluate(X_test, y_test)
print(metrics)
# Output: {'accuracy': 0.92, 'f1_score': 0.89, 'roc_auc_score': 0.95, ...}
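
For reference, the sketch below computes three of these metrics by hand for a small binary example, using their standard definitions. It is independent of TabTune: the function name, the 0.5 decision threshold, and the dictionary keys are assumptions made for illustration.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, MCC, and Brier score for binary labels and probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Brier score: mean squared error between probability and 0/1 label.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {"accuracy": accuracy, "mcc": mcc, "brier_score": brier}


y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]  # predicted P(class = 1)
scores = binary_metrics(y_true, y_prob)
print(scores)
```

Here four of six predictions are correct, so accuracy is about 0.667, while MCC is about 0.333 and the Brier score is about 0.178.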

📈 Using Regression in TabTune

TabTune now fully supports regression tasks with standardized evaluation metrics.

Example: Housing Price Prediction

from tabtune import TabularPipeline
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

pipeline = TabularPipeline(
    model_name="TabPFN",  # listed with regression support in the Supported Models table
    task_type="regression",
    tuning_strategy="inference",
    tuning_params={
        "epochs": 5,
        "learning_rate": 2e-5
    }
)

pipeline.fit(X_train, y_train)
metrics = pipeline.evaluate(X_test, y_test)

print(metrics)

Supported Regression Metrics

RMSE
MAE
R² Score
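
The three metrics follow their standard definitions. The sketch below computes them in plain Python on a toy example; the function name and dictionary keys are illustrative and may not match the keys returned by .evaluate().

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R² from their textbook definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant
    return {"rmse": rmse, "mae": mae, "r2": r2}


scores = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
print(scores)
```

For this toy data the RMSE is 0.75, the MAE is 0.625, and R² is roughly 0.847.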

🔁 Resampling & Context Sampling (Fine-Tuning)

TabTune provides two complementary mechanisms for handling class imbalance and constructing episodes:

Dataset-Level Resampling (via DataProcessor)
Context / Support-Query Sampling (for meta-learning models)

Both integrate seamlessly into TabularPipeline.

✅ Supported Resampling Strategies

Strategy	Description	Task Support
smote	Synthetic minority oversampling	Classification
random_over	Random oversampling	Classification
random_under	Random undersampling	Classification
tomek	Tomek links cleaning	Classification
kmeans	KMeans-SMOTE hybrid	Classification
knn	KNN-based synthetic sampling	Classification

Resampling is primarily designed for imbalanced classification tasks.
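
As an illustration of the simplest strategy in the list, the sketch below implements random oversampling in plain Python: minority-class rows are duplicated at random until every class matches the majority count. This is a generic version of the technique under that assumption, not the code TabTune runs for `random_over`; the function name is hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows until all classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Resampling should be applied to the training split only, so that the test split keeps its original class distribution.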

Resampling in Action

Resampling is configured through processor_params and is applied before training. For example:

from tabtune import TabularPipeline

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="inference",
    processor_params={
        "resampling_strategy": "smote"
    },
    tuning_params={
        "epochs": 5,
        "learning_rate": 2e-5
    }
)

pipeline.fit(X_train, y_train)

🏆 Model Comparison with TabularLeaderboard

The TabularLeaderboard makes it easy to compare multiple models and strategies on the same dataset.

from tabtune.TabularLeaderboard.leaderboard import TabularLeaderboard

# 1. Initialize the leaderboard with your data splits
leaderboard = TabularLeaderboard(X_train, X_test, y_train, y_test)

# 2. Add model configurations to compare
leaderboard.add_model(
    model_name='TabICL',
    tuning_strategy='inference',
    model_params={'n_estimators': 16}
)

leaderboard.add_model(
    model_name='TabICL',
    tuning_strategy='finetune',
    model_params={'n_estimators': 16},
    tuning_params={'epochs': 5, 'learning_rate': 1e-5, 'finetune_mode': 'meta-learning'}
)

leaderboard.add_model(
    model_name='TabPFN',
    tuning_strategy='inference'
)

# 3. Run the benchmark and display ranked results
leaderboard.run()

🛠️ API Reference

TabularPipeline Constructor

TabularPipeline(
    model_name: str,
    task_type: str = 'classification',
    tuning_strategy: str = 'inference',
    tuning_params: dict | None = None,
    processor_params: dict | None = None,
    model_params: dict | None = None,
    model_checkpoint_path: str | None = None,
    finetune_mode: str = 'meta-learning'
)

Key Parameters:

model_name (str): The name of the model to use. Supported values: 'TabPFN', 'TabPFNv26', 'TabICL', 'TabICLv2', 'ContextTab', 'Mitra', 'TabDPT', 'OrionMSP', 'OrionMSPv1.5', 'OrionBix', 'Limix'.
task_type (str): The type of task — 'classification' or 'regression'.
tuning_strategy (str): The strategy for model adaptation: 'inference', 'finetune', or 'peft'.
finetune_mode (str, optional): Controls the fine-tuning algorithm. If None, a smart default is chosen per task type ('turn_by_turn' for regression, 'meta-learning' for classification). Supported values per model:
- 'meta-learning' — episodic meta-learning (TabICL, TabICLv2, OrionMSP, OrionBix, TabDPT, Mitra, TabPFNv26)
- 'sft' — supervised fine-tuning (TabPFN, TabPFNv26, Mitra, TabDPT)
- 'native' — PriorLabs native finetuner with bar distribution loss, AMP, early stopping (TabPFNv2.6 only, classification and regression)
- 'turn_by_turn' / 'tbt' — episodic turn-by-turn (TabPFN regression, Mitra regression, TabDPT regression, ContextTab regression)
tuning_params (dict, optional): Parameters for the TuningManager:
- epochs (int): Number of training epochs
- learning_rate (float): Learning rate for optimization
- batch_size (int): Batch size for fine-tuning
- device (str): 'cuda' or 'cpu'
- save_checkpoint_path (str): Path to save fine-tuned weights
- checkpoint_dir (str): Directory for automatic checkpoint saving
- show_progress (bool): Whether to show progress bars
- peft_config (dict): Configuration for LoRA adapters
- early_stopping (bool): Enable early stopping — TabPFNv2.6 native mode only
- early_stopping_patience (int): Patience for early stopping — TabPFNv2.6 native mode only
- n_estimators_finetune (int): Ensemble size during fine-tuning — TabPFNv2.6 native mode only
processor_params (dict, optional): Parameters for the DataProcessor:
- imputation_strategy (str): 'mean', 'median', 'iterative', 'knn'
- categorical_encoding (str): 'onehot', 'ordinal', 'target', 'hashing', 'binary'
- scaling_strategy (str): 'standard', 'minmax', 'robust', 'power_transform'
- resampling_strategy (str): 'smote', 'random_over', 'random_under', 'tomek', 'kmeans', 'knn'
- feature_selection_strategy (str): 'variance', 'select_k_best_anova', 'select_k_best_chi2'
model_params (dict, optional): Model-specific parameters.
model_checkpoint_path (str, optional): Path to a .pt file containing pre-trained model weights.
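
Because several parameters accept only a fixed set of strings, it can be useful to validate a configuration before constructing a pipeline. The sketch below checks a few fields against the values documented above. The `check_pipeline_config` helper is hypothetical and not part of TabTune, which may perform its own validation.

```python
# Allowed values as documented in the parameter list above.
ALLOWED_TASK_TYPES = {"classification", "regression"}
ALLOWED_TUNING_STRATEGIES = {"inference", "finetune", "peft"}
ALLOWED_DEVICES = {"cuda", "cpu"}


def check_pipeline_config(task_type="classification",
                          tuning_strategy="inference",
                          tuning_params=None):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if task_type not in ALLOWED_TASK_TYPES:
        problems.append(f"unknown task_type: {task_type!r}")
    if tuning_strategy not in ALLOWED_TUNING_STRATEGIES:
        problems.append(f"unknown tuning_strategy: {tuning_strategy!r}")
    params = tuning_params or {}
    if "device" in params and params["device"] not in ALLOWED_DEVICES:
        problems.append(f"unknown device: {params['device']!r}")
    if "epochs" in params and (not isinstance(params["epochs"], int)
                               or params["epochs"] < 1):
        problems.append("epochs must be a positive integer")
    return problems


ok = check_pipeline_config("classification", "finetune",
                           {"epochs": 5, "device": "cpu"})
bad = check_pipeline_config("ranking", "base-ft", {"epochs": 0})
print(ok)
print(bad)
```

Note that 'base-ft' is flagged here because the documented tuning_strategy values are 'inference', 'finetune', and 'peft'.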

💾 Checkpoint Management

Automatic Checkpoint Saving

Fine-tuned models are automatically saved during training:

tuning_params = {
    'save_checkpoint_path': './checkpoints/my_model.pt',
    'checkpoint_dir': './checkpoints'  # Used if save_checkpoint_path is None
}
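
The precedence between the two keys can be sketched as follows: an explicit save_checkpoint_path wins, and checkpoint_dir is only consulted when it is missing or None. The helper name and the fallback file name `model.pt` are assumptions made for this example; the file name TabTune actually generates inside checkpoint_dir is not specified here.

```python
import os


def resolve_checkpoint_path(tuning_params, default_name="model.pt"):
    """Pick the checkpoint location following the precedence described above."""
    explicit = tuning_params.get("save_checkpoint_path")
    if explicit is not None:
        return explicit
    directory = tuning_params.get("checkpoint_dir")
    if directory is None:
        return None  # nothing configured
    # `default_name` is a placeholder file name chosen for this sketch.
    return os.path.join(directory, default_name)


print(resolve_checkpoint_path({
    "save_checkpoint_path": "./checkpoints/my_model.pt",
    "checkpoint_dir": "./checkpoints",
}))
print(resolve_checkpoint_path({"checkpoint_dir": "./checkpoints"}))
```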

Manual Checkpoint Loading

# Load pre-trained weights when initializing
pipeline = TabularPipeline(
    model_name="TabPFN",
    model_checkpoint_path="./checkpoints/pretrained.pt"
)

Pipeline Serialization

# Save entire pipeline
pipeline.save("my_pipeline.joblib")

# Load and use
loaded_pipeline = TabularPipeline.load("my_pipeline.joblib")
predictions = loaded_pipeline.predict(X_test)

🔧 PEFT/LoRA Configuration

LoRA (Low-Rank Adaptation) adapters can significantly reduce memory usage during fine-tuning.

peft_config = {
    'r': 8,                   # LoRA rank (lower = fewer parameters)
    'lora_alpha': 16,         # Scaling factor for LoRA updates
    'lora_dropout': 0.05,     # Dropout in LoRA modules
    'target_modules': None    # Auto-detect by model (optional override)
}

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="peft",
    tuning_params={
        'epochs': 10,
        'learning_rate': 5e-5,
        'peft_config': peft_config
    }
)

Memory Savings: PEFT typically reduces memory usage by 60-80% compared to full fine-tuning.
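
To see why a lower rank means fewer parameters, recall that LoRA replaces the update to a d_out x d_in weight matrix with the product of two small matrices, B (d_out x r) and A (r x d_in), scaled by lora_alpha / r. The sketch below counts trainable parameters for a single layer. The 512 x 512 layer size is an arbitrary illustration, and the counts cover one adapted matrix only, so they do not by themselves reproduce the overall memory figure quoted above.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in the two LoRA factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 512          # illustrative layer size, not a TabTune value
full = d_in * d_out         # parameters updated by full fine-tuning

for r in (4, 8, 16):
    lora = lora_trainable_params(d_in, d_out, r)
    print(f"r={r}: {lora} trainable vs {full} full ({lora / full:.2%})")

# The adapter output is multiplied by lora_alpha / r.
scaling = 16 / 8
print(scaling)  # 2.0
```

With r=8 the adapter trains 8,192 parameters for this layer, about 3% of the 262,144 in the full matrix.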

🏆 Example Notebooks

Below are 13 example notebooks showcasing the library's features in depth.

Serial No.	Name	Task Performed
1	Unified API	Showcasing A Unified API Across Multiple Models
2	Automated Model-Aware Preprocessing	The Automated preprocessing system explained
3	Fine-Tuning Strategies	TabTune's four fine-tuning strategies
4	Model Comparison	Model Comparison with TabularLeaderboard
5	Checkpoint Management	Checkpoint Management - Save/Load Pipelines
6	Advanced Usage	PEFT Configuration and Hybrid Strategies
7	Resampling	Resampling Strategies
8	Regression - 1	Introduction to Regression - Inference
9	Regression - 2	Introduction to Regression - Finetune
10	Evaluation Metrics	Evaluation Metrics involved
11	Benchmarking	Standard Benchmarking Techniques
12	TabPFNv2.6	TabPFNv2.6 — Classification and Regression
13	TabICLv2	TabICLv2 — Classification and Regression

🚀 Advanced Usage

Custom Preprocessing

Override default preprocessing for specific needs:

processor_params = {
    'imputation_strategy': 'iterative',
    'categorical_encoding': 'target',
    'scaling_strategy': 'robust',
    'resampling_strategy': 'smote'
}

pipeline = TabularPipeline(
    model_name="TabICL",
    processor_params=processor_params
)

Hybrid Fine-Tuning

Combine meta-learning with PEFT to get episodic training at a lower memory cost:

pipeline = TabularPipeline(
    model_name="TabICL",
    tuning_strategy="peft",
    tuning_params={
        'epochs': 20,
        'learning_rate': 1e-5,
        'finetune_mode': 'meta-learning',
        'peft_config': {
            'r': 16,
            'lora_alpha': 32,
            'lora_dropout': 0.1
        }
    }
)

📖 Documentation

For detailed documentation, API reference, model configurations, and usage examples, please visit: Documentation

Acknowledgments

TabTune is built upon the excellent work of the following projects and research teams:

OrionMSP1.0/1.5 - Multi-Scale Sparse Attention for Tabular In-Context Learning
OrionBix - Tabular Bi-Axial In-Context Learning
TabPFN - Prior-data Fitted Networks for tabular data
TabICL - Tabular In-Context Learning with scalable attention
Mitra (Tab2D) - 2D Attention mechanism (Tab2D) for tabular data, included within AutoGluon
ContextTab - Semantics-Aware In-Context Learning for Tabular Data
TabDPT - Denoising Pre-training Transformer for Tabular Data
AutoGluon - AutoML framework that inspired our unified API design
LimiX - Likelihood-based mixture modeling and probabilistic inference framework for structured tabular learning

🐛 Troubleshooting

Out of Memory (OOM) Errors

Reduce batch_size in tuning_params
Use tuning_strategy='peft' for PEFT mode
Decrease n_ensembles or context_size for inference

PEFT Compatibility Issues

Some models have experimental PEFT support; for those, use tuning_strategy='finetune' (base-ft) instead
Check logs for model-specific warnings

Device Mismatch

Ensure device parameter matches your hardware (cuda/cpu)
Use torch.cuda.is_available() to check GPU availability
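
A minimal sketch of that check is shown below. It falls back to 'cpu' when no GPU is visible, and also when PyTorch is not importable, which is an extra safeguard added for this example. The `pick_device` helper is hypothetical.

```python
def pick_device():
    """Return 'cuda' if PyTorch sees a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch available, so no GPU path
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
tuning_params = {"device": device}  # pass this dict to TabularPipeline
print(tuning_params)
```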

🗃️ License

This project is released under the MIT License.
Please cite appropriately if used in academic or production projects.

Citation:

@misc{tanna2025tabtuneunifiedlibraryinference,
      title={TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models}, 
      author={Aditya Tanna and Pratinav Seth and Mohamed Bouadi and Utsav Avaiya and Vinay Kumar Sankarapu},
      year={2025},
      eprint={2511.02802},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.02802}, 
}

📫 Join Community / Contribute

Issues and discussions are welcome on the GitHub issue tracker and on Discord.
Please see the Contributing section for contribution standards, code reviews, and documentation tips.

Contact

https://www.lexsi.ai

Paris 🇫🇷 · Mumbai 🇮🇳 · London 🇬🇧
