iLTM: Integrated Large Tabular Model

iLTM is a foundation model for tabular data that integrates tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, multilayer perceptron (MLP) neural networks, and retrieval. iLTM automatically handles feature scaling, categorical features, and missing values.

We release open weights of pre-trained model checkpoints that achieve strong performance across tabular classification and regression tasks, from small datasets to large and high-dimensional ones.

iLTM architecture diagram

Install

iLTM is accessed through Python. You can install the package via pip:

pip install iltm

iLTM works on Linux, macOS and Windows, and can be executed on CPU and GPU, although GPU is highly recommended for faster execution.

Pre-trained model checkpoints are automatically downloaded from Hugging Face on first use. By default, checkpoints are stored in platform-specific cache directories (e.g., ~/.cache/iltm on Linux, ~/Library/Caches/iltm on macOS). You can specify where model checkpoints are stored by setting the ILTM_CKPT_DIR environment variable:

export ILTM_CKPT_DIR=/path/to/checkpoints

[!NOTE] The first call to iLTMRegressor or iLTMClassifier downloads the selected checkpoint. Later runs reuse the cached weights from ILTM_CKPT_DIR or the default cache location.

[!TIP] For interactive work on a local machine it is often worth pointing ILTM_CKPT_DIR to a fast local disk to avoid repeated downloads across environments.
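
The same setting can also be applied from within Python. The following is a minimal sketch, assuming iLTM reads ILTM_CKPT_DIR when a checkpoint is first loaded (so the variable must be set before the first model is constructed). The directory name is hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical checkpoint location; any writable directory works.
ckpt_dir = Path(tempfile.gettempdir()) / "iltm_checkpoints"
ckpt_dir.mkdir(parents=True, exist_ok=True)

# Set the variable before iLTM loads a checkpoint
# (assumption: it is read at load time, not earlier).
os.environ["ILTM_CKPT_DIR"] = str(ckpt_dir)

print(os.environ["ILTM_CKPT_DIR"])
```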

Quick Start

iLTM is designed to be easy to use, with an API similar to scikit-learn.

from iltm import iLTMRegressor, iLTMClassifier

# Regression
reg = iLTMRegressor().fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Classification
clf = iLTMClassifier().fit(X_train, y_train)
proba = clf.predict_proba(X_test)
y_hat = clf.predict(X_test)

# With time limit (returns partial ensemble if time runs out)
reg = iLTMRegressor().fit(X_train, y_train, fit_max_time=3600)  # 1 hour limit
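
The snippets above assume X_train, y_train, and X_test already exist. A possible end-to-end sketch on synthetic data follows. It assumes scikit-learn is installed, that iLTM accepts NumPy arrays as a scikit-learn-like estimator would, and that device is a constructor argument, as the list of common arguments suggests. The function name demo_regression is made up for illustration.

```python
def demo_regression(device="cpu"):
    """Fit iLTMRegressor on synthetic data; return (predictions, true targets)."""
    # Imports live inside the function so that defining it needs no extra packages.
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from iltm import iLTMRegressor

    # Small synthetic dataset: 500 rows, 10 numeric features.
    X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # device is passed explicitly because the documented default is "cuda:0".
    reg = iLTMRegressor(device=device).fit(X_train, y_train)
    return reg.predict(X_test), y_test
```

Calling demo_regression() triggers the checkpoint download on first use, so the first run needs network access.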

Model Checkpoints

Available checkpoint names:

  • "xgbrconcat" (default): Robust preprocessing + XGBoost embeddings + concatenation
  • "cbrconcat": Robust preprocessing + CatBoost embeddings + concatenation
  • "r128bn": Robust preprocessing with 128-dim bottleneck
  • "rnobn": Robust preprocessing without bottleneck
  • "xgb": XGBoost embeddings only
  • "catb": CatBoost embeddings only
  • "rtr": Robust preprocessing with retrieval
  • "rtrcb": CatBoost embeddings with retrieval

You can also provide a local path to a checkpoint file.
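
To keep the names handy in scripts, the list above can be mirrored in a small lookup table. This is only a sketch: CHECKPOINTS and describe_checkpoint are hypothetical helpers written for this example, not part of the iltm package.

```python
from pathlib import Path

# Built-in checkpoint names and descriptions, copied from the list above.
CHECKPOINTS = {
    "xgbrconcat": "Robust preprocessing + XGBoost embeddings + concatenation",
    "cbrconcat": "Robust preprocessing + CatBoost embeddings + concatenation",
    "r128bn": "Robust preprocessing with 128-dim bottleneck",
    "rnobn": "Robust preprocessing without bottleneck",
    "xgb": "XGBoost embeddings only",
    "catb": "CatBoost embeddings only",
    "rtr": "Robust preprocessing with retrieval",
    "rtrcb": "CatBoost embeddings with retrieval",
}


def describe_checkpoint(checkpoint="xgbrconcat"):
    """Hypothetical helper: describe a built-in name or accept a local file path."""
    if checkpoint in CHECKPOINTS:
        return CHECKPOINTS[checkpoint]
    if Path(checkpoint).is_file():
        return f"local checkpoint file: {checkpoint}"
    raise ValueError(f"unknown checkpoint name or missing file: {checkpoint!r}")


print(describe_checkpoint("rtr"))
```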

Common key args:

  • checkpoint: checkpoint name or path to model file. Default "xgbrconcat".
  • device: torch device string. Default "cuda:0".
  • n_ensemble: number of generated predictors.
  • batch_size: batch size for weight prediction and inference.
  • preprocessing: "realmlp_td_s_v0" or "minimal" or "none".
  • cat_features: list of categorical column indices.
  • tree_embedding: enable GBDT leaf embeddings.
  • tree_model: "XGBoost_hist" or "CatBoost".
  • concat_tree_with_orig_features: concatenate original features with embeddings.
  • finetuning: enable end-to-end fine-tuning.
  • Retrieval: do_retrieval, retrieval_alpha, retrieval_temperature, retrieval_distance.
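
Because the default device is "cuda:0", a CPU-only machine needs an explicit device. The sketch below picks a device at runtime and collects a few of the arguments listed above into a dictionary. pick_device is a hypothetical helper, the values are illustrative rather than documented defaults, and passing the arguments to the constructor is an assumption.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if CUDA is usable, otherwise fall back to "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


# Argument names come from the list above; values are illustrative only.
config = {
    "checkpoint": "cbrconcat",
    "device": pick_device(),
    "n_ensemble": 8,          # illustrative value, not a documented default
    "cat_features": [0, 3],   # treat columns 0 and 3 as categorical
}

# Assumed usage: iLTMClassifier(**config).fit(X_train, y_train)
print(config["device"])
```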

Regressor only:

  • clip_predictions: clip to train target range.
  • normalize_predictions: z-normalize outputs before unscaling.
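
As an illustration of what clipping to the train target range means, here is a plain-Python sketch of the general idea. It is not the package's internal code, and clip_to_train_range is a made-up name.

```python
def clip_to_train_range(y_pred, y_train):
    """Clamp each prediction to [min(y_train), max(y_train)]."""
    lo, hi = min(y_train), max(y_train)
    return [min(max(p, lo), hi) for p in y_pred]


y_train = [1.0, 2.5, 4.0]
y_pred = [0.2, 3.1, 5.7]
print(clip_to_train_range(y_pred, y_train))  # [1.0, 3.1, 4.0]
```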

Classifier only:

  • voting: "soft" or "hard".
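
The two voting modes can disagree. The sketch below uses the standard definitions of soft voting (average the class probabilities, then take the argmax) and hard voting (majority over each predictor's argmax). How iLTM implements them internally, including tie handling, is not shown here and may differ.

```python
from collections import Counter


def soft_vote(probas):
    """Average class probabilities across predictors, then take the argmax."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def hard_vote(probas):
    """Each predictor votes for its own argmax; the most common class wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


# Three predictors, two classes: one confident vote for class 0,
# two mild votes for class 1.
probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
print(soft_vote(probas))  # 0
print(hard_vote(probas))  # 1
```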

Hyperparameter Optimization

iLTM performs best when you tune its hyperparameters.

Recommended search space

The package exposes a recommended search space via iltm.get_hyperparameter_search_space(), which returns a plain dictionary that maps hyperparameter names to small specs.

[!TIP] When running hyperparameter optimization with time constraints, you can use the fit_max_time parameter in fit() to limit training time per configuration. The model will return a partial ensemble if the time limit is reached.

The checkpoint parameter is part of this space. It selects one of the built-in model checkpoints, which in turn sets other fields such as preprocessing and tree_embedding.

The specification format is intentionally minimal so that it can be re-used in any hyperparameter optimization library or custom tuning procedure.

  • iltm.get_hyperparameter_search_space() gives you the canonical space definition.
  • iltm.sample_hyperparameters(rng) draws a single random configuration from that space for quick baselines and smoke tests.

[!TIP] sample_hyperparameters is mainly intended for quick baselines, smoke tests, or simple random search. For more serious tuning runs it is usually better to adapt the search space from get_hyperparameter_search_space into your optimization method of choice, and let that method decide which configurations to try.
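
A simple random search with a wall-clock budget could be structured as follows. This is a generic sketch, not the package's tuning procedure: random_search, toy_sample, and toy_evaluate are hypothetical names, and the toy sampler and objective stand in for iLTM so that the example runs without the package installed.

```python
import random
import time


def random_search(sample_fn, evaluate_fn, n_trials=10, time_budget_s=None, seed=0):
    """Evaluate sampled configurations and keep the one with the highest score."""
    rng = random.Random(seed)
    start = time.monotonic()
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Stop early once the overall time budget is used up.
        if time_budget_s is not None and time.monotonic() - start > time_budget_s:
            break
        config = sample_fn(rng)
        score = evaluate_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Stand-ins for illustration only.
def toy_sample(rng):
    return {"n_ensemble": rng.choice([4, 8, 16])}


def toy_evaluate(config):
    return -abs(config["n_ensemble"] - 8)


best_config, best_score = random_search(toy_sample, toy_evaluate, n_trials=20)
print(best_config, best_score)
```

With iLTM, sample_fn could wrap iltm.sample_hyperparameters (the type of rng it expects is not specified here), and evaluate_fn could build a model from the sampled configuration, call fit with fit_max_time, and return a validation score. Whether a sampled configuration can be passed directly as constructor keyword arguments is an assumption to verify against the package.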

Development

Running Tests

To run the tests:

uv pip install -e ".[dev]"
pytest tests/

Citation

If you use iLTM in your research, please cite our paper:

@article{iltm2025,
  title={iLTM: Integrated Large Tabular Model},
  author={Bonet, David and Comajoan Cara, Marçal and Calafell, Alvaro and Mas Montserrat, Daniel and Ioannidis, Alexander G.},
  journal={arXiv preprint},
  year={2025}
}

Note: update with the actual publication details once available.

License

© Contributors, 2025. Licensed under the Apache License 2.0.
