Evolved universal tabular feature map + closed-form ridge: an interpretable, training-free, local in-context learner for tabular data.

tabmap — EvoForest-Tab: an evolved universal tabular feature map

tabmap is the reference implementation of EvoForest-Tab (the EvoForest computation-search framework specialized to tabular data).

tabmap is an interpretable, training-free, local in-context learner for tabular data: an evolved universal feature map φ: row → ℝᴷ (16 transform families over rank-gauss, count-encoding, and categorical-mask channels) paired with a per-dataset closed-form Bayesian-ridge head. Given a labeled support set and an unlabeled query set, it predicts in a single SVD solve — no gradient descent, no per-dataset tuning, no GPU. It is competitive with gradient boosting and with the published TabPFN-v2 tabular foundation model, while remaining free to run and fully inspectable.
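
To make the "single SVD solve" concrete, here is a minimal NumPy sketch of a closed-form ridge head applied to a feature matrix. It is an illustration only, not the code in _ridge.py: the function name ridge_svd, the fixed lam value, and the random stand-in for the feature matrix Φ are all assumptions made for this example.

```python
import numpy as np

def ridge_svd(Phi, y, lam):
    """Ridge weights w = (Phi^T Phi + lam I)^-1 Phi^T y from one thin SVD.

    Hypothetical helper for illustration; y is a 1-D target vector.
    """
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    # Each singular direction is shrunk by s / (s^2 + lam).
    shrink = s / (s**2 + lam)
    return Vt.T @ (shrink * (U.T @ y))

# Random stand-ins for the support/query feature matrices (assumption).
rng = np.random.default_rng(0)
Phi_support = rng.normal(size=(50, 8))
w_true = rng.normal(size=8)
y_support = Phi_support @ w_true + 0.1 * rng.normal(size=50)
Phi_query = rng.normal(size=(5, 8))

w = ridge_svd(Phi_support, y_support, lam=1.0)
y_query_hat = Phi_query @ w  # predictions need no iterative optimization
```

The SVD form gives the same weights as solving the regularized normal equations, and once the decomposition is available, changing lam only changes the shrink vector.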

This repository accompanies the paper "Evolving a Universal Tabular Feature Map: Interpretable, Closed-Form In-Context Learning Competitive with Tabular Foundation Models" and is stand-alone: the deployment pipeline (feature map + ridge) depends only on torch, numpy, and pyyaml.

Install

pip install -e .            # editable; or: pip install .
# deps: torch, numpy, pyyaml  (+ scikit-learn for the estimator base classes & examples)

Usage (scikit-learn style)

from evoforest_tab import TabMapClassifier, TabMapRegressor

clf = TabMapClassifier(n_estimators=6).fit(X_support, y_support)   # X: ndarray or DataFrame
proba = clf.predict_proba(X_query)                                  # in-context: query needed to fit φ channels
pred  = clf.predict(X_query)

reg = TabMapRegressor(n_estimators=6).fit(X_support, y_support)
yhat = reg.predict(X_query)

Notes:

  • It is an in-context learner: predict builds the (label-free, transductive) channels over the pooled support+query rows, so the query rows are needed at prediction time (as with TabPFN).
  • n_estimators is the random-feature ensemble size (averaged decorrelated seed-variants of φ); n_estimators=1 is the single map, 6 is the paper default (variance reduction toward the kernel limit).
  • cat_features=[...] marks categorical columns (indices or DataFrame names); omitted → auto-detected.
  • No class-count ceiling (unlike TabPFN-v2's ≤10 classes); runs on CPU in milliseconds.
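
The transductive behaviour described in the notes can be sketched as follows: label-free channels are computed over the pooled support and query rows, then split back. This is a simplified illustration rather than the logic in _channels.py; the helper names rank_gauss and count_encode, the tie handling, and the absence of NaN handling are assumptions of this sketch.

```python
import numpy as np
from statistics import NormalDist

def rank_gauss(col):
    """Map a numeric column to normal scores via its ranks (hypothetical helper)."""
    n = len(col)
    # Double argsort yields 0..n-1 ranks; ties are broken by row order.
    ranks = np.argsort(np.argsort(col, kind="stable"), kind="stable")
    quantiles = (ranks + 0.5) / n  # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])

def count_encode(col):
    """Replace each category by its relative frequency in the given rows."""
    _, inverse, counts = np.unique(col, return_inverse=True, return_counts=True)
    return counts[inverse.ravel()] / len(col)

# Numeric column: pool support and query rows, transform, then split back.
x_support = np.array([3.0, 1.0, 2.0])
x_query = np.array([10.0, 0.5])
z = rank_gauss(np.concatenate([x_support, x_query]))
z_support, z_query = z[:3], z[3:]

# Categorical column: frequencies are also taken over the pooled rows.
c_support = np.array(["a", "b", "a"])
c_query = np.array(["a", "c"])
freq = count_encode(np.concatenate([c_support, c_query]))
```

Because ranks and counts depend on every pooled row, the support-row channels change whenever the query set changes, which is why the query rows must be present at prediction time.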

What's inside

tabmap/
  _channels.py   raw rows  -> input channels (col-z, rank-gauss, count-encoding, categorical mask), nan-safe
  _genome.py     evaluate the evolved genome (champion.yaml) -> feature matrix Phi; seed-variants for the ensemble
  _ridge.py      closed-form Bayesian-ridge head (evidence-maximized lambda), single SVD solve
  estimator.py   TabMapClassifier / TabMapRegressor (sklearn API) + K-seed ensemble
  champion.yaml  the evolved 16-family genome (the deployment artifact)
examples/quickstart.py
reproduce/        scripts + cached TabPFN-v2 predictions to reproduce the paper's experiments
tests/
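
One way an evidence-maximized lambda can be obtained from a single SVD is sketched below. It assumes a Gaussian prior on the weights with variance σ²/λ, Gaussian noise with variance σ², σ² profiled out analytically, and a grid search over λ. The procedure in _ridge.py may differ; the function names and the grid are hypothetical.

```python
import numpy as np

def profile_log_evidence(s, Uty, yty, n, lam):
    """Log marginal likelihood, up to a constant, with noise variance profiled out.

    Model assumed here: y ~ N(0, sigma^2 * C) with C = I + Phi Phi^T / lam.
    """
    quad = yty - np.sum(Uty**2 * s**2 / (s**2 + lam))  # y^T C^-1 y
    logdet = np.sum(np.log1p(s**2 / lam))              # log det C
    return -0.5 * n * np.log(quad / n) - 0.5 * logdet

def select_lambda(Phi, y, grid):
    """Return the grid value with the highest evidence, plus all scores."""
    U, s, _ = np.linalg.svd(Phi, full_matrices=False)  # computed once
    Uty, yty, n = U.T @ y, float(y @ y), len(y)
    scores = np.array([profile_log_evidence(s, Uty, yty, n, lam) for lam in grid])
    return grid[int(np.argmax(scores))], scores

# Random stand-in data (assumption).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(40, 6))
y = Phi @ rng.normal(size=6) + 0.5 * rng.normal(size=40)

grid = np.logspace(-3, 3, 25)
lam_best, scores = select_lambda(Phi, y, grid)
```

Every candidate λ reuses the same singular values and projections, so scoring the whole grid costs little beyond the one decomposition.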

Reproducing the paper

See reproduce/README.md. The cached TabPFN-v2 cloud predictions are included so the head-to-head and routing experiments reproduce without any API key.

Contributing this method upstream

tabmap is designed to drop into the tabular ML ecosystem. Best integration targets (most aligned first):

| Repo | Why it fits | Integration |
|---|---|---|
| PriorLabs/tabpfn-extensions | community extensions around TabPFN; our method is a free/local complementary in-context learner and a natural cost-aware router companion (route hard datasets to TabPFN, the rest to tabmap) | add as an extension module + a routing utility (sklearn-compatible) |
| scikit-learn-contrib | TabMapClassifier/TabMapRegressor already follow the estimator API | publish as a standalone scikit-learn-contrib project |
| skrub (ex dirty-cat) | tabular feature engineering / encoders; our channels (rank-gauss, count-encoding) + φ are a drop-in TransformerMixin featurizer | contribute TabMapEncoder (transform-only) |
| pyg-team/pytorch-frame | deep tabular; φ is a fixed featurizer usable as an input stem | add as an encoder/stype transform |
| autogluon / TabArena | leaderboard model implementations | submit tabmap as a model for the TabArena living benchmark |

The estimator's sklearn-compatible surface (fit/predict/predict_proba, get_params) is the contribution-ready API; the transform-only build_channels+build_phi path serves the encoder use-cases.

Combining with a foundation model (e.g. TabPFN)

StackedTabularEnsemble combines TabMap with any in-context base model (such as TabPFN's client) into a single, stronger predictor. This is the paper's complementarity result: our map tends to win on classification and TabPFN on regression, and combining them beats either alone. Three methods are available:

  • blend: a fixed 50/50 average of the models' predictions.
  • compwt: label-free; weights each model by its support-cross-validated competence.
  • meta: a learned ridge head over the models' out-of-fold support predictions (most robust).

All three are leakage-safe and in-context: the weights or head are fit on the support set only, and no query labels are used.

from evoforest_tab import TabMapClassifier, StackedTabularEnsemble
from tabpfn_client import TabPFNClassifier            # or any sklearn-surface in-context model

ens = StackedTabularEnsemble(
        [TabMapClassifier(n_estimators=6), TabPFNClassifier()],
        task="classification", method="meta",          # "meta" | "compwt" | "blend"
      ).fit(X_support, y_support)
proba = ens.predict_proba(X_query)

The learned head (meta) is robust whether the two models are evenly matched or one dominates; the label-free compwt is a close, deployable second with no meta-learner. See examples/combine_tabpfn.py.
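
The three combination rules can be sketched in NumPy for a regression target. This is not the StackedTabularEnsemble implementation: the helper names are hypothetical, competence is taken to be inverse out-of-fold MSE on the support set, and "label-free" is read as "uses no query labels". For classification the same arithmetic would apply to class-probability arrays.

```python
import numpy as np

def blend(preds):
    """Equal-weight average of the models' predictions."""
    return np.mean(preds, axis=0)

def competence_weights(oof_preds, y_support):
    """Weights proportional to inverse out-of-fold MSE on the support set."""
    mse = np.array([np.mean((p - y_support) ** 2) for p in oof_preds])
    inv = 1.0 / (mse + 1e-12)
    return inv / inv.sum()

def meta_ridge(oof_preds, y_support, lam=1e-3):
    """Ridge head over stacked out-of-fold predictions (one column per model)."""
    Z = np.column_stack(oof_preds)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y_support)

# Simulated out-of-fold support predictions: model A is accurate, model B noisy.
rng = np.random.default_rng(0)
y_support = rng.normal(size=200)
oof_a = y_support + 0.1 * rng.normal(size=200)
oof_b = y_support + 1.0 * rng.normal(size=200)

w_comp = competence_weights([oof_a, oof_b], y_support)
w_meta = meta_ridge([oof_a, oof_b], y_support)

# Query-time predictions from each model (random stand-ins) and their combinations.
q_a, q_b = rng.normal(size=5), rng.normal(size=5)
y_blend = blend([q_a, q_b])
y_comp = w_comp[0] * q_a + w_comp[1] * q_b
y_meta = np.column_stack([q_a, q_b]) @ w_meta
```

Both the competence weights and the ridge head are computed from support rows alone, so no query label enters the fit in any of the three variants.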

Citation

If you use this library, please cite the accompanying paper "Evolving a Universal Tabular Feature Map: Interpretable, Closed-Form In-Context Learning Competitive with Tabular Foundation Models." (anonymized for review; see ../tabular_paper/).

License

MIT (see LICENSE).
