Evolved universal tabular feature map + closed-form ridge: an interpretable, training-free, local in-context learner for tabular data.
tabmap — EvoForest-Tab: an evolved universal tabular feature map
tabmap is the reference implementation of EvoForest-Tab (the EvoForest computation-search framework specialized to tabular data).
tabmap is an interpretable, training-free, local in-context learner for tabular data: an
evolved universal feature map φ: row → ℝᴷ (16 transform families over rank-gauss, count-encoding,
and categorical-mask channels) paired with a per-dataset closed-form Bayesian-ridge head. Given a
labeled support set and an unlabeled query set, it predicts in a single SVD solve — no gradient
descent, no per-dataset tuning, no GPU. It is competitive with gradient boosting and with the
published TabPFN-v2 tabular foundation model, while remaining free to run and fully inspectable.
This repository accompanies the paper "Evolving a Universal Tabular Feature Map: Interpretable,
Closed-Form In-Context Learning Competitive with Tabular Foundation Models" and is stand-alone:
the deployment pipeline (feature map + ridge) depends only on torch, numpy, and pyyaml.
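To make the "single SVD solve" concrete, the following is a minimal sketch of closed-form ridge regression through a thin SVD in NumPy. It is not the code in _ridge.py: it assumes a fixed regularization strength `lam` instead of the evidence-maximized value, uses a random matrix as a stand-in for the feature matrix Phi, and `ridge_svd_fit` is a hypothetical name chosen for illustration.

```python
import numpy as np

def ridge_svd_fit(Phi, y, lam):
    """Closed-form ridge weights via a thin SVD of the feature matrix.

    Solves min_w ||Phi w - y||^2 + lam * ||w||^2 without forming Phi^T Phi.
    `lam` is a fixed value here (an assumption for this sketch).
    """
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    # Each singular direction is shrunk by s / (s^2 + lam).
    shrink = s / (s**2 + lam)
    return Vt.T @ (shrink * (U.T @ y))

# Synthetic stand-in for phi(support rows); not produced by tabmap.
rng = np.random.default_rng(0)
Phi_support = rng.normal(size=(50, 8))
w_true = rng.normal(size=8)
y_support = Phi_support @ w_true + 0.01 * rng.normal(size=50)

lam = 1e-3
w = ridge_svd_fit(Phi_support, y_support, lam)

# Prediction on query features is a single matrix-vector product.
Phi_query = rng.normal(size=(5, 8))
y_query_hat = Phi_query @ w
```

Because the SVD is computed once, the weights for any other value of `lam` can be obtained by recomputing only the `shrink` vector, which is what makes a search over the regularization strength cheap.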
Install
pip install -e . # editable; or: pip install .
# deps: torch, numpy, pyyaml (+ scikit-learn for the estimator base classes & examples)
Usage (scikit-learn style)
from evoforest_tab import TabMapClassifier, TabMapRegressor
clf = TabMapClassifier(n_estimators=6).fit(X_support, y_support) # X: ndarray or DataFrame
proba = clf.predict_proba(X_query) # in-context: query needed to fit φ channels
pred = clf.predict(X_query)
reg = TabMapRegressor(n_estimators=6).fit(X_support, y_support)
yhat = reg.predict(X_query)
Notes:
- It is an in-context learner: predict builds the (label-free, transductive) channels over the pooled support+query rows, so the query rows are needed at prediction time (as with TabPFN).
- n_estimators is the random-feature ensemble size (averaged decorrelated seed-variants of φ); n_estimators=1 is the single map, 6 is the paper default (variance reduction toward the kernel limit).
- cat_features=[...] marks categorical columns (indices or DataFrame names); omitted → auto-detected.
- No class-count ceiling (unlike TabPFN-v2's ≤10 classes); runs on CPU in milliseconds.
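The notes above say the channels are built over the pooled support+query rows without using labels. As an illustration of why the query rows are needed, here is a minimal standard-library sketch of two such channels, rank-gauss and count-encoding, computed transductively. The function names, the average-rank tie handling, and the frequency normalization are assumptions made for this sketch; the library's own channel definitions in _channels.py may differ (for example in NaN handling).

```python
from collections import Counter
from statistics import NormalDist

def rank_gauss(values):
    """Map values to normal scores by rank; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    inv_cdf = NormalDist().inv_cdf
    # (rank - 0.5) / n stays strictly inside (0, 1), so inv_cdf is defined.
    return [inv_cdf((r - 0.5) / n) for r in ranks]

def count_encode(categories):
    """Replace each category by its relative frequency in the given rows."""
    counts = Counter(categories)
    n = len(categories)
    return [counts[c] / n for c in categories]

# Pool support and query values of one numeric column; no labels are involved.
support_col = [3.0, 10.0, 7.0]
query_col = [5.0, 10.0]
z = rank_gauss(support_col + query_col)
z_support, z_query = z[:len(support_col)], z[len(support_col):]

freq = count_encode(["a", "b", "a", "c"])
```

The rank of each support value depends on where the query values fall, so the same support row can receive a different channel value for a different query set. That dependence is what "transductive" means here.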
What's inside
tabmap/
  _channels.py    raw rows -> input channels (col-z, rank-gauss, count-encoding, categorical mask), nan-safe
  _genome.py      evaluate the evolved genome (champion.yaml) -> feature matrix Phi; seed-variants for the ensemble
  _ridge.py       closed-form Bayesian-ridge head (evidence-maximized lambda), single SVD solve
  estimator.py    TabMapClassifier / TabMapRegressor (sklearn API) + K-seed ensemble
  champion.yaml   the evolved 16-family genome (the deployment artifact)
examples/quickstart.py
reproduce/        scripts + cached TabPFN-v2 predictions to reproduce the paper's experiments
tests/
Reproducing the paper
See reproduce/README.md. The cached TabPFN-v2 cloud predictions are included
so the head-to-head and routing experiments reproduce without any API key.
Contributing this method upstream
tabmap is designed to drop into the tabular ML ecosystem. Best integration targets (most aligned first):
| Repo | Why it fits | Integration |
|---|---|---|
| PriorLabs/tabpfn-extensions | community extensions around TabPFN; our method is a free/local complementary in-context learner and a natural cost-aware router companion (route hard datasets to TabPFN, the rest to tabmap) | add as an extension module + a routing utility (sklearn-compatible) |
| scikit-learn-contrib | TabMapClassifier/TabMapRegressor already follow the estimator API | publish as a standalone scikit-learn-contrib project |
| skrub (ex dirty-cat) | tabular feature engineering / encoders; our channels (rank-gauss, count-encoding) + φ are a drop-in TransformerMixin featurizer | contribute TabMapEncoder (transform-only) |
| pyg-team/pytorch-frame | deep tabular; φ is a fixed featurizer usable as an input stem | add as an encoder/stype transform |
| autogluon / TabArena | leaderboard model implementations | submit tabmap as a model for the TabArena living benchmark |
The estimator's sklearn-compatible surface (fit/predict/predict_proba, get_params) is the
contribution-ready API; the transform-only build_channels+build_phi path serves the encoder use-cases.
Combining with a foundation model (e.g. TabPFN)
StackedTabularEnsemble combines TabMap with any in-context base model (such as TabPFN's client) into a
single, stronger predictor. This reflects the paper's complementarity result: our map tends to win on
classification and TabPFN on regression, and combining them beats either alone. Three methods are
available: blend (a fixed 50/50 average), compwt (label-free, weights each model by its
support-cross-validated competence), and meta (a learned ridge head over the models' out-of-fold support
predictions; most robust). All are leakage-safe and in-context: weights and head are fit on the support
set, with no query labels.
from evoforest_tab import TabMapClassifier, StackedTabularEnsemble
from tabpfn_client import TabPFNClassifier # or any sklearn-surface in-context model
ens = StackedTabularEnsemble(
    [TabMapClassifier(n_estimators=6), TabPFNClassifier()],
    task="classification", method="meta",  # "meta" | "compwt" | "blend"
).fit(X_support, y_support)
proba = ens.predict_proba(X_query)
The learned head (meta) is robust whether the two models are evenly matched or one dominates; the
label-free compwt is a close, deployable second with no meta-learner. See examples/combine_tabpfn.py.
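The blend and compwt ideas described above can be sketched in a few lines of NumPy. This is not the StackedTabularEnsemble implementation: the function names are hypothetical, and using out-of-fold support accuracy as the "competence" score is an assumption for illustration (the library may use a different score). The out-of-fold probabilities are written out by hand here instead of being produced by cross-validation.

```python
import numpy as np

def blend(proba_a, proba_b):
    """Fixed 50/50 average of two models' class probabilities."""
    return 0.5 * (proba_a + proba_b)

def competence_weights(oof_probas, y_support):
    """Weight each model by its out-of-fold accuracy on the support set.

    Accuracy as the competence measure is an assumption of this sketch.
    """
    acc = np.array([(p.argmax(axis=1) == y_support).mean() for p in oof_probas])
    total = acc.sum()
    if total == 0:
        # No model got anything right: fall back to uniform weights.
        return np.full(len(oof_probas), 1.0 / len(oof_probas))
    return acc / total

def weighted_combine(query_probas, weights):
    """Convex combination of the models' query probabilities."""
    return sum(w * p for w, p in zip(weights, query_probas))

# Hand-written out-of-fold support predictions for two models (illustrative only).
y_support = np.array([0, 1, 1, 0])
oof_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])  # 4 of 4 correct
oof_b = np.array([[0.4, 0.6], [0.3, 0.7], [0.6, 0.4], [0.7, 0.3]])  # 2 of 4 correct

weights = competence_weights([oof_a, oof_b], y_support)

# Query-time probabilities from the same two models; no query labels are used.
query_a = np.array([[0.8, 0.2], [0.1, 0.9]])
query_b = np.array([[0.5, 0.5], [0.4, 0.6]])
combined = weighted_combine([query_a, query_b], weights)
blended = blend(query_a, query_b)
```

The weights are computed only from support labels and out-of-fold support predictions, so nothing about the query labels leaks into the combination.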
Citation
If you use this library, please cite the accompanying paper "Evolving a Universal Tabular Feature Map:
Interpretable, Closed-Form In-Context Learning Competitive with Tabular Foundation Models." (anonymized
for review; see ../tabular_paper/).
License
MIT (see LICENSE).