Adaptation of TabPFN to work with large tabular datasets.

Ensemble TabPFN

TabPFN is a transformer architecture proposed by Hollmann et al. for classification on small tabular datasets. It is a Prior-Data Fitted Network that is trained once and does not require fine-tuning for new datasets. It works by approximating the distribution of new data using the synthetic prior data it saw during training. In a machine learning pipeline, the network can be "fit" on a training dataset in under a second and can generate predictions for the test set in a single forward pass.

However, the current architecture has limitations: the training dataset can contain at most 1000 samples with at most 100 numerical features, and the network can predict at most 10 classes in a multi-class classification problem. EnsembleTabPFN addresses the first two of these limitations by extending the original model to work with datasets containing more than 1000 samples and more than 100 features. EnsembleTabPFN is fully compatible with the scikit-learn API and can be used in a modelling pipeline.
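As a quick illustration of the limits described above, the following sketch reports which of the original TabPFN limits a dataset exceeds. The function and constant names are hypothetical helpers written for this example and are not part of the ensemble-tabpfn package; the numeric limits are the ones stated in the description.

```python
from typing import List

# Limits of the original TabPFN, as stated in the description above.
MAX_SAMPLES = 1000
MAX_FEATURES = 100
MAX_CLASSES = 10


def exceeded_limits(n_samples: int, n_features: int, n_classes: int) -> List[str]:
    """Return the names of the original TabPFN limits that a dataset exceeds.

    Hypothetical helper for illustration only; not part of ensemble-tabpfn.
    """
    exceeded = []
    if n_samples > MAX_SAMPLES:
        exceeded.append("samples")
    if n_features > MAX_FEATURES:
        exceeded.append("features")
    if n_classes > MAX_CLASSES:
        exceeded.append("classes")
    return exceeded


# A dataset with 5000 rows and 250 columns exceeds the two limits that
# EnsembleTabPFN is described as lifting.
print(exceeded_limits(5000, 250, 3))

# A 12-class problem exceeds the class limit, which the description does
# not list among the issues addressed.
print(exceeded_limits(800, 20, 12))
```

A dataset for which only "samples" or "features" is reported is the case EnsembleTabPFN is described as targeting.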

Installation

From source

git clone https://github.com/ersilia-os/ensemble-tabpfn.git
cd ensemble-tabpfn
pip install .

From PyPI

pip install ensemble-tabpfn

Using Poetry

git clone https://github.com/ersilia-os/ensemble-tabpfn.git
cd ensemble-tabpfn
poetry install --without dev,test,docs

Usage

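from ensemble_tabpfn import EnsembleTabPFN
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Any tabular classification dataset works; this one is only an illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = EnsembleTabPFN()
clf.fit(X_train, y_train)
y_hat = clf.predict(X_test)  # predict from the test features, not the labels
acc = accuracy_score(y_test, y_hat)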
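
The description does not say how EnsembleTabPFN handles datasets beyond the original limits. Purely as an illustration of one way an ensemble could stay within them, the sketch below splits row and column indices into consecutive chunks of at most 1000 samples and 100 features, each of which could be handed to a separate base model. The `chunk_indices` helper and the chunking strategy are assumptions made for this example, not the package's actual implementation or API.

```python
from typing import List


def chunk_indices(n_items: int, max_size: int) -> List[range]:
    """Split range(n_items) into consecutive chunks of at most max_size items.

    Hypothetical helper for illustration; not part of ensemble-tabpfn.
    """
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [
        range(start, min(start + max_size, n_items))
        for start in range(0, n_items, max_size)
    ]


# A 2500 x 250 dataset split so that every (row chunk, column chunk) pair
# fits within the 1000-sample and 100-feature limits of the original model.
row_chunks = chunk_indices(2500, 1000)
col_chunks = chunk_indices(250, 100)
print([len(c) for c in row_chunks])  # [1000, 1000, 500]
print([len(c) for c in col_chunks])  # [100, 100, 50]
```

In practice a random rather than consecutive split would usually be preferable so that each chunk is representative of the whole dataset; consecutive chunks are used here only to keep the sketch short.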

Download files

Source Distribution

ensemble_tabpfn-0.1.1.tar.gz (17.9 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.7.14 Linux/5.19.0-32-generic

Hashes for ensemble_tabpfn-0.1.1.tar.gz

Algorithm    Hash digest
SHA256       a94941b44812430d49d7aa190d4d61f50bf5ec65c09b1db699c3440f2f1cbff4
MD5          d280d96de0d34d3246563b7a62d64d97
BLAKE2b-256  34d90fa0e03e134296fbbfd9c880dd02187df725d55fa8ec428652f9ae2cb28a

Built Distribution

ensemble_tabpfn-0.1.1-py3-none-any.whl (19.4 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.2 CPython/3.7.14 Linux/5.19.0-32-generic

Hashes for ensemble_tabpfn-0.1.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       7d9abdc544bf86974dae8aa8aa1adaa44ed847254c051ba3890fbca7d4863641
MD5          832f0fe4a6378af9d1979c3452e8149f
BLAKE2b-256  9768f244f0654bd8f82d84dd970874e91fb7019281d87d5c229d4375a45bf89f
