Adaptation of TabPFN to work with large tabular datasets.
Project description
Ensemble TabPFN
TabPFN is a transformer architecture proposed by Hollmann et al. for classification on small tabular datasets. It is a Prior-Data Fitted Network (PFN). It has been trained once, on synthetic datasets drawn from a prior, and does not require fine-tuning for new datasets. For a new dataset, it approximates Bayesian inference under that prior, relating the new data to the synthetic data it saw during training. In a machine learning pipeline, the network can be "fit" on a training dataset in under a second and generates predictions for the test set in a single forward pass through the network.

However, the current architecture has limitations. The training dataset can contain at most 1,000 samples with at most 100 numerical features, and the network can predict at most 10 classes in a multi-class classification problem. EnsembleTabPFN addresses the first two of these limitations: it extends the original model to datasets with more than 1,000 samples and more than 100 features. EnsembleTabPFN is fully compatible with the scikit-learn API and can be used in a modelling pipeline.
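The description does not say how EnsembleTabPFN stays within TabPFN's limits. As a minimal sketch, assuming a bagging / random-subspace style ensemble (our assumption, not necessarily the package's method), one can check which stated limits a dataset exceeds, give each ensemble member at most 1,000 rows and 100 features, and average the members' class probabilities (soft voting). The names `exceeded_limits`, `plan_members` and `soft_vote` are hypothetical; no TabPFN model is called, so the code runs with the standard library alone.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Limits of the original TabPFN, as stated in the description above.
MAX_SAMPLES = 1000
MAX_FEATURES = 100
MAX_CLASSES = 10


def exceeded_limits(n_samples: int, n_features: int, n_classes: int) -> Dict[str, bool]:
    """Report which of the original TabPFN limits a training set exceeds."""
    return {
        "samples": n_samples > MAX_SAMPLES,
        "features": n_features > MAX_FEATURES,
        "classes": n_classes > MAX_CLASSES,  # not addressed by EnsembleTabPFN
    }


def plan_members(
    n_samples: int, n_features: int, n_members: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical plan: draw (row indices, feature indices) for each ensemble
    member so that every member stays within the base model's limits."""
    rng = random.Random(seed)
    rows_per_member = min(n_samples, MAX_SAMPLES)
    feats_per_member = min(n_features, MAX_FEATURES)
    plan = []
    for _ in range(n_members):
        # Sample without replacement so no row or feature repeats within a member.
        rows = sorted(rng.sample(range(n_samples), rows_per_member))
        feats = sorted(rng.sample(range(n_features), feats_per_member))
        plan.append((rows, feats))
    return plan


def soft_vote(member_probas: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average the class-probability matrices (n_rows x n_classes) of all members."""
    n_members = len(member_probas)
    n_rows = len(member_probas[0])
    n_classes = len(member_probas[0][0])
    return [
        [sum(p[i][c] for p in member_probas) / n_members for c in range(n_classes)]
        for i in range(n_rows)
    ]
```

In such a scheme, each `(rows, feats)` pair would select the corresponding rows and columns of the training data for one base TabPFN model, and each member's class probabilities on the test set (restricted to the same columns) would be passed to `soft_vote`. Random sampling keeps the sketch short; when `n_members * 1000` is smaller than the number of samples, some rows are never used, so a disjoint partition of the rows is a reasonable alternative.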
Installation
From source
git clone https://github.com/ersilia-os/ensemble-tabpfn.git
cd ensemble-tabpfn
pip install .
From PyPI
pip install ensemble-tabpfn
Using Poetry
git clone https://github.com/ersilia-os/ensemble-tabpfn.git
cd ensemble-tabpfn
poetry install --without dev,test,docs
Usage
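from ensemble_tabpfn import EnsembleTabPFN
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)  # example data; use your own X, y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = EnsembleTabPFN()
clf.fit(X_train, y_train)
y_hat = clf.predict(X_test)  # predict from the test features, not the labels
acc = accuracy_score(y_test, y_hat)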
Download files
Source Distribution: ensemble_tabpfn-0.1.1.tar.gz
Built Distribution: ensemble_tabpfn-0.1.1-py3-none-any.whl
File details
Details for the file ensemble_tabpfn-0.1.1.tar.gz.
File metadata
- Download URL: ensemble_tabpfn-0.1.1.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.7.14 Linux/5.19.0-32-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | a94941b44812430d49d7aa190d4d61f50bf5ec65c09b1db699c3440f2f1cbff4
MD5 | d280d96de0d34d3246563b7a62d64d97
BLAKE2b-256 | 34d90fa0e03e134296fbbfd9c880dd02187df725d55fa8ec428652f9ae2cb28a
File details
Details for the file ensemble_tabpfn-0.1.1-py3-none-any.whl.
File metadata
- Download URL: ensemble_tabpfn-0.1.1-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.7.14 Linux/5.19.0-32-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d9abdc544bf86974dae8aa8aa1adaa44ed847254c051ba3890fbca7d4863641
MD5 | 832f0fe4a6378af9d1979c3452e8149f
BLAKE2b-256 | 9768f244f0654bd8f82d84dd970874e91fb7019281d87d5c229d4375a45bf89f
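The published SHA256 digests can be used to check that a downloaded file is intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names `sha256_of` and `matches_published` and the `PUBLISHED_SHA256` mapping are illustrative, not part of the package.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded file against a published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Published SHA256 digests from the tables above, keyed by file name.
PUBLISHED_SHA256 = {
    "ensemble_tabpfn-0.1.1.tar.gz": "a94941b44812430d49d7aa190d4d61f50bf5ec65c09b1db699c3440f2f1cbff4",
    "ensemble_tabpfn-0.1.1-py3-none-any.whl": "7d9abdc544bf86974dae8aa8aa1adaa44ed847254c051ba3890fbca7d4863641",
}
```

For example, after downloading the wheel into the current directory, `matches_published(name, PUBLISHED_SHA256[name])` with `name = "ensemble_tabpfn-0.1.1-py3-none-any.whl"` should return `True`. Reading in chunks keeps memory use flat for large files; on Python 3.11 and later, `hashlib.file_digest` offers the same in one call.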