CARTE-AI: Context Aware Representation of Table Entries for AI
CARTE: Pretraining and Transfer for Tabular Learning
This repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.
CARTE is a pretrained model for tabular data. It treats each table row as a star graph and trains a graph transformer on top of this representation.
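To make the star-graph idea concrete, here is a minimal sketch in plain Python. It is not CARTE's implementation: the `StarGraph` class, the `row_to_star_graph` helper, and the choice to skip missing cells are assumptions made only for illustration. Each row becomes one center node, each non-missing cell becomes a leaf, and the column name labels the edge between them.

```python
from dataclasses import dataclass, field


@dataclass
class StarGraph:
    """Toy star graph: one center node connected to one leaf per cell."""
    center: str
    # Each edge is (column_name, cell_value): the column name labels the edge
    # and the cell value is the leaf node. Hypothetical structure, not CARTE's.
    edges: list = field(default_factory=list)


def row_to_star_graph(row: dict, center_label: str = "row") -> StarGraph:
    """Turn one table row into a star graph, skipping missing cells."""
    graph = StarGraph(center=center_label)
    for column, value in row.items():
        if value is None:
            continue  # assumption of this sketch: a missing cell adds no leaf
        graph.edges.append((column, value))
    return graph


example_row = {"name": "Riesling", "country": "Germany", "vintage": 2019, "price": None}
graph = row_to_star_graph(example_row)
print(graph.center, graph.edges)
```

In the library itself this conversion is handled by Table2GraphTransformer, which also embeds the text with a language model; the sketch only shows the shape of the graph.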
Colab Examples (give it a try):
- CARTERegressor on Wine Poland dataset
- CARTEClassifier on Spotify dataset
01 Install 🚀
The library has been tested on Linux, macOS and Windows.
CARTE-AI can be installed from PyPI:
pip install carte-ai
Post installation check
After a correct installation, you should be able to import the module without errors:
import carte_ai
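If you would rather get a message than a traceback when the package is missing, the check can be wrapped as below. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `carte-ai` is taken from the pip command above.

```python
import importlib.util
from importlib import metadata


def installed_version(module_name: str, dist_name: str):
    """Return the installed version string, or None if the module is absent."""
    if importlib.util.find_spec(module_name) is None:
        return None  # the module cannot be imported at all
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "unknown"  # importable, but no distribution metadata found


version = installed_version("carte_ai", "carte-ai")
print("carte-ai is not installed" if version is None else f"carte-ai version: {version}")
```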
02 CARTE-AI example on sampled data step by step ➡️
1️⃣ Load the Data 💽
import pandas as pd
from carte_ai.data.load_data import *
num_train = 128 # Example: set the number of training groups/entities
random_state = 1 # Set a random seed for reproducibility
X_train, X_test, y_train, y_test = wina_pl(num_train, random_state)
print("Wina Poland dataset:", X_train.shape, X_test.shape)
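The two arguments above control how many rows are used for training and how the sampling is seeded. The internals of wina_pl are not shown here, so the following is only a sketch of what a seeded split of this kind could look like; `sample_split` is a hypothetical helper, and sending the remaining rows to the test set is an assumption.

```python
import random


def sample_split(rows, num_train, random_state):
    """Shuffle row indices with a fixed seed; the first num_train go to training."""
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)  # local RNG, global state untouched
    train_idx, test_idx = indices[:num_train], indices[num_train:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = [{"id": i} for i in range(10)]
train_a, test_a = sample_split(rows, num_train=4, random_state=1)
train_b, _ = sample_split(rows, num_train=4, random_state=1)
print(len(train_a), len(test_a), train_a == train_b)  # 4 6 True
```

Using the same random_state twice gives the same training rows, which is what makes results reproducible across runs.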
2️⃣ Convert Table 2 Graph 🪵
The basic preparations are:
- preprocess raw data
- load the prepared data and configs; set train/test split
- generate a graph for each table entry (row) using the Table2GraphTransformer
- create an estimator and run inference
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai import Table2GraphTransformer
model_path = hf_hub_download(repo_id="hi-paris/fastText", filename="cc.en.300.bin")
preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)
# Fit and transform the training data
X_train = preprocessor.fit_transform(X_train, y=y_train)
# Transform the test data
X_test = preprocessor.transform(X_test)
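The calls above follow the usual fit/transform convention: anything learned from the data is learned on the training set and then reused unchanged on the test set. The toy class below illustrates that convention in plain Python. It is not related to Table2GraphTransformer internally; `ToyScaler` is a hypothetical stand-in that only standardizes numbers.

```python
from statistics import mean, pstdev


class ToyScaler:
    """Minimal fit/transform object that standardizes a list of numbers."""

    def fit(self, values, y=None):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # fall back to 1.0 for constant input
        return self

    def transform(self, values):
        # Uses the statistics stored by fit; nothing is re-estimated here.
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values, y=None):
        return self.fit(values, y).transform(values)


scaler = ToyScaler()
train_scaled = scaler.fit_transform([1.0, 2.0, 3.0])  # statistics learned here
test_scaled = scaler.transform([2.0, 5.0])            # reused on the test data
print(scaler.mean_, test_scaled[0])
```

Calling fit_transform on the test data instead would re-estimate the statistics from the test set, which leaks test information into preprocessing.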
3️⃣ Make Predictions🔮
For learning, CARTE currently exposes a scikit-learn style interface (fit/predict). The process is:
- Define parameters
- Set the estimator
- Run 'fit' to train the model and 'predict' to make predictions
from sklearn.metrics import r2_score
from carte_ai import CARTERegressor, CARTEClassifier
# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10 # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False # Set to True to hide the progress bar
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
# config_directory is not defined in this snippet; it is assumed to be in scope via the wildcard import in step 1
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]
# Define the estimator and run fit/predict
estimator = CARTERegressor(**fixed_params) # CARTERegressor for Regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)
# Obtain the R2 score on predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")
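For readers unfamiliar with the metric, the R2 score (coefficient of determination) is 1 minus the ratio of the residual sum of squares to the total sum of squares. The function below is a minimal hand-written sketch of that formula, not scikit-learn's implementation; `r2` is a hypothetical name, and the sketch does not handle the edge case where all true values are identical.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # variance around the mean
    return 1.0 - ss_res / ss_tot


print(f"{r2([3.0, 5.0, 7.0], [3.0, 5.0, 7.0]):.4f}")  # perfect predictions: 1.0000
print(f"{r2([3.0, 5.0, 7.0], [5.0, 5.0, 5.0]):.4f}")  # always predicting the mean: 0.0000
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean of y_test, and negative values mean worse than that baseline.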
03 CARTE-AI references 📚
@article{kim2024carte,
title={CARTE: pretraining and transfer for tabular learning},
author={Kim, Myung Jun and Grinsztajn, L{\'e}o and Varoquaux, Ga{\"e}l},
journal={arXiv preprint arXiv:2402.16785},
year={2024}
}