Skip to main content

CARTE-AI: Context Aware Representation of Table Entries for AI

Project description

Downloads PyPI Version Python Version Code Style: Black License: MIT

CARTE:
Pretraining and Transfer for Tabular Learning

CARTE_outline

This repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.

CARTE is a pretrained model for tabular data by treating each table row as a star graph and training a graph transformer on top of this representation.

Colab Examples (Give it a test):

Open In Colab

  • CARTERegressor on Wine Poland dataset
  • CARTEClassifier on Spotify dataset

01 Install 🚀

The library has been tested on Linux, MacOSX and Windows.

CARTE-AI can be installed from PyPI:

pip install carte-ai

Post installation check

After a correct installation, you should be able to import the module without errors:

import carte_ai

02 CARTE-AI example on sampled data step by step ➡️

1️⃣ Load the Data 💽

import pandas as pd
from carte_ai.data.load_data import *

num_train = 128  # Example: set the number of training groups/entities
random_state = 1  # Set a random seed for reproducibility
X_train, X_test, y_train, y_test = wina_pl(num_train, random_state)
print("Wina Poland dataset:", X_train.shape, X_test.shape)

sample

2️⃣ Convert Table 2 Graph 🪵

The basic preparations are:

  • preprocess raw data
  • load the prepared data and configs; set train/test split
  • generate graphs for each table entries (rows) using the Table2GraphTransformer
  • create an estimator and make inference
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai import Table2GraphTransformer

model_path = hf_hub_download(repo_id="hi-paris/fastText", filename="cc.en.300.bin")

preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)

# Fit and transform the training data
X_train = preprocessor.fit_transform(X_train, y=y_train)

# Transform the test data
X_test = preprocessor.transform(X_test)

sample

3️⃣ Make Predictions🔮

For learning, CARTE currently runs with the sklearn interface (fit/predict) and the process is:

  • Define parameters
  • Set the estimator
  • Run 'fit' to train the model and 'predict' to make predictions
from carte_ai import CARTERegressor, CARTEClassifier

# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10 # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False # True if you want cleanness
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]


# Define the estimator and run fit/predict

estimator = CARTERegressor(**fixed_params) # CARTERegressor for Regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)

# Obtain the r2 score on predictions

score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE:", "{:.4f}".format(score))

sample

03 Reproducing paper results ⚙️

➡️ installation instructions setup paper

04 Contribute to the package 🚀

➡️ read the contributions guidelines

05 CARTE-AI references 📚

@article{kim2024carte,
  title={CARTE: pretraining and transfer for tabular learning},
  author={Kim, Myung Jun and Grinsztajn, L{\'e}o and Varoquaux, Ga{\"e}l},
  journal={arXiv preprint arXiv:2402.16785},
  year={2024}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

carte_ai-0.0.20.tar.gz (40.3 MB view details)

Uploaded Source

Built Distribution

carte_ai-0.0.20-py3-none-any.whl (40.3 MB view details)

Uploaded Python 3

File details

Details for the file carte_ai-0.0.20.tar.gz.

File metadata

  • Download URL: carte_ai-0.0.20.tar.gz
  • Upload date:
  • Size: 40.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for carte_ai-0.0.20.tar.gz
Algorithm Hash digest
SHA256 74a5e34123a61b7d6082a828cb285b405d8cdb73569cc4fd7f89b5109b15aa5c
MD5 fb59246ef0eb7aa7a788b511a7dfa800
BLAKE2b-256 72b55565d153c4ff7cbec5da91c54b16eecca2625e839502f22ad6637860dd33

See more details on using hashes here.

File details

Details for the file carte_ai-0.0.20-py3-none-any.whl.

File metadata

  • Download URL: carte_ai-0.0.20-py3-none-any.whl
  • Upload date:
  • Size: 40.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for carte_ai-0.0.20-py3-none-any.whl
Algorithm Hash digest
SHA256 724ed376509937c8ef322e8962cecf9f62051ebec8e90dc32f91dc8a4dd82bc9
MD5 956151dab5fcce5511cd09dd297b763c
BLAKE2b-256 40438d24b6d341538546f01fd3f593c1d31dbff015fb7396200130009cb62c05

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page