
TabularS3L

A PyTorch-based library for self- and semi-supervised learning on tabular data. Currently, VIME, SubTab, and SCARF are available.
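All three models are exposed as PyTorch nn.Modules. A quick import sketch (the SCARF path is confirmed by the usage example below; the VIME and SubTab paths are assumed to follow the same pattern, so verify them against your installed version):

from ts3l.models import SCARF            # confirmed by the usage example below
# from ts3l.models import VIME, SubTab   # assumed analogous import paths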

To Do

  • Launch the nn.Module and Dataset implementations of VIME, SubTab, and SCARF
    • VIME
    • SubTab
    • SCARF
  • Launch PyTorch Lightning modules of VIME, SubTab, and SCARF
    • VIME
    • SubTab
    • SCARF
  • Finish README.md

Installation

We provide TabularS3L as the Python package ts3l for users who want to apply self- and semi-supervised learning to tabular models.

pip install ts3l
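A quick sanity check that the package is importable (a hypothetical snippet, not from the original README):

import ts3l
print(ts3l.__name__)  # prints 'ts3l' if the installation succeeded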

How to use?

# Assume we have labeled data X_train and y_train, plus an unlabeled set X_unlabeled for self-supervised pretraining.

from ts3l.models import SCARF
from ts3l.utils.scarf_utils import SCARFDataset, NTXentLoss

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import RandomSampler, WeightedRandomSampler, DataLoader

import pandas as pd  # used below to concatenate the labeled and unlabeled frames

# emb_dim is the representation width; encoder_depth and head_depth are the
# number of MLP layers in the encoder and the projection head
emb_dim = 128
encoder_depth = 4
head_depth = 2
corruption_rate = 0.6  # fraction of features corrupted per sample
dropout_rate = 0.15

batch_size = 128

model = SCARF(input_dim=X_train.shape[1],
              emb_dim=emb_dim,
              encoder_depth=encoder_depth,
              head_depth=head_depth,
              dropout_rate=dropout_rate,
              out_dim=2)  # two output classes for the downstream task
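# NTXentLoss is the contrastive objective SCARF is pretrained with: it pulls each
# sample's embedding toward the embedding of its corrupted view and pushes it
# away from the embeddings of the other samples in the batch.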
pretraining_loss = NTXentLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# pretrain on the labeled and unlabeled features together
train_ds = SCARFDataset(pd.concat([X_train, X_unlabeled]), corruption_len=int(corruption_rate * X_train.shape[1]))

model.do_pretraining() # Now, model.forward conducts self-supervised learning.

# shuffle must stay False whenever an explicit sampler is supplied
train_dl = DataLoader(train_ds,
                      batch_size=batch_size,
                      shuffle=False,
                      sampler=RandomSampler(train_ds),
                      num_workers=4,
                      drop_last=True)

for epoch in range(2):
    for x, x_corrupted in train_dl:

        optimizer.zero_grad()

        emb_anchor, emb_corrupted = model(x, x_corrupted)

        loss = pretraining_loss(emb_anchor, emb_corrupted)

        loss.backward()
        optimizer.step()


# fine-tune on the labeled data only; with labels, the dataset also exposes
# per-sample weights (used by the weighted sampler below)
train_ds = SCARFDataset(X_train, y_train.values)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

model.do_finetunning() # Now, model.forward conducts (semi-)supervised learning.

# WeightedRandomSampler draws labeled examples according to train_ds.weights;
# num_samples=len(train_ds) keeps the epoch size unchanged
train_dl = DataLoader(train_ds,
                      batch_size=batch_size,
                      shuffle=False,
                      sampler=WeightedRandomSampler(train_ds.weights, num_samples=len(train_ds)),
                      num_workers=4,
                      drop_last=True)

for epoch in range(2):
    for x, y in train_dl:

        optimizer.zero_grad()

        y_hat = model(x)

        loss = criterion(y_hat, y)

        loss.backward()
        optimizer.step()
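
After fine-tuning, predictions follow the usual PyTorch pattern. A minimal inference sketch (not part of the original example; X_test is a hypothetical held-out feature matrix with the same columns as X_train):

import numpy as np

model.eval()
with torch.no_grad():
    x_test = torch.tensor(np.asarray(X_test), dtype=torch.float32)  # hypothetical test split
    logits = model(x_test)        # forward stays in (semi-)supervised mode after do_finetunning()
    preds = logits.argmax(dim=1)  # predicted class indices (out_dim = 2)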
