TabularS3L
A PyTorch-based library of self- and semi-supervised learning models for tabular data. Currently, VIME, SubTab, and SCARF are available.
To Do
- Launch the nn.Module and Dataset classes of VIME, SubTab, and SCARF
  - VIME
  - SubTab
  - SCARF
- Launch the PyTorch Lightning modules of VIME, SubTab, and SCARF
  - VIME
  - SubTab
  - SCARF
- Finish README.md
Installation
We provide ts3l, the Python package of TabularS3L, for users who want to apply self- and semi-supervised learning models to tabular data.
pip install ts3l
How to use?
# Assume we have labeled data X_train and y_train, plus X_unlabeled for self-supervised learning.

from ts3l.models import SCARF
from ts3l.utils.scarf_utils import SCARFDataset
from ts3l.utils.scarf_utils import NTXentLoss

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import RandomSampler, WeightedRandomSampler, DataLoader

emb_dim = 128
encoder_depth = 4
head_depth = 2
corruption_rate = 0.6   # fraction of features corrupted per sample
dropout_rate = 0.15
batch_size = 128

model = SCARF(input_dim=X_train.shape[1],
              emb_dim=emb_dim,
              encoder_depth=encoder_depth,
              head_depth=head_depth,
              dropout_rate=dropout_rate,
              out_dim=2)

pretraining_loss = NTXentLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

train_ds = SCARFDataset(pd.concat([X_train, X_unlabeled]),
                        corruption_len=int(corruption_rate * X_train.shape[1]))

model.do_pretraining()  # Now, model.forward conducts self-supervised learning.

train_dl = DataLoader(train_ds,
                      batch_size=batch_size,
                      shuffle=False,  # a sampler is provided instead
                      sampler=RandomSampler(train_ds),
                      num_workers=4,
                      drop_last=True)
for epoch in range(2):
    for i, data in enumerate(train_dl, 0):
        optimizer.zero_grad()
        x, x_corrupted = data
        emb_anchor, emb_corrupted = model(x, x_corrupted)
        loss = pretraining_loss(emb_anchor, emb_corrupted)
        loss.backward()
        optimizer.step()
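Under the hood, NTXentLoss is a normalized-temperature cross-entropy contrastive objective: each anchor embedding is pulled toward the embedding of its own corrupted view and pushed away from every other sample in the batch. A minimal pure-PyTorch sketch of that idea (the function name and temperature value here are illustrative, not the ts3l API):

```python
import torch
import torch.nn.functional as F

def nt_xent(z_i, z_j, temperature=1.0):
    """Contrastive loss over a batch of n (anchor, corrupted-view) embedding pairs."""
    n = z_i.size(0)
    # Normalize so the dot product is cosine similarity.
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)
    sim = z @ z.t() / temperature          # (2n, 2n) similarity matrix
    sim.fill_diagonal_(float("-inf"))      # exclude self-similarity
    # The positive for row k is row k + n (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```

Minimizing this loss maximizes agreement between each row and its corrupted counterpart relative to the rest of the batch.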
train_ds = SCARFDataset(X_train, y_train.values)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

model.do_finetunning()  # Now, model.forward conducts (semi-)supervised learning.

train_dl = DataLoader(train_ds,
                      batch_size=batch_size,
                      shuffle=False,  # a weighted sampler is provided instead
                      sampler=WeightedRandomSampler(train_ds.weights, num_samples=len(train_ds)),
                      num_workers=4,
                      drop_last=True)
for epoch in range(2):
    for i, data in enumerate(train_dl, 0):
        optimizer.zero_grad()
        x, y = data
        y_hat = model(x)
        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()
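After fine-tuning, the model can be evaluated like any other PyTorch classifier. A hedged sketch, assuming the fine-tuned model returns class logits of shape (batch, out_dim); an nn.Linear stand-in is used here so the snippet runs without ts3l:

```python
import torch
import torch.nn as nn

# Stand-in with the same (batch, out_dim) logit interface assumed
# for the fine-tuned SCARF model above.
model = nn.Linear(10, 2)

model.eval()                     # disable dropout for inference
x = torch.randn(4, 10)           # a batch of 4 unseen rows, 10 features
with torch.no_grad():
    logits = model(x)            # shape: (4, 2)
    preds = logits.argmax(dim=1) # predicted class per row
```

The same pattern (eval mode, no_grad, argmax over logits) applies to the fine-tuned SCARF model.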