Refactoring PyTorch models into sklearn-like API

PyTorch2Sklearn

Author GitHub: https://github.com/TGChenZP

Please cite when using this package for research and other machine learning purposes

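Table of Contents

- Introduction
- Installation
- Model Architectures
- Methods
- Usage Examples
  - Regression Example
    - MLP Regression Example
    - Transformer Regression Example
    - MLP_AGNN Regression Example
    - Transformer_AGNN Regression Example
  - Classification Example
    - MLP Classification Example
    - Transformer Classification Example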

Introduction

This package wraps PyTorch MLP and Transformer models, along with their graph-layer variants MLP_AGNN and Transformer_AGNN, in an sklearn-style API. It is designed for supervised learning on tabular data (classification and regression), with built-in hyperparameter control over the most common deep neural network architecture choices.

Regression and classification are handled by the same class; select the use case with `mode`. Remember to set an appropriate `loss` from `torch.nn`, as well as `input_dim` (the number of columns in the tabular data) and `output_dim` (the output dimension for regression, or the number of classes for classification).
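As a rough illustration of how `input_dim` and `output_dim` follow from the data, the sketch below derives both values for each `mode`. It is plain Python and does not call the package; `infer_dims` is a hypothetical helper name, not part of PyTorch2Sklearn, and it assumes a single numeric target (or a tuple of targets) per row for regression.

```python
def infer_dims(rows, targets, mode):
    """Return the input_dim/output_dim implied by the data (hypothetical helper)."""
    if mode not in ("Regression", "Classification"):
        raise ValueError("mode must be 'Regression' or 'Classification'")
    # input_dim: number of feature columns per row
    input_dim = len(rows[0])
    if mode == "Classification":
        # output_dim: number of distinct classes seen in the targets
        output_dim = len(set(targets))
    else:
        # output_dim: width of the regression target (1 for a scalar target)
        first = targets[0]
        output_dim = len(first) if isinstance(first, (list, tuple)) else 1
    return {"input_dim": input_dim, "output_dim": output_dim, "mode": mode}


rows = [[0.1, 2.0, 3.5], [1.2, 0.4, 2.2], [0.7, 1.1, 0.9]]
print(infer_dims(rows, [0, 1, 0], "Classification"))
print(infer_dims(rows, [2.3, 1.9, 4.0], "Regression"))
```

The resulting values can then be passed as the `input_dim`, `output_dim` and `mode` arguments of the model constructors described below.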

Installation

pip install PyTorch2Sklearn

Model Architectures

PyTorch2Sklearn.MLP [source]

class PyTorch2Sklearn.MLP.MLP(input_dim, output_dim, hidden_layers, hidden_dim, dropout, mode, batch_size, epochs, loss, TabularDataFactory, TabularDataset, lr=1e-3, random_state=42, grad_clip=False, batchnorm=False, verbose=False, rootpath='./', name='MLP', **kwargs)

Parameters

Parameter & Type	Description
`input_dim` (`int`)	The number of features in the input dataset.
`output_dim` (`int`)	The number of output classes/regression output dimension.
`hidden_layers` (`int`)	The number of hidden layers in the MLP. If set to `0`, the hidden layer widths shrink in equal (arithmetic) steps from `input_dim` to `output_dim`.
`hidden_dim` (`int`)	The number of neurons in each hidden layer.
`dropout` (`float`)	The dropout rate.
`mode` (`str`)	The mode of the model, either 'Regression' or 'Classification'.
`batch_size` (`int`)	The batch size.
`epochs` (`int`)	The number of epochs.
`lr` (`float`, optional, default=`1e-3`)	The learning rate.
`random_state` (`int`, optional, default=`42`)	The random seed. (WARNING: complete reproducibility cannot be guaranteed even when a seed is set.)
`grad_clip` (`bool`, optional, default=`False`)	Whether to use gradient clipping (to 2) to restrict gradients on each parameter.
`batchnorm` (`bool`, optional, default=`False`)	Whether to use batch normalization on each batch of data.
`loss` (`torch.nn` loss)	The loss function, e.g. `nn.MSELoss()` for regression or `nn.CrossEntropyLoss()` for classification.
`TabularDataFactory` (`PyTorch2Sklearn.utils.data.TabularDataFactory`)	The tabular data factory that transforms data from input format into the correct format for TabularDataset.
`TabularDataset` (`PyTorch2Sklearn.utils.data.TabularDataset`)	The dataset object that generates batches for stochastic gradient descent.
`verbose` (`bool`, optional, default=`False`)	Whether to print the training progress.
`rootpath` (`str`, optional, default=`./`)	The root path for saving the model.
`name` (`str`, optional, default=`"MLP"`)	The name of the model.
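The `grad_clip` description does not say whether the threshold of 2 applies to each gradient value or to the overall gradient norm. The sketch below, in plain Python and independent of the package, contrasts these two common conventions so the difference is concrete. Which one PyTorch2Sklearn applies is not stated here; `clip_by_value` and `clip_by_norm` are illustrative names only.

```python
import math


def clip_by_value(grads, limit=2.0):
    """Clamp every gradient component into [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def clip_by_norm(grads, max_norm=2.0):
    """Rescale the whole gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


grads = [3.0, -4.0]  # L2 norm is 5
print(clip_by_value(grads))  # each component clamped independently
print(clip_by_norm(grads))   # direction preserved, length reduced to about 2
```

In PyTorch itself, the corresponding utilities are `torch.nn.utils.clip_grad_value_` and `torch.nn.utils.clip_grad_norm_`.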

PyTorch2Sklearn.Transformer [source]

class PyTorch2Sklearn.Transformer.Transformer(input_dim, output_dim, num_transformer_layers, num_mlp_layers, hidden_dim, dropout, nhead, mode, batch_size, epochs, loss, TabularDataFactory, TabularDataset, share_embedding_mlp=False, use_cls=False, dim_feedforward=None, lr=1e-3, random_state=42, grad_clip=False, batchnorm=False, verbose=False, rootpath='./', name='Transformer', **kwargs)

Parameters

Parameter & Type	Description
`input_dim` (`int`)	The number of features in the input dataset.
`output_dim` (`int`)	The number of output classes/regression output dimension.
`num_transformer_layers` (`int`)	The number of transformer layers.
`num_mlp_layers` (`int`)	The number of MLP layers.
`hidden_dim` (`int`)	The number of neurons in the hidden layers.
`dropout` (`float`)	The dropout rate.
`nhead` (`int`)	The number of heads in the multi-head attention layers.
`mode` (`str`)	The mode of the model, either 'Regression' or 'Classification'.
`batch_size` (`int`)	The batch size.
`epochs` (`int`)	The number of epochs.
`lr` (`float`, optional, default=`1e-3`)	The learning rate.
`random_state` (`int`, optional, default=`42`)	The random seed. (WARNING: complete reproducibility cannot be guaranteed even when a seed is set.)
`grad_clip` (`bool`, optional, default=`False`)	Whether to use gradient clipping (to 2) to restrict gradients on each parameter.
`batchnorm` (`bool`, optional, default=`False`)	Whether to use batch normalization on each batch of data.
`loss` (`torch.nn` loss)	The loss function, e.g. `nn.MSELoss()` for regression or `nn.CrossEntropyLoss()` for classification.
`TabularDataFactory` (`PyTorch2Sklearn.utils.data.TabularDataFactory`)	The tabular data factory that transforms data from input format into the correct format for TabularDataset.
`TabularDataset` (`PyTorch2Sklearn.utils.data.TabularDataset`)	The dataset object that generates batches for stochastic gradient descent.
`share_embedding_mlp` (`bool`, optional, default=`False`)	Whether to share the embedding layer in the MLP.
`use_cls` (`bool`, optional, default=`False`)	Whether to feed the CLS token into the decoder (`True`) or to concatenate all vectors output by the final transformer layer (`False`).
`dim_feedforward` (`int`, optional, default=`None`)	The hidden dimension in the feedforward network.
`verbose` (`bool`, optional, default=`False`)	Whether to print the training progress.
`rootpath` (`str`, optional, default=`./`)	The root path for saving the model.
`name` (`str`, optional, default=`"Transformer"`)	The name of the model.

PyTorch2Sklearn.MLP_AGNN [source]

class PyTorch2Sklearn.MLP_AGNN.MLP_AGNN(input_dim, output_dim, num_encoder_layers, num_graph_layers, num_decoder_layers, graph_nhead, hidden_dim, dropout, mode, epochs, loss, GraphDataFactory, graph="J", lr=1e-3, random_state=42, grad_clip=False, batch_norm=False, verbose=False, rootpath='./', name='MLP_AGNN', **kwargs)

Parameters

Parameter & Type	Description
`input_dim` (`int`)	The number of features in the input dataset.
`output_dim` (`int`)	The number of output classes/regression output dimension.
`num_encoder_layers` (`int`)	The number of encoder mlp layers.
`num_decoder_layers` (`int`)	The number of decoder mlp layers.
`num_graph_layers` (`int`)	The number of graph layers.
`graph_nhead` (`int`)	The number of attention heads in graph attention layer.
`hidden_dim` (`int`)	The number of neurons in each hidden layer.
`dropout` (`float`)	The dropout rate.
`mode` (`str`)	The mode of the model, either 'Regression' or 'Classification'.
`epochs` (`int`)	The number of epochs.
`lr` (`float`, optional, default=`1e-3`)	The learning rate.
`random_state` (`int`, optional, default=`42`)	The random seed. (WARNING: complete reproducibility cannot be guaranteed even when a seed is set.)
`grad_clip` (`bool`, optional, default=`False`)	Whether to use gradient clipping (to 2) to restrict gradients on each parameter.
`batch_norm` (`bool`, optional, default=`False`)	Whether to use batch normalization on each batch of data.
`loss` (`torch.nn` loss)	The loss function, e.g. `nn.MSELoss()` for regression or `nn.CrossEntropyLoss()` for classification.
`GraphDataFactory` (`PyTorch2Sklearn.utils.data.GraphDataFactory`)	The graph data factory that transforms data from input format into the correct format for training.
`graph` (optional, default=`"J"`)	If `"J"`, every batch is processed with graph = J, the all-ones matrix (1ᵀ1). A manually defined graph is also accepted.
`verbose` (`bool`, optional, default=`False`)	Whether to print the training progress.
`rootpath` (`str`, optional, default=`./`)	The root path for saving the model.
`name` (`str`, optional, default=`"MLP_AGNN"`)	The name of the model.
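To make the default `graph="J"` more concrete, the sketch below builds an all-ones matrix for each group of rows. It assumes that J is the n x n all-ones matrix over the n rows sharing one `idx` value (the group column used in the AGNN usage examples), so every row in a group is connected to every other row and to itself. This is an illustration in plain Python; `all_ones_graphs` is a hypothetical name and the package's internal representation may differ.

```python
from collections import defaultdict


def all_ones_graphs(idx):
    """Map each group id to an n x n all-ones matrix, n = rows in that group."""
    counts = defaultdict(int)
    for group in idx:
        counts[group] += 1
    # A fresh inner list per row, so the rows are independent objects
    return {g: [[1] * n for _ in range(n)] for g, n in counts.items()}


idx = [i % 3 for i in range(7)]  # group 0 has 3 rows, groups 1 and 2 have 2 each
graphs = all_ones_graphs(idx)
for group, matrix in sorted(graphs.items()):
    print(group, matrix)
```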

PyTorch2Sklearn.Transformer_AGNN [source]

class PyTorch2Sklearn.Transformer_AGNN.Transformer_AGNN(input_dim, output_dim, num_transformer_layers, num_graph_layers, num_mlp_layers, hidden_dim, dropout, nhead, graph_nhead, mode, epochs, loss, GraphDataFactory, graph="J", share_embedding_mlp=False, use_cls=False, dim_feedforward=None, lr=1e-3, random_state=42, grad_clip=False, batchnorm=False, verbose=False, rootpath='./', name='Transformer_AGNN', **kwargs)

Parameters

Parameter & Type	Description
`input_dim` (`int`)	The number of features in the input dataset.
`output_dim` (`int`)	The number of output classes/regression output dimension.
`num_transformer_layers` (`int`)	The number of transformer layers.
`num_mlp_layers` (`int`)	The number of MLP layers.
`num_graph_layers` (`int`)	The number of graph layers.
`graph_nhead` (`int`)	The number of attention heads in graph attention layer.
`hidden_dim` (`int`)	The number of neurons in the hidden layers.
`dropout` (`float`)	The dropout rate.
`nhead` (`int`)	The number of heads in the multi-head attention layers.
`mode` (`str`)	The mode of the model, either 'Regression' or 'Classification'.
`epochs` (`int`)	The number of epochs.
`lr` (`float`, optional, default=`1e-3`)	The learning rate.
`random_state` (`int`, optional, default=`42`)	The random seed. (WARNING: complete reproducibility cannot be guaranteed even when a seed is set.)
`grad_clip` (`bool`, optional, default=`False`)	Whether to use gradient clipping (to 2) to restrict gradients on each parameter.
`batchnorm` (`bool`, optional, default=`False`)	Whether to use batch normalization on each batch of data.
`loss` (`torch.nn` loss)	The loss function, e.g. `nn.MSELoss()` for regression or `nn.CrossEntropyLoss()` for classification.
`GraphDataFactory` (`PyTorch2Sklearn.utils.data.GraphDataFactory`)	The graph data factory that transforms data from input format into the correct format for training.
`graph` (optional, default=`"J"`)	If `"J"`, every batch is processed with graph = J, the all-ones matrix (1ᵀ1). A manually defined graph is also accepted.
`share_embedding_mlp` (`bool`, optional, default=`False`)	Whether to share the embedding layer in the MLP.
`use_cls` (`bool`, optional, default=`False`)	Whether to feed the CLS token into the decoder (`True`) or to concatenate all vectors output by the final transformer layer (`False`).
`dim_feedforward` (`int`, optional, default=`None`)	The hidden dimension in the feedforward network.
`verbose` (`bool`, optional, default=`False`)	Whether to print the training progress.
`rootpath` (`str`, optional, default=`./`)	The root path for saving the model.
`name` (`str`, optional, default=`"Transformer_AGNN"`)	The name of the model.

Methods [source]

__init__([input_dim, output_dim, ...]): Construct a PyTorch2Sklearn model class

fit(train_x, train_y): fit the model using data

predict(val_x): make inference on features of new data

predict_proba(val_x): make inference (probabilities of each class) for new data [WARNING: only available for classification]

save(mark): save the model parameters

load(mark): load the model parameters
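Because every model exposes the same `fit` / `predict` pair, evaluation code can be written once against that interface. The sketch below is a minimal illustration under that assumption: it picks accuracy for classification and R² for regression based on `mode`. The helper `fit_and_score`, the pure-Python metrics and the `MeanRegressor` stand-in are all hypothetical and not part of the package; in practice `sklearn.metrics.accuracy_score` and `r2_score` would play the role of the two metric functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; undefined when all targets are equal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_and_score(model, train_x, train_y, val_x, val_y, mode):
    """Fit any object with fit/predict and score it with a mode-appropriate metric."""
    model.fit(train_x, train_y)
    preds = list(model.predict(val_x))
    metric = accuracy if mode == "Classification" else r2
    return metric(list(val_y), preds)


class MeanRegressor:
    """Stand-in model that always predicts the training mean."""

    def fit(self, train_x, train_y):
        self.mean_ = sum(train_y) / len(train_y)

    def predict(self, val_x):
        return [self.mean_ for _ in val_x]


X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.0, 3.0]
score = fit_and_score(MeanRegressor(), X, y, X, y, "Regression")
print(score)  # a mean-only predictor scores an R² of 0.0
```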

Usage Examples

Regression Example

MLP Regression Example

from sklearn.datasets import make_regression
import pandas as pd
import torch.nn as nn
from PyTorch2Sklearn.MLP import MLP
from PyTorch2Sklearn.utils.data import TabularDataFactory, TabularDataset
from sklearn.metrics import accuracy_score, r2_score

X_reg, y_reg = make_regression(
        n_samples=100, n_features=5, noise=0.1, random_state=42)
X = pd.DataFrame(
    X_reg, columns=[f'feature_{i+1}' for i in range(X_reg.shape[1])])
y = pd.Series(y_reg, name='target')

model = MLP(
        hidden_dim=16,
        hidden_layers=1,
        dropout=0.1,
        batch_size=32,
        epochs=5,
        lr=1e-3,
        batchnorm=False,
        grad_clip=False,
        random_state=42,
        loss=nn.MSELoss(),
        mode='Regression',
        name='MLP',
        verbose=1,
        TabularDataFactory=TabularDataFactory,
        TabularDataset=TabularDataset,
        rootpath='./',
        output_dim=1,
        input_dim=5
    )

model.fit(X, y)

print(r2_score(y, model.predict(X)))

Transformer Regression Example

from sklearn.datasets import make_regression
import pandas as pd
import torch.nn as nn
from PyTorch2Sklearn.Transformer import Transformer
from PyTorch2Sklearn.utils.data import TabularDataFactory, TabularDataset
from sklearn.metrics import accuracy_score, r2_score

X_reg, y_reg = make_regression(
        n_samples=100, n_features=5, noise=0.1, random_state=42)
X = pd.DataFrame(
    X_reg, columns=[f'feature_{i+1}' for i in range(X_reg.shape[1])])
y = pd.Series(y_reg, name='target')

model = Transformer(
        hidden_dim=16,
        num_transformer_layers=1,
        num_mlp_layers=1,
        dropout=0.1,
        batch_size=32,
        share_embedding_mlp=False,
        nhead=2,
        use_cls=False,
        epochs=5,
        lr=1e-3,
        batchnorm=False,
        grad_clip=False,
        random_state=42,
        loss=nn.MSELoss(),
        mode='Regression',
        name='Transformer',
        verbose=1,
        TabularDataFactory=TabularDataFactory,
        TabularDataset=TabularDataset,
        rootpath='./',
        output_dim=1,
        input_dim=5
    )

model.fit(X, y)

print(r2_score(y, model.predict(X)))

MLP_AGNN Regression Example

from sklearn.datasets import make_regression
import pandas as pd
import torch.nn as nn
from PyTorch2Sklearn.MLP_AGNN import MLP_AGNN
from PyTorch2Sklearn.utils.data import GraphDataFactory
from sklearn.metrics import accuracy_score, r2_score

# Create a regression dataset
X_reg, y_reg = make_regression(
    n_samples=100, n_features=5, noise=0.1, random_state=42
)
X_reg_df = pd.DataFrame(
    X_reg, columns=[f"feature_{i+1}" for i in range(X_reg.shape[1])]
)
y_reg_series = pd.Series(y_reg, name="target")

# must add idx to denote groups of data
reg_graph_df = pd.concat([X_reg_df, y_reg_series], axis=1)
reg_graph_df["idx"] = [i % 10 for i in range(100)]
X = reg_graph_df.drop(columns=["target"])
y = reg_graph_df[["idx", "target"]]

model = MLP_AGNN(
        hidden_dim=16,
        num_encoder_layers=1,
        num_graph_layers=1,
        num_decoder_layers=1,
        graph_nhead=8,
        dropout=0.1,
        epochs=5,
        lr=1e-3,
        batchnorm=False,
        grad_clip=False,
        random_state=42,
        loss=nn.MSELoss(),
        mode="Regression",
        graph="J",
        verbose=1,
        GraphDataFactory=GraphDataFactory,
        rootpath="./",
        output_dim=1,
        input_dim=5,
    )

model.fit(X, y)

print(r2_score(y["target"], model.predict(X)))

Transformer_AGNN Regression Example

from sklearn.datasets import make_regression
import pandas as pd
import torch.nn as nn
from PyTorch2Sklearn.Transformer_AGNN import Transformer_AGNN
from PyTorch2Sklearn.utils.data import GraphDataFactory
from sklearn.metrics import accuracy_score, r2_score

# Create a regression dataset
X_reg, y_reg = make_regression(
    n_samples=100, n_features=5, noise=0.1, random_state=42
)
X_reg_df = pd.DataFrame(
    X_reg, columns=[f"feature_{i+1}" for i in range(X_reg.shape[1])]
)
y_reg_series = pd.Series(y_reg, name="target")

# must add idx to denote groups of data
reg_graph_df = pd.concat([X_reg_df, y_reg_series], axis=1)
reg_graph_df["idx"] = [i % 10 for i in range(100)]
X = reg_graph_df.drop(columns=["target"])
y = reg_graph_df[["idx", "target"]]

model = Transformer_AGNN(
        hidden_dim=16,
        num_transformer_layers=1,
        num_mlp_layers=1,
        num_graph_layers=1,
        graph_nhead=8,
        dropout=0.1,
        share_embedding_mlp=False,
        nhead=8,
        use_cls=False,
        epochs=5,
        lr=1e-3,
        graph="J",
        batchnorm=False,
        grad_clip=False,
        random_state=42,
        loss=nn.MSELoss(),
        mode='Regression',
        verbose=1,
        GraphDataFactory=GraphDataFactory,
        rootpath="./",
        output_dim=1,
        input_dim=5,
    )

model.fit(X, y)

print(r2_score(y["target"], model.predict(X)))

Classification Example

MLP Classification Example

from sklearn.datasets import make_classification
import pandas as pd
import torch.nn as nn
from PyTorch2Sklearn.MLP import MLP
from PyTorch2Sklearn.utils.data import TabularDataFactory, TabularDataset
from sklearn.metrics import accuracy_score, r2_score

X_class_2, y_class_2 = make_classification(
        n_samples=100, n_features=5, n_classes=2, n_clusters_per_class=1, random_state=42)
X = pd.DataFrame(
    X_class_2, columns=[f'feature_{i+1}' for i in range(X_class_2.shape[1])])
y = pd.Series(y_class_2, name='target')

model = MLP(
        hidden_dim=16,
        hidden_layers=1,
        dropout=0.1,
        batch_size=32,
        epochs=5,
        lr=1e-3,
        batchnorm=False,
        grad_clip=False,
        random_state=42,
        loss=nn.CrossEntropyLoss(),
        mode='Classification',
        name='MLP',
        verbose=1,
        TabularDataFactory=TabularDataFactory,
        TabularDataset=TabularDataset,
        rootpath='./',
        output_dim=2,
        input_dim=5
    )

model.fit(X, y)

print(accuracy_score(y, model.predict(X)))

Transformer Classification Example

from sklearn.datasets import make_classification
import pandas as pd
import torch.nn as nn
from PyTorch2Sklearn.Transformer import Transformer
from PyTorch2Sklearn.utils.data import TabularDataFactory, TabularDataset
from sklearn.metrics import accuracy_score, r2_score

X_class_2, y_class_2 = make_classification(
        n_samples=100, n_features=5, n_classes=2, n_clusters_per_class=1, random_state=42)
X = pd.DataFrame(
    X_class_2, columns=[f'feature_{i+1}' for i in range(X_class_2.shape[1])])
y = pd.Series(y_class_2, name='target')

model = Transformer(
        hidden_dim=16,
        num_transformer_layers=1,
        num_mlp_layers=1,
        dropout=0.1,
        batch_size=32,
        share_embedding_mlp=False,
        nhead=2,
        use_cls=False,
        epochs=5,
        lr=1e-3,
        batchnorm=False,
        grad_clip=False,
        random_state=42,
        loss=nn.CrossEntropyLoss(),
        mode='Classification',
        name='Transformer',
        verbose=1,
        TabularDataFactory=TabularDataFactory,
        TabularDataset=TabularDataset,
        rootpath='./',
        output_dim=2,
        input_dim=5
    )

model.fit(X, y)

print(accuracy_score(y, model.predict(X)))

Project details

Release history

Version	Release date
0.2.4	Sep 11, 2024 (this version)
0.2.3	Sep 8, 2024
0.2.2	Sep 7, 2024
0.2.1	Sep 5, 2024
0.2.0	Sep 4, 2024
0.1.3	Aug 30, 2024
0.1.2	Aug 30, 2024
0.1.1	Aug 28, 2024
0.1.0	Aug 25, 2024
0.0.13	Aug 20, 2024
0.0.12	Aug 20, 2024
0.0.11	Aug 20, 2024
0.0.10	Aug 20, 2024
0.0.9	Aug 20, 2024
0.0.8	Aug 20, 2024
0.0.7	Aug 20, 2024
0.0.6	Aug 20, 2024
0.0.5	Aug 18, 2024
0.0.4	Aug 17, 2024
0.0.3	Aug 17, 2024
0.0.2	Aug 17, 2024
0.0.1	Aug 17, 2024
0.0.0	Aug 15, 2024
