PyTorch autoencoder with additional embeddings layer for categorical data.

The Autoembedder

Introduction

The Autoembedder is an autoencoder with additional embedding layers for categorical columns. It is flexible to use, and hyperparameters such as the number of layers can be adjusted and tuned easily. Although it is primarily designed for pandas DataFrames, it can be modified to support other data structures.
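
To make the idea concrete, here is a minimal PyTorch sketch of an autoencoder whose input combines one embedded categorical column with a few continuous columns. It is an illustration under assumptions, not the package's actual model: the class name TinyAutoembedder, all layer sizes, the Tanh activation, and the choice to reconstruct the concatenated vector are hypothetical.

```python
import torch
from torch import nn


class TinyAutoembedder(nn.Module):
    """Hypothetical illustration, not the package's actual model class."""

    def __init__(self, n_categories: int, embedding_dim: int,
                 n_continuous: int, code_dim: int) -> None:
        super().__init__()
        # One embedding table maps an integer category code to a dense vector.
        self.embedding = nn.Embedding(n_categories, embedding_dim)
        in_features = embedding_dim + n_continuous
        self.encoder = nn.Sequential(nn.Linear(in_features, code_dim), nn.Tanh())
        self.decoder = nn.Linear(code_dim, in_features)

    def forward(self, cat_codes: torch.Tensor, continuous: torch.Tensor):
        # Concatenate the embedded categorical column with the continuous columns.
        target = torch.cat([self.embedding(cat_codes), continuous], dim=1)
        reconstruction = self.decoder(self.encoder(target))
        return reconstruction, target


model = TinyAutoembedder(n_categories=5, embedding_dim=3, n_continuous=4, code_dim=2)
cat_codes = torch.tensor([0, 2, 4])   # one categorical column, batch of 3 rows
continuous = torch.randn(3, 4)        # four continuous columns
reconstruction, target = model(cat_codes, continuous)

# Reconstruction error between the decoder output and the encoder input.
loss = nn.functional.mse_loss(reconstruction, target)
```

In this sketch the embedding weights are trained together with the encoder and decoder, so each category ends up with a learned dense vector instead of a single arbitrary integer.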

Let's get started

training.py is the entry point. The following arguments can be set; those marked as required must be provided:

| Argument | Type | Required | Default value | Comment |
| --- | --- | --- | --- | --- |
| batch_size | int | False | 32 | |
| drop_last | int | False | 1 | True/False |
| pin_memory | int | False | 1 | True/False |
| num_workers | int | False | 0 | 0 means that the data will be loaded in the main process |
| use_mps | int | False | 0 | Set this to 1 to use the MPS backend when running on a Mac with the M1 GPU |
| model_title | str | False | autoembedder_{datetime}.bin | |
| model_save_path | str | False | | |
| n_save_checkpoints | int | False | | |
| lr | float | False | 0.001 | |
| amsgrad | int | False | 0 | True/False |
| epochs | int | True | | |
| layer_bias | int | False | 1 | True/False |
| weight_decay | float | False | 0 | |
| l1_lambda | float | False | 0 | |
| xavier_init | int | False | 0 | True/False |
| tensorboard_log_path | str | False | | |
| train_input_path | str | True | | |
| test_input_path | str | True | | |
| activation_for_code_layer | int | False | 0 | True/False; whether the code layer should have an activation |
| activation_for_final_decoder_layer | int | False | 0 | True/False; whether the final decoder layer should have an activation |
| hidden_layer_representation | str | True | | String representation of a list of lists of integers that describes the hidden layer structure, e.g. "[[64, 32], [32, 16], [16, 8]]" |
| cat_columns | str | False | "[]" | String representation of a list of lists of categorical column names (strings). Columns that share an encoder belong in the same inner list, e.g. "[['a', 'b'], ['c']]" |

So, something like this would do it:

$ python3 training.py --epochs 20 \
--train_input_path "path/to/your/train_data" \
--test_input_path "path/to/your/test_data" \
--hidden_layer_representation "[[12, 6], [6, 3]]"
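
The list-valued arguments are passed as strings, so they have to be turned back into Python lists before use. The sketch below shows one safe way to do that with ast.literal_eval from the standard library. The helper parse_hidden_layers is hypothetical and not necessarily how training.py parses its arguments. Reading each inner list as an (input size, output size) pair is also an assumption, based on the examples in this README, where each layer's second value equals the next layer's first value.

```python
import ast


def parse_hidden_layers(representation: str) -> list[tuple[int, int]]:
    """Parse a string such as "[[12, 6], [6, 3]]" into (input, output) pairs."""
    layers = ast.literal_eval(representation)  # evaluates Python literals only
    pairs = [(int(n_in), int(n_out)) for n_in, n_out in layers]
    # Assumed consistency check: each layer accepts what the previous one emits.
    for (_, prev_out), (next_in, _) in zip(pairs, pairs[1:]):
        if prev_out != next_in:
            raise ValueError(f"layer sizes do not chain: {prev_out} != {next_in}")
    return pairs


hidden_layers = parse_hidden_layers("[[12, 6], [6, 3]]")
cat_columns = ast.literal_eval("[['a', 'b'], ['c']]")
print(hidden_layers)  # [(12, 6), (6, 3)]
print(cat_columns)    # [['a', 'b'], ['c']]
```

Unlike eval, ast.literal_eval refuses anything that is not a plain literal, which makes it a reasonable choice for values that arrive from the command line.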

Why additional embedding layers?

The additional embedding layers automatically embed all columns that have the pandas category data type. Categorical columns with any other data type are not embedded and are handled like continuous columns. Simply encoding the categorical values as integers (e.g., with a label encoder) decreases the quality of the outcome.
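
Since only columns with the category data type are embedded, it can help to cast categorical columns explicitly before training. The following pandas sketch uses a made-up DataFrame (the column names color and size_cm are hypothetical) to show the cast, how categorical and continuous columns can be told apart by dtype, and the integer codes that an embedding layer would look up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "color": ["red", "green", "red", "blue"],  # categorical, initially strings
        "size_cm": [1.5, 2.0, 1.7, 3.1],           # continuous
    }
)

# Cast to the category dtype so the column is recognisable as categorical.
df["color"] = df["color"].astype("category")

# Split the columns by dtype.
categorical_columns = df.select_dtypes(include="category").columns.tolist()
continuous_columns = df.select_dtypes(exclude="category").columns.tolist()

# Integer codes per row; categories are sorted, so blue=0, green=1, red=2.
color_codes = df["color"].cat.codes.tolist()
print(categorical_columns)  # ['color']
print(color_codes)          # [2, 1, 2, 0]
```

Note that file formats which do not store dtypes, such as CSV, lose the category dtype on a round trip, so the cast may need to be repeated after loading.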
