Categorical Embedder

Categorical Embedder is a Python package that lets you convert your categorical variables into numeric ones via neural networks.

Installation

pip install categorical_embedder

Example

import pandas as pd
import categorical_embedder as ce
from sklearn.model_selection import train_test_split

# Load the data and separate the features from the target
df = pd.read_csv('HR_Attrition_Data.csv')
X = df.drop(['employee_id', 'is_promoted'], axis=1)
y = df['is_promoted']

# Find the categorical columns and their embedding sizes, then label encode them
embedding_info = ce.get_embedding_info(X)
X_encoded, encoders = ce.get_label_encoded_data(X)

X_train, X_test, y_train, y_test = train_test_split(X_encoded, y)

# Train the network and extract the learned embeddings
embeddings = ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info,
                               is_classification=True, epochs=100, batch_size=256)

A more detailed Jupyter Notebook can be found here

What's inside Categorical Embedder?

  • ce.get_embedding_info(data, categorical_variables=None): This function identifies all categorical variables in the data and determines an embedding size for each. The embedding size of a categorical variable is the smaller of 50 and half the number of its unique values, i.e. embedding size of a column = Min(50, (# unique values in that column) / 2). You can pass an explicit list of categorical variables in the categorical_variables parameter. If it is None, the function automatically takes all variables with data type object.
  • ce.get_label_encoded_data(data, categorical_variables=None): This function label encodes (integer encodes) all the categorical variables using sklearn.preprocessing.LabelEncoder and returns a label encoded dataframe for training, together with the encoders (X_encoded and encoders in the example above). Keras/TensorFlow or any other deep learning library would expect the data to be in this format.
  • ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info, is_classification=True, epochs=100, batch_size=256): This function trains a shallow neural network and returns the embeddings of the categorical variables. Under the hood, it is a 2-layer neural network architecture with 1000 and 500 neurons and 'ReLU' activation. It takes 4 required inputs: X_train, y_train, categorical_embedding_info (the output of the get_embedding_info function) and is_classification (True for classification tasks; False for regression tasks).
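The first two helpers can be illustrated with a small standard-library sketch. This is not the package's code: embedding_size and label_encode are hypothetical names, the sizing follows the "minimum of 50 or half the number of unique values" wording, and rounding up for odd counts is an assumption. The integer codes follow sorted class order, which is the convention sklearn.preprocessing.LabelEncoder uses.

```python
# Sketch only: plain-Python stand-ins, not the package's implementation.

def embedding_size(n_unique, cap=50):
    """Half the number of unique values, capped at `cap`.

    Rounding up for odd counts is an assumption of this sketch.
    """
    return min(cap, (n_unique + 1) // 2)


def label_encode(values):
    """Map each distinct value to an integer code, in sorted class order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


# Hypothetical categorical column
department = ["sales", "hr", "sales", "tech", "hr", "ops"]
codes, classes = label_encode(department)

print(classes)                       # ['hr', 'ops', 'sales', 'tech']
print(codes)                         # [2, 0, 2, 3, 0, 1]
print(embedding_size(len(classes)))  # 2
print(embedding_size(500))           # 50
```

A column with 4 distinct values would therefore get a 2-dimensional embedding, while any column with 100 or more distinct values would be capped at 50 dimensions.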

For classification: loss = 'binary_crossentropy'; metrics = 'accuracy' and for regression: loss = 'mean_squared_error'; metrics = 'r2'
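To make these settings concrete, here is a minimal standard-library sketch of what the two losses and the 'r2' metric compute. It assumes 'r2' denotes the usual coefficient of determination; the function names (training_config, binary_crossentropy, r2_score) are hypothetical and are not part of the package.

```python
import math

# Sketch only: illustrates the quantities named above, not the package's code.

def training_config(is_classification):
    """Return the loss/metric pair described above for each task type."""
    if is_classification:
        return {"loss": "binary_crossentropy", "metrics": ["accuracy"]}
    return {"loss": "mean_squared_error", "metrics": ["r2"]}


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(training_config(True)["loss"])                 # binary_crossentropy
print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))    # 1.0
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))    # 0.0
```

An R² of 1.0 means the predictions match the targets exactly, and 0.0 means the model does no better than always predicting the mean.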

Dependencies

pandas
scikit-learn
tensorflow
keras
tqdm
keras-tqdm

This version

0.1
