Categorical Embedder is a Python package that lets you convert your categorical variables into numeric representations (embeddings) via neural networks.
Categorical Embedder
Installation
pip install categorical_embedder
Example
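import pandas as pd
import categorical_embedder as ce
from sklearn.model_selection import train_test_split

# Load the data and separate the features from the target
df = pd.read_csv('HR_Attrition_Data.csv')
X = df.drop(['employee_id', 'is_promoted'], axis=1)
y = df['is_promoted']

# Embedding size for each categorical variable
embedding_info = ce.get_embedding_info(X)
# Integer-encode the categorical variables (also returns the fitted encoders)
X_encoded, encoders = ce.get_label_encoded_data(X)
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y)

# Train the network and retrieve the embeddings of the categorical variables
embeddings = ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info,
                               is_classification=True, epochs=100, batch_size=256)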
A more detailed Jupyter Notebook can be found here
What's inside Categorical Embedder?
ce.get_embedding_info(data, categorical_variables=None)
: This function identifies all categorical variables in the data and determines the embedding size of each one. The embedding size of a categorical variable is the minimum of 50 and half the number of its unique values, i.e. embedding size of a column = Min(50, (# unique values in that column) / 2). You can pass an explicit list of categorical variables in the categorical_variables parameter. If it is None, the function automatically takes all variables with data type object.
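The embedding-size rule can be sketched in pandas as follows. This is a hypothetical re-implementation for illustration (embedding_info_sketch is not part of the package), and it assumes that "half the number of unique values" is rounded up, which the description above does not specify.

```python
import pandas as pd


def embedding_info_sketch(data: pd.DataFrame, categorical_variables=None) -> dict:
    """Map each categorical column to (number of unique values, embedding size).

    Hypothetical helper; rounding "half" up is an assumption.
    """
    if categorical_variables is None:
        # Default: every column whose pandas dtype is object
        categorical_variables = data.select_dtypes(include="object").columns
    info = {}
    for col in categorical_variables:
        n_unique = data[col].nunique()
        # Min(50, half the number of unique values), rounded up
        info[col] = (n_unique, min(50, (n_unique + 1) // 2))
    return info


# Explicit object dtype, so the default "data type object" rule picks these columns up
df = pd.DataFrame({
    "department": pd.Series(["Sales", "HR", "Tech", "Sales", "Legal"], dtype=object),
    "region": pd.Series([f"region_{i}" for i in range(5)], dtype=object),
    "age": [25, 31, 40, 29, 35],
})
print(embedding_info_sketch(df))  # {'department': (4, 2), 'region': (5, 3)}
```

The numeric age column is skipped because its dtype is not object; a column with hundreds of distinct values would be capped at an embedding size of 50.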
ce.get_label_encoded_data(data, categorical_variables=None)
: This function label-encodes (integer-encodes) all the categorical variables using sklearn.preprocessing.LabelEncoder and returns a label-encoded dataframe for training, together with the fitted encoders. Keras/TensorFlow and other deep learning libraries expect categorical inputs in this integer format.
ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info, is_classification=True, epochs=100, batch_size=256)
: This function trains a shallow neural network and returns the embeddings of the categorical variables. Under the hood, it is a 2-layer neural network with 1000 and 500 neurons and 'ReLU' activation. It takes 4 required inputs: X_train, y_train, categorical_embedding_info (the output of the get_embedding_info function) and is_classification (True for classification tasks, False for regression tasks).
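A minimal tf.keras sketch of the kind of network described above, assuming one Embedding layer per categorical column whose trained weight matrix serves as that column's embedding table. The function build_embedding_model, the layer names, the numeric-input branch, the single-unit output layer and the adam optimizer are all illustrative assumptions, not the package's actual code.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder


def build_embedding_model(embedding_info, n_numeric=0, is_classification=True):
    """Hypothetical sketch; embedding_info maps column -> (n_unique, embedding_size)."""
    inputs, features = [], []
    for col, (n_unique, emb_size) in embedding_info.items():
        # One integer-code input and one Embedding layer per categorical column
        inp = tf.keras.Input(shape=(1,), name=f"{col}_input")
        emb = tf.keras.layers.Embedding(n_unique, emb_size, name=f"{col}_embedding")(inp)
        inputs.append(inp)
        features.append(tf.keras.layers.Flatten()(emb))
    if n_numeric:
        # Assumption: non-categorical columns are fed in next to the embeddings
        num_inp = tf.keras.Input(shape=(n_numeric,), name="numeric_input")
        inputs.append(num_inp)
        features.append(num_inp)
    x = tf.keras.layers.Concatenate()(features) if len(features) > 1 else features[0]
    # The two hidden layers from the description: 1000 and 500 ReLU units
    x = tf.keras.layers.Dense(1000, activation="relu")(x)
    x = tf.keras.layers.Dense(500, activation="relu")(x)
    if is_classification:
        # Assumption: a single sigmoid unit, which matches binary_crossentropy
        out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
        loss, metrics = "binary_crossentropy", ["accuracy"]
    else:
        out = tf.keras.layers.Dense(1)(x)
        loss, metrics = "mean_squared_error", []  # the README's 'r2' metric is omitted here
    model = tf.keras.Model(inputs, out)
    model.compile(optimizer="adam", loss=loss, metrics=metrics)  # optimizer is an assumption
    return model


# Toy data: two categorical columns (4 and 5 categories) and one numeric column
n = 64
df = pd.DataFrame({
    "department": ["HR", "Legal", "Sales", "Tech"] * (n // 4),
    "region": [f"region_{i % 5}" for i in range(n)],
    "age": np.linspace(20, 60, n).astype("float32"),
})
y = (np.arange(n) % 2).astype("float32").reshape(-1, 1)

# Label-encode each categorical column and record (n_unique, embedding size)
info, codes = {}, []
for col in ["department", "region"]:
    codes.append(LabelEncoder().fit_transform(df[col]).reshape(-1, 1))
    n_unique = df[col].nunique()
    info[col] = (n_unique, min(50, (n_unique + 1) // 2))

model = build_embedding_model(info, n_numeric=1, is_classification=True)
model.fit(codes + [df[["age"]].to_numpy()], y, epochs=1, batch_size=16, verbose=0)

# Each embedding table has one row per label-encoded category
embeddings = {col: model.get_layer(f"{col}_embedding").get_weights()[0] for col in info}
print({col: w.shape for col, w in embeddings.items()})
```

Row i of each matrix is the learned vector for the category that LabelEncoder mapped to the integer i, which is why the label-encoding step comes before training.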
For classification, loss = 'binary_crossentropy' and metrics = 'accuracy'; for regression, loss = 'mean_squared_error' and metrics = 'r2'.
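For reference, the R² score listed for regression is the coefficient of determination, R² = 1 - SS_res / SS_tot. The NumPy function below (r2_score_np is a hypothetical name) computes it on plain arrays as a sketch of the formula; it is not the package's Keras metric implementation.

```python
import numpy as np


def r2_score_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


score = r2_score_np([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(round(score, 4))  # 0.9486
```

A score of 1.0 means perfect predictions, while always predicting the mean of y_true gives 0.0.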
Dependencies
pandas
scikit-learn
tensorflow
keras
tqdm
keras-tqdm