Skip to main content

A class to generate a family of categorical encoders using network analysis

Project description

catencfamily

The module provides a way to encode categorical features in multiple but related ways using network analysis. Together, the family of multiple encodings serve as a numerical vector for every level of a categorical feature. The class transforms a group of categorical features into corresponding numerical features which can then be used either in unsupervised learning or in predictive analytics. To assist in unsupervised learning, it has methods to save a categorical feature as network graphs and plot them. The class has methods to extract unit vectors for every level of a categorical feature to help, for example, in understanding relationshsips between various levels. Extracted numerical vectors can directly be used in plotting, for example, in tensorflow's Embedding Projector. Class provides methods to get categories encoded for large datasets; one can, for example, take a sample of data at a time, have categories encoded, then take another sample and have categories similarly encoded. After a number of iterations, take either a mean or median to get final category-wise vectors. As the encodings are calculated using a group of network analysis operations, the family of encodings is extensible. The class provides one way, but a limited one to extend it.

Installation

pip install catencfamily

Requirements


python >= 3.7
pandas
numpy
networkx
cdlib
scikit-learn
matplotlib
pathlib

License

_MIT License. Any contribution is welcome!

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

catencfamily-0.1.19.tar.gz (42.1 kB view hashes)

Uploaded Source

Built Distribution

catencfamily-0.1.19-py3-none-any.whl (44.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page