A class to generate a family of categorical encoders using network analysis
catencfamily
The module encodes categorical features in multiple but related ways using network analysis. Together, this family of encodings serves as a numerical vector for every level of a categorical feature. The class transforms a group of categorical features into corresponding numerical features, which can then be used in either unsupervised learning or predictive analytics. To assist in unsupervised learning, it has methods to save a categorical feature as network graphs and to plot them. It also has methods to extract unit vectors for every level of a categorical feature, which helps, for example, in understanding relationships between levels. The extracted numerical vectors can be plotted directly, for example in TensorFlow's Embedding Projector.
The class also provides methods to encode categories for large datasets: take a sample of the data, encode its categories, then take another sample and encode its categories in the same way. After a number of iterations, take either the mean or the median to get the final category-wise vectors. Because the encodings are calculated using a group of network analysis operations, the family of encodings is extensible. The class provides one way to extend it, though a limited one.
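The idea described above can be illustrated with a minimal sketch built directly on networkx and pandas. This is not the catencfamily API: the function names `encode_levels`, `to_unit_vectors`, and `encode_in_samples`, the rule for linking two levels (they share a value of a second categorical column), and the choice of four centrality measures are all assumptions made for illustration only.

```python
from itertools import combinations

import networkx as nx
import numpy as np
import pandas as pd


def encode_levels(df, col, by):
    """Hypothetical helper: one row of network measures per level of `col`.

    Assumption: two levels of `col` are linked when they co-occur
    with the same value of the column `by`.
    """
    g = nx.Graph()
    g.add_nodes_from(df[col].unique())
    for levels in df.groupby(by)[col].unique():
        g.add_edges_from(combinations(levels, 2))
    # Each measure is one member of the "family" of encodings.
    enc = pd.DataFrame({
        "degree": nx.degree_centrality(g),
        "betweenness": nx.betweenness_centrality(g),
        "closeness": nx.closeness_centrality(g),
        "clustering": nx.clustering(g),
    })
    return enc.sort_index()


def to_unit_vectors(enc):
    """Scale each level's vector to unit length (all-zero rows stay zero)."""
    norms = np.linalg.norm(enc.to_numpy(), axis=1)
    norms[norms == 0] = 1.0
    return enc.div(norms, axis=0)


def encode_in_samples(df, col, by, n_iter=5, frac=0.8, seed=0, agg="mean"):
    """Encode several random samples, then aggregate per level."""
    runs = [
        encode_levels(df.sample(frac=frac, random_state=seed + i), col, by)
        for i in range(n_iter)
    ]
    grouped = pd.concat(runs).groupby(level=0)
    return grouped.mean() if agg == "mean" else grouped.median()


# Toy data: cities are the levels to encode, products link them.
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Pune", "Pune", "Agra", "Goa"],
    "product": ["tea", "rice", "tea", "salt", "rice", "fish"],
})
enc = encode_levels(df, "city", "product")
unit = to_unit_vectors(enc)
avg = encode_in_samples(df, "city", "product", n_iter=3)
```

Here `enc` holds one numerical vector per city, `unit` holds the same vectors scaled to unit length, and `avg` holds the per-level mean over three samples. For the Embedding Projector, the vectors and their labels could be written as tab-separated files, for example with `unit.to_csv("vectors.tsv", sep="\t", header=False, index=False)` plus a second file listing `unit.index`.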
Installation
pip install catencfamily
Requirements
python >= 3.7
pandas
numpy
networkx
cdlib
scikit-learn
matplotlib
pathlib
License
MIT License. Contributions are welcome!
Hashes for catencfamily-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d0c1dd4759e5d416e2f01579b8d072661b558efe447e811e050c9abcd8c68642
MD5 | ecab65abf704a9c51f45c77056f702db
BLAKE2b-256 | 64202cddb03de7cf533b6c3367cd87f1b5deaaaea59be497f83148d60d001a77