
A class to generate a family of categorical encoders using network analysis

Project description

catencfamily

The module provides a way to encode categorical features in multiple but related ways using network analysis. Together, the family of encodings serves as a numerical vector for every level of a categorical feature. The class transforms a group of categorical features into corresponding numerical features, which can then be used either in unsupervised learning or in predictive analytics.

To assist in unsupervised learning, the class has methods to save a categorical feature as network graphs and to plot them. It also has methods to extract unit vectors for every level of a categorical feature, which help, for example, in understanding relationships between the various levels. The extracted numerical vectors can be used directly in plotting, for example in TensorFlow's Embedding Projector.

The class also provides methods to encode categories in large datasets. One can, for example, take a sample of the data, have its categories encoded, then take another sample and have its categories encoded in the same way. After a number of iterations, take either the mean or the median of the per-sample vectors to get the final category-wise vectors.

Because the encodings are calculated using a group of network analysis operations, the family of encodings is extensible. The class provides one way to extend it, though a limited one.
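The description does not document the class's own methods, so the sketch below does not use the catencfamily API. It is a minimal, dependency-free illustration of the general idea under stated assumptions: the levels of one categorical column (the hypothetical `cat_col`) become graph nodes, two levels are linked when they share a value of a second categorical column (the hypothetical `link_col`), and three standard network measures (degree centrality, PageRank and the local clustering coefficient) form each level's vector. The functions `build_level_graph`, `encode`, `unit_vectors` and `encode_in_samples` are illustrative names, not the library's, and catencfamily may build its graphs and choose its measures differently. The last function mirrors the sample-then-aggregate workflow described above, using the median.

```python
# Hypothetical sketch of network-based categorical encoding; NOT the catencfamily API.
import math
import random
import statistics
from collections import defaultdict
from itertools import combinations


def build_level_graph(rows, cat_col, link_col):
    """Weighted, undirected graph (adjacency dict) over the levels of cat_col.

    Two levels are linked when they co-occur with the same value of link_col;
    the edge weight counts how many link_col values they share.
    """
    levels_by_link = defaultdict(set)
    for row in rows:
        levels_by_link[row[link_col]].add(row[cat_col])
    graph = {row[cat_col]: {} for row in rows}  # every level is a node, even if isolated
    for levels in levels_by_link.values():
        for a, b in combinations(sorted(levels), 2):
            graph[a][b] = graph[a].get(b, 0.0) + 1.0
            graph[b][a] = graph[b].get(a, 0.0) + 1.0
    return graph


def pagerank(graph, damping=0.85, iters=100):
    """Weighted PageRank by power iteration; isolated nodes spread their rank uniformly."""
    nodes = sorted(graph)
    if not nodes:
        return {}
    n = len(nodes)
    out_w = {v: sum(graph[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out_w[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u, w in graph[v].items():  # empty for isolated nodes, so no division by 0
                new[u] += damping * rank[v] * w / out_w[v]
        rank = new
    return rank


def clustering(graph, v):
    """Unweighted local clustering coefficient of node v."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def encode(rows, cat_col, link_col):
    """One vector per level: [degree centrality, PageRank, clustering coefficient]."""
    graph = build_level_graph(rows, cat_col, link_col)
    pr = pagerank(graph)
    denom = max(len(graph) - 1, 1)
    return {v: [len(graph[v]) / denom, pr[v], clustering(graph, v)] for v in graph}


def unit_vectors(enc):
    """Scale each level's vector to unit Euclidean length (zero vectors are kept as-is)."""
    out = {}
    for level, vec in enc.items():
        norm = math.sqrt(sum(x * x for x in vec))
        out[level] = [x / norm for x in vec] if norm else list(vec)
    return out


def encode_in_samples(rows, cat_col, link_col, n_samples=5, frac=0.5, seed=0):
    """Encode several random samples, then take the element-wise median per level."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * frac))
    collected = defaultdict(list)
    for _ in range(n_samples):
        sample = rng.sample(rows, k)
        for level, vec in encode(sample, cat_col, link_col).items():
            collected[level].append(vec)
    return {
        level: [statistics.median(col) for col in zip(*vecs)]
        for level, vecs in collected.items()
    }


# Toy data (made up for illustration): which cities share which stores.
rows = [
    {"city": "A", "store": 1}, {"city": "B", "store": 1}, {"city": "C", "store": 1},
    {"city": "A", "store": 2}, {"city": "B", "store": 2}, {"city": "D", "store": 3},
]
print(unit_vectors(encode(rows, "city", "store")))
print(encode_in_samples(rows, "city", "store", n_samples=10, seed=1))
```

The measures are hand-rolled only to keep the sketch self-contained; networkx, which the package lists as a requirement, provides close equivalents as `networkx.degree_centrality`, `networkx.pagerank` and `networkx.clustering`. Taking the median rather than the mean makes the aggregated vectors less sensitive to an unusual sample.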

Installation

pip install catencfamily

Requirements


python >= 3.7
pandas
numpy
networkx
cdlib
scikit-learn
matplotlib
pathlib

License

MIT License. Contributions are welcome!


Download files

Download the file for your platform.

Source Distribution

catencfamily-0.1.19.tar.gz (42.1 kB)

Built Distribution

catencfamily-0.1.19-py3-none-any.whl (44.2 kB)

File details

Details for the file catencfamily-0.1.19.tar.gz.

File metadata

  • Download URL: catencfamily-0.1.19.tar.gz
  • Upload date:
  • Size: 42.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for catencfamily-0.1.19.tar.gz
Algorithm Hash digest
SHA256 4a1232a52d1690bf2b034642138bc76bacf9ddf4c5f98a29c1d38d33b3edd84d
MD5 f14d10feb45f79081efa461020d70709
BLAKE2b-256 7500a6d862d090b32b6cb195df7ab01d76c701c8b53830c863988346c0857798


File details

Details for the file catencfamily-0.1.19-py3-none-any.whl.

File hashes

Hashes for catencfamily-0.1.19-py3-none-any.whl
Algorithm Hash digest
SHA256 8fbc385496086bf01710d27bf014bacbcf2a804c83e4d6d8ca4b441b30434295
MD5 7a75ff254d0ff40f7a3835fd0a04d8eb
BLAKE2b-256 bbecfa0d3bb769f4933f1b28fae2cbb51b25f95a7ca3c6346411474eb4eb76b6

