Graphical Hypergeometric Networks

Project description

hnet is a Python package for association learning with graphical Hypergeometric Networks. It tests associations across variables for significance using statistical inference and builds a network from the significant associations. It works with continuous, discrete, categorical, and nested variables without heavy preprocessing. ⭐️Star it if you like it⭐️

Key Features

  • Association Learning: Discover significant associations across variables using statistical inference.
  • Mixed Data Handling: Works with continuous, discrete, categorical, and nested variables without heavy preprocessing.
  • Summarization: Summarize complex networks into interpretable structures.
  • Feature Importance: Rank variables by importance within associations.
  • Interactive Visualizations: Explore results with dynamic dashboards and d3-based visualizations.
  • Performance Evaluation: Compare accuracy with Bayesian association learning and benchmarks.
  • Interactive Dashboard: No data leaves your machine. All computations are performed locally.

Background

  • HNet stands for graphical Hypergeometric Networks, a method that tests associations across variables for significance using statistical inference. The aim is to determine a network of significant associations that sheds light on the complex relationships across variables. Input data sets can range from generic dataframes to nested data structures with lists, missing values, and enumerations.

  • Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing steps, and describing the relationships between variables remains a challenge. The data understanding phase is crucial to the data-mining process. However, without making any assumptions about the data, the search space is super-exponential in the number of variables, so a thorough data understanding phase is not common practice.

  • HNet processes raw unstructured data sets and outputs a network that consists of (partially) directed or undirected edges between the nodes (i.e., variables). To evaluate the accuracy of HNet, we used well-known data sets and generated data sets with known ground truth. In addition, the performance of HNet is compared to Bayesian association learning.

  • HNet showed high accuracy and performance in the detection of node links. For the Alarm data set, HNet achieved an average MCC score of 0.33 ± 0.0002 (P<1x10^-6), whereas Bayesian association learning resulted in an average MCC score of 0.52 ± 0.006 (P<1x10^-11), and randomly assigning edges resulted in an MCC score of 0.004 ± 0.0003 (P=0.49). HNet processes raw unstructured data sets, allows analysis of mixed data types, scales easily in the number of variables, and allows detailed examination of the detected associations.
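As a minimal sketch of the kind of test the method is named after, the snippet below scores the overlap between two binary labels with a hypergeometric test, using only the Python standard library. The toy data, the helper name `hypergeom_sf`, and the variable names are made up for illustration. This is not hnet's own implementation, and it leaves out practical steps such as correcting for the many label pairs that are tested in a real data set.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: samples carrying label A,
    n: samples carrying label B, k: samples carrying both labels.
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy data (made up): 20 samples with two binary labels.
survived = [1] * 8 + [0] * 12
female = [1] * 6 + [0] * 2 + [1] * 1 + [0] * 11

N = len(survived)   # population size
K = sum(survived)   # samples with label "survived"
n = sum(female)     # samples with label "female"
k = sum(1 for a, b in zip(survived, female) if a and b)  # samples with both labels

p_value = hypergeom_sf(k, N, K, n)
print(f"overlap={k}, P={p_value:.4g}")
```

In this sketch, a small P-value means the two labels co-occur more often than expected by chance, which would make the pair a candidate edge in the network.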


Installation

  • Install hnet from PyPI (recommended).
pip install -U hnet

  • Install from GitHub source.
pip install git+https://github.com/erdogant/hnet

  • Import library.
import hnet
print(hnet.__version__)

# Import library
from hnet import hnet

Examples

  • Simple example for the Titanic data set
# Import library
from hnet import hnet
# Load example dataset
df = hnet.import_example('titanic')
# Print to screen
print(df)
#      PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
# 0              1         0       3  ...   7.2500   NaN         S
# 1              2         1       1  ...  71.2833   C85         C
# 2              3         1       3  ...   7.9250   NaN         S
# 3              4         1       1  ...  53.1000  C123         S
# 4              5         0       3  ...   8.0500   NaN         S
# ..           ...       ...     ...  ...      ...   ...       ...
# 886          887         0       2  ...  13.0000   NaN         S
# 887          888         1       1  ...  30.0000   B42         S
# 888          889         0       3  ...  23.4500   NaN         S
# 889          890         1       1  ...  30.0000  C148         C
# 890          891         0       3  ...   7.7500   NaN         Q

Play with the interactive Titanic results.

Example: Association learning on the Titanic data set

Example: Summarize results

Networks can become giant hairballs, and heatmaps can become unreadable. You may want to see the general associations between the categories instead of the associations between individual labels. The summarize functionality aggregates the results to the category level.
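To make the idea of summarizing towards categories concrete, here is a small standalone sketch that collapses label-level associations into category-level ones. Everything in it is an assumption for illustration: the P-values are made up, labels are assumed to be named `<category>_<label>`, the helper `summarize_by_category` is hypothetical, and keeping the smallest P-value per category pair is only one possible aggregation rule, not necessarily the one hnet applies.

```python
# Label-level associations (made-up values): (source label, target label) -> P-value.
label_edges = {
    ("Sex_female", "Survived_1"): 1e-12,
    ("Sex_male", "Survived_0"): 1e-10,
    ("Pclass_1", "Survived_1"): 1e-5,
    ("Pclass_3", "Survived_0"): 1e-4,
    ("Pclass_1", "Sex_female"): 0.03,
}


def category_of(label):
    """Strip the label suffix, assuming names of the form '<category>_<label>'."""
    return label.rsplit("_", 1)[0]


def summarize_by_category(edges):
    """Keep the smallest P-value per (source category, target category) pair."""
    summary = {}
    for (src, dst), p in edges.items():
        key = (category_of(src), category_of(dst))
        summary[key] = min(p, summary.get(key, 1.0))
    return summary


category_edges = summarize_by_category(label_edges)
for (src, dst), p in category_edges.items():
    print(f"{src} -> {dst}: P={p:g}")
```

Five label-level edges shrink to three category-level edges here, which is the kind of reduction that keeps a large network readable.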

Example: Feature importance

Performance
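
The accuracy figures in the Background are Matthews correlation coefficient (MCC) scores for detected node links. As a minimal sketch of how such a score can be computed for an undirected network, the snippet below compares a predicted edge set against a ground-truth edge set. The node names, both edge sets, and the helper `edge_mcc` are made up for illustration and do not reproduce the benchmark itself.

```python
from itertools import combinations
from math import sqrt


def edge_mcc(true_edges, pred_edges, nodes):
    """MCC over all unordered node pairs; edges are tuples in sorted node order."""
    tp = fp = fn = tn = 0
    for pair in combinations(sorted(nodes), 2):
        in_true = pair in true_edges
        in_pred = pair in pred_edges
        if in_true and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_true:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the score is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


nodes = ["A", "B", "C", "D"]
true_edges = {("A", "B"), ("B", "C"), ("C", "D")}   # made-up ground truth
pred_edges = {("A", "B"), ("B", "C"), ("A", "D")}   # made-up detected edges

score = edge_mcc(true_edges, pred_edges, nodes)
print(f"MCC = {score:.3f}")
```

An MCC of 1 means every edge and non-edge is classified correctly, 0 corresponds to chance level, and negative values indicate systematic disagreement.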


Contributors

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • This library is free. But powered by caffeine! Like it? Chip in what it's worth, and keep me creating new functionalities!🙂
