Graphical Hypergeometric Networks

HNET - Graphical Hypergeometric Networks

Star this repo if you like it! ⭐️

Blog

Read more details and usage about HNet in this blog!

Summary

HNet stands for graphical Hypergeometric Networks, which is a method where associations across variables are tested for significance by statistical inference. The aim is to determine a network with significant associations that can shed light on the complex relationships across variables. Input datasets can range from generic dataframes to nested data structures with lists, missing values and enumerations.

Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing, and describing the relationships between variables remains a challenge. The data understanding phase is crucial to the data-mining process. However, without any assumptions on the data, the search space is super-exponential in the number of variables, so a thorough data understanding phase is not common practice.

Methods

We propose graphical hypergeometric networks (HNet), a method to test associations across variables for significance using statistical inference. The aim is to determine a network using only the significant associations in order to shed light on the complex relationships across variables. HNet processes raw unstructured data sets and outputs a network that consists of (partially) directed or undirected edges between the nodes (i.e., variables). To evaluate the accuracy of HNet, we used well-known data sets and generated data sets with known ground truth. In addition, the performance of HNet is compared to Bayesian association learning.
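
As a rough illustration of the kind of test involved, the sketch below computes a one-sided hypergeometric p-value for the overlap between two binary labels, using only the standard library. The toy vectors and the function name `hypergeom_pvalue` are made up for this example, and the use of the upper tail P(X >= x) is an assumption. It is not taken from hnet's implementation.

```python
from math import comb  # requires Python 3.8+


def hypergeom_pvalue(N, K, n, x):
    """Upper-tail probability P(X >= x) for X ~ Hypergeometric(N, K, n).

    N: total number of samples
    K: samples carrying the first label
    n: samples carrying the second label
    x: samples carrying both labels
    """
    total = comb(N, n)
    tail = sum(comb(K, k) * comb(N - K, n - k) for k in range(x, min(K, n) + 1))
    return tail / total


# Toy data (hypothetical): 10 passengers, two binary labels
survived = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
female = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

N = len(survived)
K = sum(survived)
n = sum(female)
x = sum(s and f for s, f in zip(survived, female))

p = hypergeom_pvalue(N, K, n, x)
print(N, K, n, x, p)  # p equals 25/210, roughly 0.119
```

With many label pairs, each pair yields one such p-value, and only the pairs that remain significant would become edges in the network.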

Results

We demonstrate that HNet showed high accuracy and performance in the detection of node links. For the Alarm data set, HNet achieved an average MCC score of 0.33 ± 0.0002 (P < 1×10^-6), whereas Bayesian association learning resulted in an average MCC score of 0.52 ± 0.006 (P < 1×10^-11), and randomly assigning edges resulted in an MCC score of 0.004 ± 0.0003 (P = 0.49).
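
For readers unfamiliar with the metric, MCC is the Matthews correlation coefficient. The sketch below computes it from the counts of a confusion matrix, treating each possible edge as either present or absent. The counts are invented for illustration and are not related to the Alarm results above.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal count is zero
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: detected edges compared against a ground-truth network
score = mcc(tp=3, tn=5, fp=1, fn=1)
print(round(score, 4))  # 0.5833
```

A score of 1 means every edge and non-edge is classified correctly, 0 is what random assignment gives on average, and -1 means complete disagreement.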

Conclusions

HNet processes raw unstructured data sets, allows analysis of mixed data types, scales easily with the number of variables, and allows detailed examination of the detected associations.

Documentation

Method overview

Installation

  • Install hnet from PyPI (recommended). hnet is compatible with Python 3.6+ and runs on Linux, macOS and Windows. It is distributed under the Apache 2.0 license.
pip install -U hnet
  • Simple example for the Titanic data set
# Import hnet
from hnet import hnet
# Load example dataset
df = hnet.import_example('titanic')
# Print to screen
print(df)
#      PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
# 0              1         0       3  ...   7.2500   NaN         S
# 1              2         1       1  ...  71.2833   C85         C
# 2              3         1       3  ...   7.9250   NaN         S
# 3              4         1       1  ...  53.1000  C123         S
# 4              5         0       3  ...   8.0500   NaN         S
# ..           ...       ...     ...  ...      ...   ...       ...
# 886          887         0       2  ...  13.0000   NaN         S
# 887          888         1       1  ...  30.0000   B42         S
# 888          889         0       3  ...  23.4500   NaN         S
# 889          890         1       1  ...  30.0000  C148         C
# 890          891         0       3  ...   7.7500   NaN         Q

Association learning on the Titanic data set.

from hnet import hnet
hn = hnet()
results = hn.association_learning(df)

# Plot static graph
G_static = hn.plot()

# Plot heatmap
P_heatmap = hn.heatmap(cluster=True)

# Plot dynamic graph
hn.d3graph()

# Plot dynamic heatmap
hn.d3heatmap()

Summarize results.

Networks can become giant hairballs and heatmaps can become unreadable. You may want to see the general associations between the categories instead of the associations between individual labels. With the summarize functionality, the results are aggregated per category.

# Import
from hnet import hnet

# Load example dataset
df = hnet.import_example('titanic')

# Initialize
hn = hnet()

# Association learning
results = hn.association_learning(df)

# Plot heatmap
hn.heatmap(summarize=True, cluster=True)
hn.d3heatmap(summarize=True)

# Plot static graph
hn.plot(summarize=True)
hn.d3graph(summarize=True, charge=1000)
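
To make the idea of summarizing concrete, the sketch below collapses label-level association scores into category-level scores in plain Python. The label naming (`Sex_female`), the scores, and the choice to sum them are all assumptions for illustration. How hnet aggregates internally is not described here.

```python
from collections import defaultdict

# Hypothetical label-level association scores (higher means stronger)
label_scores = {
    ("Sex_female", "Survived_1"): 12.0,
    ("Sex_male", "Survived_0"): 10.0,
    ("Pclass_1", "Survived_1"): 4.0,
    ("Pclass_3", "Survived_0"): 3.0,
}


def category_of(label):
    """Strip the trailing '_<value>' part, e.g. 'Sex_female' -> 'Sex'."""
    return label.rsplit("_", 1)[0]


def summarize(scores):
    """Sum label-level scores per (category, category) pair."""
    out = defaultdict(float)
    for (source, target), value in scores.items():
        out[(category_of(source), category_of(target))] += value
    return dict(out)


category_scores = summarize(label_scores)
print(category_scores)
# {('Sex', 'Survived'): 22.0, ('Pclass', 'Survived'): 7.0}
```

Four label-level edges collapse into two category-level edges, which is why summarized plots are easier to read.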

Feature importance

# Plot feature importance
hn.plot_feat_importance(marker_size=50)

Performance

Citation

Please cite hnet in your publications if this is useful for your research! You can find it in the right panel.

Maintainer

Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
Contributions are welcome.
