HNET - Graphical Hypergeometric Networks
Star it if you like it!
HNet stands for graphical Hypergeometric Networks, which is a method where associations across variables are tested for significance by statistical inference. The aim is to determine a network with significant associations that can shed light on the complex relationships across variables. Input datasets can range from generic dataframes to nested data structures with lists, missing values and enumerations.
Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing steps, and it remains a challenge to describe the relationships between variables. The data understanding phase is crucial to the data-mining process. However, without making any assumptions on the data, the search space is super-exponential in the number of variables. A thorough data understanding phase is therefore not common practice.
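To give a sense of how quickly such a search space grows, the sketch below counts labeled directed acyclic graphs (DAGs) with Robinson's recurrence. Treating the search space as the set of DAGs over the variables is an assumption made here for illustration only, and the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    total = 0
    for k in range(1, n + 1):
        # Inclusion-exclusion over the k nodes that have no incoming edge.
        sign = 1 if k % 2 == 1 else -1
        total += sign * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
    return total


# Print the number of possible DAGs for 1 to 7 variables.
for n in range(1, 8):
    print(n, count_dags(n))
```

Three variables already allow 25 graphs, four allow 543, and five allow 29,281, so exhaustive search becomes impractical after a handful of variables.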
Methods
We propose graphical hypergeometric networks (HNet), a method to test associations across variables for significance using statistical inference. The aim is to determine a network using only the significant associations in order to shed light on the complex relationships across variables. HNet processes raw unstructured data sets and outputs a network that consists of (partially) directed or undirected edges between the nodes (i.e., variables). To evaluate the accuracy of HNet, we used well-known data sets and generated data sets with known ground truth. In addition, the performance of HNet is compared to Bayesian association learning.
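The idea of testing an association with the hypergeometric distribution can be sketched as follows. This is a minimal illustration, not HNet's actual implementation: it assumes variables are one-hot encoded with `pandas.get_dummies` and that each pair of binary columns is tested for over-representation with `scipy.stats.hypergeom`. The helper name `overlap_pvalue` and the toy data are made up for this example.

```python
import pandas as pd
from scipy.stats import hypergeom


def overlap_pvalue(x, y):
    """P(overlap >= observed) for two binary columns of equal length."""
    x = x.astype(bool)
    y = y.astype(bool)
    population = len(x)             # total number of samples
    successes = int(x.sum())        # samples where x is true
    draws = int(y.sum())            # samples where y is true
    overlap = int((x & y).sum())    # samples where both are true
    # The survival function at overlap - 1 gives P(X >= overlap).
    return float(hypergeom.sf(overlap - 1, population, successes, draws))


# Toy data (hypothetical): 'sex' coincides with 'survived', 'deck' does not.
toy = pd.DataFrame({
    "sex": ["f", "f", "f", "f", "m", "m", "m", "m"],
    "survived": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "deck": ["a", "b", "a", "b", "a", "b", "a", "b"],
})
onehot = pd.get_dummies(toy)  # columns such as 'sex_f', 'survived_yes', 'deck_a'

p_linked = overlap_pvalue(onehot["sex_f"], onehot["survived_yes"])
p_unlinked = overlap_pvalue(onehot["sex_f"], onehot["deck_a"])
print(p_linked, p_unlinked)
```

A small P-value for a pair of columns suggests the overlap is larger than expected by chance. With many column pairs, the P-values would also need a multiple-testing correction before edges are drawn, which this sketch leaves out.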
Results
HNet showed high accuracy and performance in the detection of node links. On the Alarm data set, HNet achieved an average MCC score of 0.33 ± 0.0002 (P < 1×10⁻⁶), whereas Bayesian association learning resulted in an average MCC score of 0.52 ± 0.006 (P < 1×10⁻¹¹), and randomly assigning edges resulted in an MCC score of 0.004 ± 0.0003 (P = 0.49).
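For readers unfamiliar with the metric, MCC is the Matthews correlation coefficient, which ranges from -1 to 1 and is 0 for chance-level predictions. The sketch below shows one way to score detected edges against a known ground truth. It assumes directed edges and counts every ordered node pair as one prediction; the evaluation protocol used for the numbers above may differ. The function name `mcc_from_edges` and the tiny graph are hypothetical.

```python
from itertools import permutations
from math import sqrt


def mcc_from_edges(nodes, true_edges, pred_edges):
    """Matthews correlation coefficient over all ordered node pairs."""
    tp = tn = fp = fn = 0
    for edge in permutations(nodes, 2):
        in_true = edge in true_edges
        in_pred = edge in pred_edges
        if in_true and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_true:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the score is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


nodes = ["A", "B", "C"]
true_edges = {("A", "B"), ("B", "C")}
pred_edges = {("A", "B"), ("A", "C")}
score = mcc_from_edges(nodes, true_edges, pred_edges)
print(score)
```

In this toy case one of two predicted edges is correct, giving TP = 1, FP = 1, FN = 1, TN = 3 and an MCC of 0.25.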
Conclusions
HNet processes raw unstructured data sets, allows analysis of mixed data types, scales easily in the number of variables, and allows detailed examination of the detected associations.
Documentation
- API Documentation: https://erdogant.github.io/hnet/
- Article: https://arxiv.org/abs/2005.04679
Installation
- Install hnet from PyPI (recommended). hnet is compatible with Python 3.6+ and runs on Linux, macOS and Windows. It is distributed under the Apache 2.0 license.
pip install hnet
- Simple example for the Titanic data set
# Load library
from hnet import hnet
# Initialize hnet with default settings
hn = hnet()
# Load example dataset
df = hn.import_example('titanic')
# Print to screen
print(df)
# PassengerId Survived Pclass ... Fare Cabin Embarked
# 0 1 0 3 ... 7.2500 NaN S
# 1 2 1 1 ... 71.2833 C85 C
# 2 3 1 3 ... 7.9250 NaN S
# 3 4 1 1 ... 53.1000 C123 S
# 4 5 0 3 ... 8.0500 NaN S
# .. ... ... ... ... ... ... ...
# 886 887 0 2 ... 13.0000 NaN S
# 887 888 1 1 ... 30.0000 B42 S
# 888 889 0 3 ... 23.4500 NaN S
# 889 890 1 1 ... 30.0000 C148 C
# 890 891 0 3 ... 7.7500 NaN Q
Association learning on the titanic dataset
hn = hnet()
out = hn.association_learning(df)
# Plot static graph
G_static = hn.plot()
# Plot heatmap
P_heatmap = hn.heatmap(cluster=True)
# Plot dynamic graph
G_dynamic = hn.d3graph()
Citation
Please cite hnet in your publications if this is useful for your research.
Here is the BibTeX entry:
@misc{taskesen2020hnet,
title={HNet: Graphical Hypergeometric Networks},
author={Erdogan Taskesen},
year={2020},
eprint={2005.04679},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Maintainer
Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
Contributions are welcome.