Investigate relationships between variable pairs in tabular datasets, based on Correlation, PPS and MIC scores.
Project description
Readme
The package analyzes the data by calculating correlation levels, the Predictive Power Score (PPS) and the Maximal Information Coefficient (MIC). This gives insight into how variables relate to each other, and is useful in an Exploratory Data Analysis workflow.
Logic
- It calculates correlation (Spearman and Pearson) for all column pairs, then filters the correlation levels against a pre-defined threshold. As a result, only the column pairs relevant for Correlation remain.
- It calculates PPS for all column pairs, then filters the PPS levels against a pre-defined threshold. As a result, only the column pairs relevant for PPS remain.
- The lists of relevant pairs for Correlation and PPS are merged and used as the reference for computing MIC. Computing MIC is expensive, so depending on your resources (time, processing power, etc.) it may be worth computing MIC only for the most promising column pairs. The most complete analysis considers all possible column pair combinations, although that is more expensive.
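The correlation filter and the merge step above can be sketched in plain Python. This is only an illustration of the logic, not the package's own code: the function names (`relevant_correlation_pairs`, `mic_candidates`), the 0.6 threshold, and the rule of keeping a pair when either Pearson or Spearman passes the threshold are all assumptions. PPS scoring is not implemented here; the PPS pairs are passed in as if they had already been filtered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0 here.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevant_correlation_pairs(table, threshold=0.6):
    """Keep column pairs whose |Pearson| or |Spearman| reaches the threshold.

    `table` maps column name -> list of numbers (hypothetical input format).
    """
    kept = set()
    for a, b in combinations(sorted(table), 2):
        score = max(abs(pearson(table[a], table[b])),
                    abs(spearman(table[a], table[b])))
        if score >= threshold:
            kept.add((a, b))
    return kept


def mic_candidates(corr_pairs, pps_pairs):
    """Merge both pair lists into one de-duplicated list for the MIC step.

    PPS is directional, so (x, y) and (y, x) are collapsed into one pair.
    """
    merged = {tuple(sorted(p)) for p in corr_pairs}
    merged |= {tuple(sorted(p)) for p in pps_pairs}
    return sorted(merged)


table = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3]}
corr_pairs = relevant_correlation_pairs(table)
# Pretend a PPS filter (not shown) kept the directional pair c -> a.
candidates = mic_candidates(corr_pairs, {("c", "a")})
```

Only the pairs in `candidates` would then be passed to the MIC computation, which is where the saving in compute time comes from.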
Outputs
- It reports the variable pairs with the most interesting relationships
- It also produces a scatter plot for each pair
Source Distribution
cpm-analytics-0.0.201.tar.gz (8.3 kB)
Built Distribution
Hashes for cpm_analytics-0.0.201-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bae615fa0bbe368f7b854952fcff27e0b12a5eabd68e3611dc4fffa83e6cead7
MD5 | 9ad2d2bee11f4d44d51ef4c031e49d07
BLAKE2b-256 | 7b2d00b345ba761e32262ea4ff97d2cb4fd0449bdeebf80cefbce906cc04e6d0