Investigate relationships between pairs of variables in tabular datasets, based on Correlation, PPS and MIC scores.

Project description

CPM Analytics

The goal of cpm-analytics is to provide insight into the major relationships in a tabular dataset. The package analyzes the data by computing three scores: correlation, Predictive Power Score (PPS) and Maximal Information Coefficient (MIC). Next, for each variable pair, it draws a scatterplot, boxplot or bar plot. Finally, it shows a heatmap of each score across the variable pairs. Together these show how variables relate to each other, which is useful in an Exploratory Data Analysis workflow. From there, you can decide which additional data visualizations to explore.
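The text above does not say how the package picks between a scatterplot, a boxplot and a bar plot. One plausible reading, sketched here purely as an assumption, is that the choice follows the types of the two variables: two numeric columns get a scatterplot, one numeric and one categorical column get a boxplot, and two categorical columns get a bar plot. The helper names `is_numeric` and `choose_plot` are hypothetical and are not part of cpm-analytics.

```python
from numbers import Number


def is_numeric(values):
    """True when every value is a number (booleans are treated as categorical)."""
    return all(isinstance(v, Number) and not isinstance(v, bool) for v in values)


def choose_plot(x_values, y_values):
    """Pick a plot type from the kinds of the two columns (assumed rule)."""
    x_num, y_num = is_numeric(x_values), is_numeric(y_values)
    if x_num and y_num:
        return "scatterplot"  # numeric vs numeric
    if x_num or y_num:
        return "boxplot"      # numeric vs categorical
    return "bar plot"         # categorical vs categorical


print(choose_plot([1.5, 2.0, 3.2], [10, 20, 30]))
print(choose_plot(["a", "b", "a"], [10, 20, 30]))
print(choose_plot(["a", "b", "a"], ["x", "y", "x"]))
```

The actual rule used by the package may differ, so check the documentation before relying on it.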

By the way, CPM is short for Correlation, PPS and MIC.

The overall logic

  • It calculates correlation (Spearman and Pearson) for all column pairs, then filters the correlation levels against a pre-defined threshold. As a result, only the column pairs relevant for correlation remain.
  • It calculates PPS for all column pairs, then filters the PPS levels against a pre-defined threshold. As a result, only the column pairs relevant for PPS remain.
  • The lists of relevant pairs for correlation and PPS are merged and used as the reference for computing MIC. Computing MIC is expensive, so depending on your resources (time, processing power, etc.) it may be worth computing MIC only for the most promising column pairs. The most complete analysis considers all possible column pair combinations, but it is more expensive.
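The filter-then-merge steps above can be sketched in plain Python. This is a dependency-free illustration, not the package's implementation: the function names are hypothetical, the rule "keep a pair when the absolute Pearson or Spearman value reaches the threshold" is an assumption, and the PPS pairs are passed in as given because computing PPS is outside the scope of the sketch.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Average ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevant_corr_pairs(columns, threshold):
    """Keep pairs whose |Pearson| or |Spearman| reaches the threshold (assumed rule)."""
    kept = set()
    for a, b in combinations(sorted(columns), 2):
        score = max(abs(pearson(columns[a], columns[b])),
                    abs(spearman(columns[a], columns[b])))
        if score >= threshold:
            kept.add((a, b))
    return kept


def mic_candidates(corr_pairs, pps_pairs):
    """Union of both lists; PPS pairs are directional, so sort each one first."""
    return set(corr_pairs) | {tuple(sorted(pair)) for pair in pps_pairs}


columns = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [5, 1, 4, 2, 3]}
corr_pairs = relevant_corr_pairs(columns, threshold=0.4)
pps_pairs = {("z", "x")}  # hypothetical result of a separate PPS filtering step
print(sorted(mic_candidates(corr_pairs, pps_pairs)))
```

Only the pairs in the merged set would then go on to the expensive MIC computation. Passing every combination instead gives the complete but costlier analysis.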

The outputs

  • A report of the variable pairs with the most interesting relationships
  • A scatter plot for each of those pairs

Please visit the documentation

Example usage

from cpm_analytics import CorrPpsMicAnalytics

analytics = CorrPpsMicAnalytics()  # corr_threshold=0.4, pps_threshold=0.2)
analytics.compute_score(df_raw)
analytics.summary_report()
analytics.plot_relationships()
analytics.plot_heatmaps(figsize=(20, 5))

Download files

Source Distribution

cpm-analytics-0.0.204.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

cpm_analytics-0.0.204-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file cpm-analytics-0.0.204.tar.gz.

File metadata

  • Download URL: cpm-analytics-0.0.204.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for cpm-analytics-0.0.204.tar.gz
Algorithm Hash digest
SHA256 ff0d8e0cf9ce9cc7b55cbb28850328509cf5733a4189ff8c44ad870ee934c465
MD5 eb18c17bff32beb90c97c44c1dec9471
BLAKE2b-256 c8f1164aa0313d5500cf178945ef01f9ca00f4505ff523bfb773e9e25e4b0841

File details

Details for the file cpm_analytics-0.0.204-py3-none-any.whl.

File hashes

Hashes for cpm_analytics-0.0.204-py3-none-any.whl
Algorithm Hash digest
SHA256 43443b8ddd15cbbdcdc16fc8b0ad2374efe835643876f0d4a485db2896c98c26
MD5 0464ebecbedc502145d8dc8c8e83f76f
BLAKE2b-256 51067da75ac88b3800e56c2ce7aeab39aeafc39b0e5869e314c2e61c611a72f3
