WeightedCorr

Weighted correlation in Python. Pandas based implementation of weighted Pearson and Spearman correlations.

v2.1 20-03-2021

Fixed Issue #1

V2 Update 21-07-2020

Switched from a pandas backend to a numpy/scipy backend. Usage remains the same, but performance for Spearman correlations is significantly improved. See table below.

| N samples | Pearson_v1 | Pearson_v2 | Spearman_v1 | Spearman_v2 |
|---|---|---|---|---|
| 10 | 3.55 ms ± 64.1 µs | 1.59 ms ± 9.32 µs | 14 ms ± 131 µs | 1.78 ms ± 7.55 µs |
| 100 | 6.69 ms ± 89 µs | 4.94 ms ± 79.9 µs | 21.4 ms ± 979 µs | 5.08 ms ± 144 µs |
| 1000 | 39.1 ms ± 426 µs | 36.7 ms ± 529 µs | 93.7 ms ± 1.03 ms | 37.2 ms ± 433 µs |
| 10000 | 350 ms ± 4.56 ms | 343 ms ± 5.41 ms | 746 ms ± 5.29 ms | 350 ms ± 7.42 ms |
| 100000 | 3.48 s ± 11.9 ms | 3.48 s ± 6.44 ms | 7.44 s ± 20.1 ms | 3.52 s ± 9.27 ms |

Install

pip install wcorr

Usage

The WeightedCorr class can be used in a few different ways depending on your needs. Pass the data when initializing the class, then call the resulting instance to compute the correlation with the desired method (pearson is the default). Note that the method is passed to the call, not to the initialization. The examples below produce pearson, pearson, and spearman correlations, respectively.

from wcorr import WeightedCorr

  1. You can supply a pandas DataFrame with x, y, and w columns (the columns should be in that order). The output will be a single floating point value.
     WeightedCorr(xyw=my_data[['x', 'y', 'w']])(method='pearson')
  2. You can supply x, y, and w as separate pandas Series. The output will be a single floating point value.
     WeightedCorr(x=my_data['x'], y=my_data['y'], w=my_data['w'])()
  3. You can supply a pandas DataFrame and the name of the weight column in that DataFrame. In this case the output will be an (M-1)x(M-1) pandas DataFrame (the correlation matrix), where M is the number of columns in the original DataFrame (no correlation is calculated for the weight column, hence M-1).
     WeightedCorr(df=my_data, wcol='w')(method='spearman')

Weighted Pearson correlation

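The weighted Pearson r, given n pairs (x_i, y_i) with weights w_i, is calculated as

$$r = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2 \, \sum_{i=1}^{n} w_i (y_i - \bar{y})^2}}$$

Where

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$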
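As an illustration, here is a minimal pure-Python sketch of a weighted Pearson correlation. It assumes the common definition in which the means, the covariance, and the variances are all weighted by w. The function name `weighted_pearson` is hypothetical and is not part of wcorr, whose own implementation may differ in details such as input validation.

```python
import math


def weighted_pearson(x, y, w):
    """Weighted Pearson r for equal-length sequences x, y and weights w.

    Hypothetical helper for illustration only; not part of the wcorr API.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total_w = sum(w)
    # Weighted means of x and y.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted sums of cross products and squares. The common 1 / sum(w)
    # normalisation cancels between numerator and denominator.
    cov_xy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov_xy / math.sqrt(var_x * var_y)


# With equal weights this reduces to the ordinary Pearson r.
r_equal = weighted_pearson([1, 2, 3], [1, 2, 0], [1, 1, 1])
# A zero weight removes the third pair from the calculation entirely.
r_dropped = weighted_pearson([1, 2, 3], [1, 2, 0], [1, 1, 0])
```

In this sketch, giving the third pair a weight of zero changes the sign of the result, because the two remaining pairs are perfectly positively related.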

Weighted Spearman rank-order correlation

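First, initial ranks (z) are assigned to x and y. Groups of tied records are assigned the average rank of that group. Next, the weighted rank (rank) is calculated for x and y separately over the n pairs, such that the j-th rank of either x or y will be:

$$rank_j = a_j + b_j$$

Where

$$a_j = \sum_{i=1}^{n} w_i \, \mathbf{1}(z_i < z_j)$$

and

$$b_j = \frac{1 + \sum_{i=1}^{n} \mathbf{1}(z_i = z_j)}{2} \cdot \frac{\sum_{i=1}^{n} w_i \, \mathbf{1}(z_i = z_j)}{\sum_{i=1}^{n} \mathbf{1}(z_i = z_j)}$$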

These weighted ranks are then passed to the weighted Pearson correlation function.
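The steps above can be sketched in pure Python as follows. This is an illustration under a stated assumption, not the package's code: the weighted rank of a record is taken to be the total weight of all records with a lower initial rank, plus (t + 1) / 2 times the mean weight of its tie group, where t is the size of that group. All function names here (`average_ranks`, `weighted_ranks`, `weighted_pearson`, `weighted_spearman`) are hypothetical and not part of wcorr.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Initial 1-based ranks z; tied values share the average rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def weighted_ranks(values, w):
    """Weighted ranks under the assumption described in the lead-in."""
    z = average_ranks(values)
    group_w = defaultdict(float)  # total weight of each tie group
    group_n = defaultdict(int)    # number of records in each tie group
    for zi, wi in zip(z, w):
        group_w[zi] += wi
        group_n[zi] += 1
    below, running = {}, 0.0      # total weight ranked strictly below each group
    for zi in sorted(group_w):
        below[zi] = running
        running += group_w[zi]
    return [below[zi] + (group_n[zi] + 1) / 2 * group_w[zi] / group_n[zi] for zi in z]


def weighted_pearson(x, y, w):
    """Weighted Pearson r (weighted means, covariance and variances)."""
    total_w = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    my = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def weighted_spearman(x, y, w):
    """Weighted Pearson r computed on the weighted ranks of x and y."""
    return weighted_pearson(weighted_ranks(x, w), weighted_ranks(y, w), w)


# With unit weights the weighted ranks equal the ordinary average ranks.
unit_ranks = weighted_ranks([10, 20, 20, 30], [1, 1, 1, 1])
# Unequal weights stretch the rank scale in proportion to the weights.
stretched_ranks = weighted_ranks([3, 1, 2], [2, 1, 3])
```

Under this assumption, unit weights reproduce the usual tie-averaged ranks, so the weighted Spearman correlation reduces to the ordinary one when all weights are equal.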
