
Hyper-Fast Correlation Functions (Numba-Accelerated): Pearson's r, Spearman's rho, Kendall's tau, Chatterjee's xi, Somers' D

Project description

hyper_corr -- Hyper-fast Correlation Functions

Hyper-fast, Numba-accelerated correlation coefficients with SciPy-compatible results. hyper_corr provides drop-in replacements for common bivariate statistics (Pearson's r, Spearman's ρ, Kendall's τ, Chatterjee's ξ, and Somers' D), plus specialized variants that exploit pre-sorted inputs and known tie structure for maximum throughput. For sample sizes of N = 50 (with continuous, tie-free data), speedups over SciPy range from roughly 150× to 1500×.

Features

  • Numba-accelerated kernels for high-volume or repeated correlation evaluations.
  • SciPy-style return types (SignificanceResult/SomersDResult) from the general functions so existing code can adopt the faster implementations without large refactors.
  • Tie-aware and tie-free variants for Kendall, Spearman, Chatterjee, and Somers, so you can match the kernel to your data's tie structure for maximum performance.

Installation

The library targets Python 3.8+ and depends on NumPy and Numba.

pip install numba numpy

# Optional: run the included benchmarks against SciPy
pip install scipy

# Optional: fast-math optimizations on Intel CPUs
pip install icc_rt

# Install hyper-corr from PyPI
pip install hyper-corr

# ...or install locally from source
pip install -e .

Quick Start

import numpy as np
from hyper_corr import pearsonr, spearmanr, kendalltau, chatterjeexi, somersd

rng = np.random.default_rng(seed=0)
x = rng.normal(size=500)
y = x * 0.75 + rng.normal(scale=0.25, size=500)

# Sorting by x is not needed.
print(pearsonr(x, y))          # Pearson's r (linear correlation)
# Rank correlations: sorting and tie handling happen automatically.
print(spearmanr(x, y))         # Spearman's rho
print(kendalltau(x, y))        # Kendall's tau
print(chatterjeexi(x, y))      # Chatterjee's xi
print(somersd(x, y))           # Somers' D
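
To make clear what the less familiar Chatterjee's ξ measures, here is a pure-NumPy sketch of the tie-free formula ξ = 1 - 3·Σ|r[i+1] - r[i]| / (n² - 1), where r are the ranks of y after sorting by x. The helper name `chatterjee_xi_reference` is illustrative only, not part of hyper_corr's API:

```python
import numpy as np

def chatterjee_xi_reference(x, y):
    # Sort y by x, rank the reordered y, then apply the tie-free formula
    # xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1).
    n = len(x)
    order = np.argsort(x, kind="stable")
    r = np.argsort(np.argsort(y[order])) + 1  # ranks 1..n of y in x-order
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n * n - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
xi_dependent = chatterjee_xi_reference(x, np.sin(x))          # noiseless function of x
xi_noise = chatterjee_xi_reference(x, rng.normal(size=1000))  # independent noise
```

ξ approaches 1 when y is a noiseless function of x (even a non-monotone one, like sin above) and hovers near 0 for independent data, which is what distinguishes it from the rank correlations above.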

Performance-focused Variants

If your data is already sorted and you know whether ties exist, call the specialized kernels directly for maximum speed:

from hyper_corr import spearmanr_noties, spearmanr_ties

# Example: tie-free Spearman's rho with pre-sorted x
idx = np.argsort(x, kind="stable")
x_sorted = x[idx]
y_ordered = y[idx]

rho, pvalue = spearmanr_noties(x_sorted, y_ordered, len(x_sorted))

# Example: Spearman's rho with pre-sorted x and ties (rounding introduces ties)
x_sorted = np.round(x_sorted, 1); y_ordered = np.round(y_ordered, 1)
rho, pvalue = spearmanr_ties(x_sorted, y_ordered, len(x_sorted))
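
For intuition about what the tie-free path can skip, here is a sketch of the classic rank-difference formula that applies when x is pre-sorted and tie-free; the helper name is hypothetical and hyper_corr's actual kernel may differ:

```python
import numpy as np

def spearman_rho_noties_reference(x_sorted, y_ordered):
    # With x pre-sorted and tie-free, the x-ranks are simply 0..n-1, so only
    # y needs ranking; the classic tie-free formula
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) then applies to the rank
    # differences d.
    n = len(x_sorted)
    ry = np.argsort(np.argsort(y_ordered))  # ranks 0..n-1 of y
    d = np.arange(n) - ry
    return 1.0 - 6.0 * np.dot(d, d) / (n * (n * n - 1.0))
```

With x already sorted its ranks are free, so the work reduces to ranking y, which suggests why the pre-sorted, tie-free path can be so much faster than the general one.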

Optimal Use Case

Computing correlations over many small or medium windows of a large pre-sorted array with known tie structure, e.g. rolling-window statistics.

import numpy as np
from hyper_corr import kendalltau_ties, kendalltau_noties

N = 1_000_000
rng = np.random.default_rng(0)
x = rng.normal(size=N); y = rng.normal(size=N)

W = 25              # window size
M = N - W + 1       # number of windows

taus = np.empty(M, dtype=np.float64)
pvals = np.empty(M, dtype=np.float64)

ind = np.argsort(x, kind="stable")
x_sorted = x[ind]; y_ordered = y[ind]   # y in the same order as sorted x

ties = (np.unique(x).size < N) or (np.unique(y).size < N)

for i in range(M):
    xw = x_sorted[i:i+W]
    yw = y_ordered[i:i+W]
    if ties:
        tau, p = kendalltau_ties(xw, yw, W)
    else:
        tau, p = kendalltau_noties(xw, yw, W)
    taus[i] = tau
    pvals[i] = p
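
For small windows like these, results can be sanity-checked against a brute-force O(W²) count of concordant and discordant pairs (tau-a, valid only without ties). `kendall_tau_a_reference` is an illustrative helper, not part of hyper_corr:

```python
import numpy as np

def kendall_tau_a_reference(xw, yw):
    # Brute-force tau-a over all n*(n-1)/2 pairs: (concordant - discordant)
    # divided by the total pair count. Only valid when neither array has ties.
    n = len(xw)
    s = 0.0
    for i in range(n - 1):
        s += np.sum(np.sign(xw[i + 1:] - xw[i]) * np.sign(yw[i + 1:] - yw[i]))
    return 2.0 * s / (n * (n - 1))
```

This quadratic check is fine for W = 25 but far too slow for large n, which is where the accelerated kernels earn their keep.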

Notes

  • Data should be pre-cleaned. Inputs are not validated: there are no checks for finite values or NaN, and no check that n > 2 (required for a meaningful correlation). Speed was considered to be of the utmost importance.
  • For the *_ties/*_noties functions, x MUST be sorted and y MUST be ordered by that sort.
  • The *_ties/*_noties functions return a plain (statistic, pvalue) pair rather than SciPy result types, which are incompatible with Numba.
  • The first run of each correlation function is slower than subsequent runs due to Numba JIT compilation.
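
The compilation note above suggests a warm-up pattern: call each kernel once on a tiny input before timing or latency-sensitive work, so the JIT cost is paid up front. A minimal sketch with a trivial stand-in kernel (it falls back to plain Python if Numba is absent, so the sketch runs anywhere):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):  # no-op stand-in so this sketch runs without Numba
        return func

@njit
def mean_diff(a, b):
    # Trivial stand-in kernel; any Numba-jitted function compiles the same way.
    return (a - b).mean()

# Warm-up: the first call triggers JIT compilation for this type signature.
mean_diff(np.zeros(2), np.zeros(2))

# Subsequent calls with the same dtypes reuse the compiled machine code.
result = mean_diff(np.arange(5.0), np.ones(5))
```

Note that a warm-up only covers the dtype signature it was called with; calling the function later with different dtypes triggers another compilation.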

Development

Benchmarks and usage experiments live in the bench/ and examples/ folders. Packaging metadata is defined in pyproject.toml. Contributions should keep the public API exports in hyper_corr/__init__.py up to date.

Provenance and Licensing

Several kernels and statistical routines in hyper_corr originate from or were adapted from corresponding SciPy implementations. Those upstream sources are distributed under the BSD-3-Clause license, and their terms continue to apply to the derived portions of this project. The BSD-3-Clause obligations coexist with the MIT License that governs the rest of the codebase; using or redistributing hyper_corr should account for both license notices. Upstream attribution details live in THIRD_PARTY_LICENSES.md, and the bundled BSD-3-Clause text itself is stored in licenses/SciPy_LICENSE.txt.

License

Released under the MIT License alongside the third-party terms noted above. See LICENSE and THIRD_PARTY_LICENSES.md for details.

Download files

Download the file for your platform.

Source Distribution

hyper_corr-0.2.1.tar.gz (17.3 MB)

Uploaded Source

Built Distribution


hyper_corr-0.2.1-py3-none-any.whl (17.3 MB)

Uploaded Python 3

File details

Details for the file hyper_corr-0.2.1.tar.gz.

File metadata

  • Download URL: hyper_corr-0.2.1.tar.gz
  • Upload date:
  • Size: 17.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for hyper_corr-0.2.1.tar.gz
Algorithm Hash digest
SHA256 42a6fa55cfe39fbc5667c583654692abe4a460dc194f4aa9d0a4cb4cd00af722
MD5 d094a7a932b8ba5a215e36b07b60a031
BLAKE2b-256 b96a01933ea2b9e385a627fbbe5f8ce7c43cfca0b63abbb689b24ead19787db5


File details

Details for the file hyper_corr-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: hyper_corr-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 17.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for hyper_corr-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8f77aa4e35d655aacc7deed9a3ac71e84c9dd1226f2b6ba5de44be74c9e8afec
MD5 88747e3179078ca7c03d59d04068a021
BLAKE2b-256 c64d73bbc28be99b0cbcc0435d44772124bbb9eeba2b6ab8973c6c559019dca2

