Skip to main content

Hyper-Fast Correlation Functions (Numba-Accelerated): Pearson's r, Spearman's rho, Kendall's tau, Chatterjee's xi, Somers’ D

Project description

hyper_corr -- Hyper-fast Correlation Functions

Hyper-fast, numba-accelerated correlation coefficients with SciPy-compatible results. hyper_corr provides drop-in replacements for common bivariate statistics—Pearson's r, Spearman's ρ, Kendall's τ, Chatterjee's ξ, and Somers' D—plus specialized variants that exploit pre-sorted inputs and known tie structure for maximum throughput. For sample sizes of N = 50 (with continuous tie-free data) speedups over Scipy range from approximately x150 to x1500 times faster.

Features

  • Numba-accelerated kernels for high-volume or repeated correlation evaluations.
  • SciPy-style return types (SignificanceResult/SomersDResult) from the general functions so existing code can adopt the faster implementations without large refactors.
  • Tie-aware and tie-free variants for Kendall, Spearman, Chatterjee, and Somers to match your data assumptions for extreme performance.

Installation

The library targets Python 3.8+ and depends on NumPy and Numba.

pip install numba numpy

#If you wish to use the included benchmarks for comparison to SciPy
pip install scipy

# optional for fast math optimizations on Intel CPUs
pip install icc_rt

#Install hyper-corr from pypi with pip
pip install hyper-corr

# or local install from source
pip install -e .

Quick Start

import numpy as np
from hyper_corr import pearsonr, spearmanr, kendalltau, chatterjeexi, somersd

rng = np.random.default_rng(seed=0)
x = rng.normal(size=500)
y = x * 0.75 + rng.normal(scale=0.25, size=500)

#Sorting by x not needed.
print(pearsonr(x, y))          # Pearson's r linear correlation
#Rank correlations with sorting and auto tie handling
print(spearmanr(x, y))         # Spearman's rho
print(kendalltau(x, y))        # Kendall's tau
print(chatterjeexi(x, y))      # Chatterjee's xi
print(somersd(x, y))           # Somers' D

Performance-focused Variants

If you already have sorted data, and know whether ties exist, call the specialized kernels directly for the fastest speeds:

from hyper_corr import spearmanr_noties, spearmanr_ties

# Example: tie-free Spearman's rho with pre-sorted x
idx = np.argsort(x, kind="stable")
x_sorted = x[idx]
y_ordered = y[idx]

rho, pvalue = spearmanr_noties(x_sorted, y_ordered, len(x_sorted))

# Example: Spearman's rho with pre-sorted x with ties
x_sorted = np.round(x_sorted, 1); y_ordered = np.round(y_ordered, 1)
rho, pvalue = spearmanr_ties(x_sorted, y_ordered, len(x_sorted))

Optimal Use Case

Many small/medium repeated slices of pre-sorted large arrays with known tie structure.

N = 1_000_000
rng = np.random.default_rng(0)
x = rng.normal(size=N); y = rng.normal(size=N)

W = 25              # window size
M = N - W + 1       # Number of windows

taus = np.empty(M, dtype=np.float64)
pvals = np.empty(M, dtype=np.float64)

ind = np.argsort(y, kind="stable")
x_sorted = x[ind]; y_ordered = y[ind]   # y in the same order as sorted x
    
ties = ((N-np.unique(x).size)>0) or ((N-np.unique(y).size)>0)

for i in range(M):
    xw = x_ordered[i:i+W]
    yw = y_sorted[i:i+W]
    if ties:
        tau, p = kendalltau_ties(xw, yw, W)
    else:
        tau, p = kendalltau_noties(xw, yw, W)
    taus[i] = tau
    pvals[i] = p

Notes

  • Data should be pre-cleaned. Sample data is not checked for real values or the fact that correlations must have n>2. nan is not taken into account. Speed was considered to be of the utmost importance.
  • For the *_tie/_notie functions x MUST be sorted and y MUST be ordered by that sort.
  • *_tie/_notie functions output stat and pvalue not SciPy return types as they are incompatible with Numba.
  • The first run of the included correlation functions are slower than future runs due to Numba compilation.

Development

Benchmarks and usage experiments live in the bench/ and examples/ folders. Packaging metadata is defined in pyproject.toml. Contributions should keep the public API exports in hyper_corr/__init__.py up to date.

Provenance and Licensing

Several kernels and statistical routines in hyper_corr originate from or were adapted from corresponding SciPy implementations. Those upstream sources are distributed under the BSD-3-Clause license, and their terms continue to apply to the derived portions of this project. The BSD-3-Clause obligations coexist with the MIT License that governs the rest of the codebase; using or redistributing hyper_corr should account for both license notices. Upstream attribution details live in THIRD_PARTY_LICENSES.md, and the bundled BSD-3-Clause text itself is stored in licenses/SciPy_LICENSE.txt.

License

Released under the MIT License alongside the third-party terms noted above. See LICENSE and THIRD_PARTY_LICENSES.md for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hyper_corr-0.3.0.tar.gz (17.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hyper_corr-0.3.0-py3-none-any.whl (17.3 MB view details)

Uploaded Python 3

File details

Details for the file hyper_corr-0.3.0.tar.gz.

File metadata

  • Download URL: hyper_corr-0.3.0.tar.gz
  • Upload date:
  • Size: 17.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.11

File hashes

Hashes for hyper_corr-0.3.0.tar.gz
Algorithm Hash digest
SHA256 63a403eeb1368bed3eac30c371ed57f792fb86f2aced8962e2019a8abec1881f
MD5 29a074483e1dced6f87e9dce438469de
BLAKE2b-256 5732c8cb8282b27cecb40b8ae79f848f8c97c03add24965dd787c165336393dd

See more details on using hashes here.

File details

Details for the file hyper_corr-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: hyper_corr-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 17.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.11

File hashes

Hashes for hyper_corr-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2327d42236d71cd4c93bd07f6d063daa505260e39c84f7c1ff9606a87c3279b9
MD5 2df47eaa302754459942c575c8240b19
BLAKE2b-256 985a4a923af6a5edec9033490983a672870bc389b91fa9db0a66ccf628bc6cbb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page