Hyper-Fast Correlation Functions (Numba-Accelerated): Pearson's r, Spearman's rho, Kendall's tau, Chatterjee's xi, Somers’ D
Project description
hyper_corr -- Hyper-fast Correlation Functions
Hyper-fast, Numba-accelerated correlation coefficients with SciPy-compatible results. hyper_corr provides drop-in replacements for common bivariate statistics—Pearson's r, Spearman's ρ, Kendall's τ, Chatterjee's ξ, and Somers' D—plus specialized variants that exploit pre-sorted inputs and known tie structure for maximum throughput.
For sample sizes around N = 50 (with continuous, tie-free data), speedups over SciPy range from roughly 150× to 1500×.
Features
- Numba-accelerated kernels for high-volume or repeated correlation evaluations.
- SciPy-style return types (`SignificanceResult`/`SomersDResult`) from the general functions, so existing code can adopt the faster implementations without large refactors.
- Tie-aware and tie-free variants for Kendall, Spearman, Chatterjee, and Somers, so you can match the kernel to your data's tie structure for maximum performance.
Installation
The library targets Python 3.8+ and depends on NumPy and Numba.
```shell
# core dependencies
pip install numba numpy

# optional: SciPy, for the included benchmarks
pip install scipy

# optional: fast-math optimizations on Intel CPUs
pip install icc_rt

# install hyper-corr from PyPI
pip install hyper-corr

# or install locally from source
pip install -e .
```
Quick Start
```python
import numpy as np

from hyper_corr import pearsonr, spearmanr, kendalltau, chatterjeexi, somersd

rng = np.random.default_rng(seed=0)
x = rng.normal(size=500)
y = x * 0.75 + rng.normal(scale=0.25, size=500)

# No pre-sorting by x is required for the general functions.
print(pearsonr(x, y))      # Pearson's r (linear correlation)

# Rank correlations: sorting and tie handling are done automatically.
print(spearmanr(x, y))     # Spearman's rho
print(kendalltau(x, y))    # Kendall's tau
print(chatterjeexi(x, y))  # Chatterjee's xi
print(somersd(x, y))       # Somers' D
```
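If you want to sanity-check results independently of hyper_corr, Spearman's ρ for tie-free data is just Pearson's r applied to the ranks. A minimal pure-NumPy sketch (the function name `spearman_reference` is illustrative, not part of the library):

```python
import numpy as np

def spearman_reference(x, y):
    # Spearman's rho is Pearson's r computed on the ranks of x and y.
    # argsort-of-argsort yields 0-based ranks for tie-free data.
    rx = np.argsort(np.argsort(x)).astype(np.float64)
    ry = np.argsort(np.argsort(y)).astype(np.float64)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(seed=0)
x = rng.normal(size=500)
y = x * 0.75 + rng.normal(scale=0.25, size=500)
rho = spearman_reference(x, y)  # strongly positive for this nearly linear relation
```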
Performance-focused Variants
If your data are already sorted and you know whether ties exist, call the specialized kernels directly for maximum speed:
```python
import numpy as np
from hyper_corr import spearmanr_noties, spearmanr_ties

# Example: tie-free Spearman's rho with x pre-sorted and y reordered to match
idx = np.argsort(x, kind="stable")
x_sorted = x[idx]
y_ordered = y[idx]
rho, pvalue = spearmanr_noties(x_sorted, y_ordered, len(x_sorted))

# Example: Spearman's rho with pre-sorted x, where rounding introduces ties
x_sorted = np.round(x_sorted, 1)
y_ordered = np.round(y_ordered, 1)
rho, pvalue = spearmanr_ties(x_sorted, y_ordered, len(x_sorted))
```
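When deciding between the `*_ties` and `*_noties` kernels, a sorted array can be checked for ties in a single O(n) pass, since ties in sorted data can only occur between neighbours. A small sketch (`has_ties_sorted` is an illustrative helper, not part of the library; it uses exact floating-point equality, with the usual caveats):

```python
import numpy as np

def has_ties_sorted(a):
    # In a sorted array, equal values must be adjacent, so one pass
    # over the consecutive differences suffices (no extra sort).
    return bool(np.any(np.diff(a) == 0))

tied = has_ties_sorted(np.array([0.1, 0.2, 0.2, 0.5]))      # True
untied = has_ties_sorted(np.array([1.0, 2.0, 3.0]))         # False
```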
Optimal Use Case
Repeated evaluations over many small-to-medium windows of large, pre-sorted arrays with known tie structure.
```python
import numpy as np
from hyper_corr import kendalltau_noties, kendalltau_ties

N = 1_000_000
rng = np.random.default_rng(0)
x = rng.normal(size=N)
y = rng.normal(size=N)

W = 25         # window size
M = N - W + 1  # number of windows
taus = np.empty(M, dtype=np.float64)
pvals = np.empty(M, dtype=np.float64)

# Sort once by x; y is reordered to follow x's sort order.
ind = np.argsort(x, kind="stable")
x_sorted = x[ind]
y_ordered = y[ind]

# Determine the tie structure once, up front.
ties = (np.unique(x).size < N) or (np.unique(y).size < N)

for i in range(M):
    xw = x_sorted[i:i + W]
    yw = y_ordered[i:i + W]
    if ties:
        tau, p = kendalltau_ties(xw, yw, W)
    else:
        tau, p = kendalltau_noties(xw, yw, W)
    taus[i] = tau
    pvals[i] = p
```
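For validating the windowed results, a small O(W²) pure-NumPy reference for Kendall's τ-a (tie-free data) is cheap to write: count concordant minus discordant pairs over all pairs. This `kendall_tau_reference` helper is an illustrative cross-check, not part of the library:

```python
import numpy as np

def kendall_tau_reference(x, y):
    # O(n^2) tau-a for tie-free data: (concordant - discordant) pairs
    # divided by the total number of pairs n*(n-1)/2.
    n = len(x)
    s = np.sign(np.subtract.outer(x, x)) * np.sign(np.subtract.outer(y, y))
    # Summing the strict upper triangle counts each pair exactly once.
    concordance = np.triu(s, k=1).sum()
    return concordance / (n * (n - 1) / 2)

xs = np.array([1.0, 2.0, 3.0, 4.0])
tau_up = kendall_tau_reference(xs, xs)    # 1.0: every pair concordant
tau_down = kendall_tau_reference(xs, -xs)  # -1.0: every pair discordant
```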
Notes
- Data should be pre-cleaned: inputs are not validated (no checks for finite values, NaNs, or the minimum sample size n > 2). Speed was considered to be of the utmost importance.
- For the *_ties/*_noties functions, x MUST be sorted and y MUST be ordered by that same sort.
- The *_ties/*_noties functions return a plain (statistic, pvalue) tuple rather than SciPy result objects, since those objects are incompatible with Numba.
- The first call to each correlation function is slower than subsequent calls because Numba compiles it on first use.
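Because of this one-time compilation cost, it is common to "warm up" a JIT-compiled kernel on a tiny input before any timing or latency-sensitive loop. A generic sketch of the pattern (the `kernel` below is a plain-NumPy stand-in; in practice you would warm up and time a hyper_corr function such as `kendalltau`):

```python
import time

import numpy as np

def kernel(x, y):
    # Stand-in for a JIT-compiled correlation kernel.
    return float(np.corrcoef(x, y)[0, 1])

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

x = np.arange(100.0)
y = 2.0 * x + 1.0
_, t_first = timed(kernel, x, y)   # with a Numba kernel, includes compile time
_, t_steady = timed(kernel, x, y)  # subsequent calls run the compiled code
```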
Development
Benchmarks and usage experiments live in the bench/ and examples/ folders. Packaging metadata is defined in pyproject.toml. Contributions should keep the public API exports in hyper_corr/__init__.py up to date.
Provenance and Licensing
Several kernels and statistical routines in hyper_corr originate from or were adapted from corresponding SciPy implementations. Those upstream sources are distributed under the BSD-3-Clause license, and their terms continue to apply to the derived portions of this project. The BSD-3-Clause obligations coexist with the MIT License that governs the rest of the codebase; using or redistributing hyper_corr should account for both license notices. Upstream attribution details live in THIRD_PARTY_LICENSES.md, and the bundled BSD-3-Clause text itself is stored in licenses/SciPy_LICENSE.txt.
License
Released under the MIT License alongside the third-party terms noted above. See LICENSE and THIRD_PARTY_LICENSES.md for details.
File details
Details for the file hyper_corr-0.2.1.tar.gz.
File metadata
- Download URL: hyper_corr-0.2.1.tar.gz
- Upload date:
- Size: 17.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `42a6fa55cfe39fbc5667c583654692abe4a460dc194f4aa9d0a4cb4cd00af722` |
| MD5 | `d094a7a932b8ba5a215e36b07b60a031` |
| BLAKE2b-256 | `b96a01933ea2b9e385a627fbbe5f8ce7c43cfca0b63abbb689b24ead19787db5` |
File details
Details for the file hyper_corr-0.2.1-py3-none-any.whl.
File metadata
- Download URL: hyper_corr-0.2.1-py3-none-any.whl
- Upload date:
- Size: 17.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8f77aa4e35d655aacc7deed9a3ac71e84c9dd1226f2b6ba5de44be74c9e8afec` |
| MD5 | `88747e3179078ca7c03d59d04068a021` |
| BLAKE2b-256 | `c64d73bbc28be99b0cbcc0435d44772124bbb9eeba2b6ab8973c6c559019dca2` |