Statistical methods for computing many correlations
many
This package provides a general-use toolkit for frequently-implemented statistical and visual methods. See the blog post for an explanation of the purpose of this package and the methods used.
Installation
pip install many
Note: to use the CUDA-accelerated statistical methods (i.e. many.stats.mat_mwu_gpu), you must also separately install the version of cupy that matches your CUDA installation.
Components
Statistical methods
The statistical methods comprise several functions for association mining between variable pairs. These methods are optimized for pandas DataFrames and are inspired by the corrcoef function provided by numpy.
Because these functions rely on native matrix-level operations provided by numpy, many are orders of magnitude faster than naive looping-based alternatives. This makes them useful for constructing large association networks or for feature extraction, which have important uses in areas such as biomarker discovery. All methods also return estimates of statistical significance.
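To illustrate the matrix-level idea, here is a minimal sketch of a vectorized Pearson correlation between every column of one DataFrame and every column of another. It is not this package's implementation: the helper name pairwise_pearson is hypothetical, the data are random, and the sketch assumes both frames share the same rows and contain no missing values. It also omits the significance estimates that the package's methods return.

```python
import numpy as np
import pandas as pd


def pairwise_pearson(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Correlate every column of `a` with every column of `b`.

    Hypothetical helper for illustration only; assumes aligned rows
    and no missing values.
    """
    # Standardize each column to zero mean and unit (population) std.
    a_z = (a - a.mean()) / a.std(ddof=0)
    b_z = (b - b.mean()) / b.std(ddof=0)
    # A single matrix product yields all column-pair correlations at once.
    corr = a_z.to_numpy().T @ b_z.to_numpy() / len(a)
    return pd.DataFrame(corr, index=a.columns, columns=b.columns)


rng = np.random.default_rng(0)
a = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a1", "a2", "a3"])
b = pd.DataFrame(rng.normal(size=(100, 2)), columns=["b1", "b2"])

corr = pairwise_pearson(a, b)  # 3 x 2 matrix of coefficients
```

The speed-up comes from replacing a Python loop over column pairs with one matrix multiplication, which numpy dispatches to optimized linear algebra routines.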
In certain cases such as the computation of correlation coefficients, these vectorized methods come with the caveat of numerical instability. As a compromise, "naive" loop-based implementations are also provided for testing and comparison. It is recommended that any significant results obtained with the vectorized methods be verified with these base methods.
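The kind of check recommended here can be sketched as follows. This is an assumption-laden illustration that uses only numpy and pandas rather than this package's own vectorized and naive functions: it computes the same correlations once with a matrix product and once with a pairwise loop over np.corrcoef, then reports the largest disagreement. The data are random and the tolerance of 1e-10 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
a = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("wxyz"))
b = pd.DataFrame(rng.normal(size=(50, 2)), columns=["p", "q"])

# Vectorized version: standardize the columns, then one matrix product.
a_z = ((a - a.mean()) / a.std(ddof=0)).to_numpy()
b_z = ((b - b.mean()) / b.std(ddof=0)).to_numpy()
fast = pd.DataFrame(a_z.T @ b_z / len(a), index=a.columns, columns=b.columns)

# Loop-based reference: one np.corrcoef call per column pair.
slow = pd.DataFrame(index=a.columns, columns=b.columns, dtype=float)
for i in a.columns:
    for j in b.columns:
        slow.loc[i, j] = np.corrcoef(a[i], b[j])[0, 1]

# Largest absolute disagreement between the two approaches.
max_abs_diff = (fast - slow).abs().to_numpy().max()
agree = np.allclose(fast, slow, atol=1e-10)
```

On well-conditioned data the two results should match to within floating-point error; a large max_abs_diff on real data is a signal to trust the loop-based value.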
The current functions available are listed below by variable comparison type. Benchmarks are also provided with comparisons to the equivalent looping-based method. Every method accepts a melt option that controls the output shape: either a set of matrices, one per statistic, with one set of variables along the rows and the other along the columns, or a single DataFrame in which each row is a variable pair and each statistic is melted to a column.
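The two output shapes can be pictured with plain pandas. The sketch below is not the package's code, and the column names (a_col, b_col, corr, pval), the helper melt_stat, and the numbers are all made-up placeholders. It starts from two statistic matrices and melts them into one long DataFrame with a row per variable pair.

```python
import pandas as pd

# Matrix form: one DataFrame per statistic, rows and columns are variables.
# The values are illustrative placeholders, not real results.
corrs = pd.DataFrame(
    [[0.9, -0.1], [0.2, 0.5]], index=["a1", "a2"], columns=["b1", "b2"]
)
pvals = pd.DataFrame(
    [[0.001, 0.8], [0.4, 0.03]], index=["a1", "a2"], columns=["b1", "b2"]
)


def melt_stat(mat: pd.DataFrame, name: str) -> pd.DataFrame:
    """Turn a row-by-column statistic matrix into long form (hypothetical helper)."""
    return (
        mat.rename_axis(index="a_col")
        .reset_index()
        .melt(id_vars="a_col", var_name="b_col", value_name=name)
    )


# Melted form: one row per variable pair, one column per statistic.
melted = melt_stat(corrs, "corr").merge(
    melt_stat(pvals, "pval"), on=["a_col", "b_col"]
)
```

The matrix form is convenient for heatmaps and clustering, while the melted form is easier to sort and filter, for example by significance.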
Visual methods
Several visual methods are also included for interpreting the results of the statistical methods. Like the statistical methods, these are grouped by the types of variables plotted.
Development
- Install dependencies with
poetry install
- Initialize environment with
poetry shell
- Initialize pre-commit hooks with
pre-commit install
File details
Details for the file many-0.7.2.tar.gz.
File metadata
- Download URL: many-0.7.2.tar.gz
- Upload date:
- Size: 22.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.6 Darwin/23.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5002d88c49b6bfcab3476d65675ab5e7557a9ff7ea3d09b6a6aaaf12e6b7a196 |
| MD5 | b7f27a583000dd6516fe6581acd9a9de |
| BLAKE2b-256 | 7485a4bfeec8dbe9ebec3c56a4324c5e38e3383305f5f9e3fee78e4db6c55aed |
File details
Details for the file many-0.7.2-py3-none-any.whl.
File metadata
- Download URL: many-0.7.2-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.6 Darwin/23.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da14b65489672d1f81ee4aa5f0362efd9cd1a012a1390d753a32a1fef06553f7 |
| MD5 | 6d86700559d7def05a7bca238afbc69d |
| BLAKE2b-256 | b78fa4eb83bcfa75c289c6ae305676b2d9f93919123dfcc0130665bb2aca4986 |