Statistical methods for computing many correlations

Project description

many

This package provides a general-use toolkit of frequently implemented statistical and visual methods. See the blog post for an explanation of the purpose of this package and the methods used.

Full documentation

Installation

pip install many

Note: to use the CUDA-accelerated statistical methods (i.e., many.stats.mat_mwu_gpu), you must also separately install the version of cupy that corresponds to your CUDA installation.
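Before calling a GPU method, it can help to confirm that cupy is actually installed. The check below is a minimal sketch that uses only the Python standard library; it is not part of many, and it only tests that cupy is importable, not that a working GPU is present.

```python
import importlib.util

# Check whether cupy is importable without actually importing it,
# so this also runs on machines with no GPU or CUDA toolkit.
has_cupy = importlib.util.find_spec("cupy") is not None

if has_cupy:
    print("cupy found: GPU-accelerated methods should be usable")
else:
    print("cupy not found: install the cupy build matching your CUDA version")
```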

Components

Statistical methods

The statistical methods comprise several functions for association mining between variable pairs. These methods are optimized for pandas DataFrames and are inspired by the corrcoef function provided by numpy.

Because these functions rely on native matrix-level operations provided by numpy, many of them are orders of magnitude faster than naive loop-based alternatives. This makes them useful for constructing large association networks and for feature extraction, both of which have important uses in areas such as biomarker discovery. All methods also return estimates of statistical significance.
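To illustrate why matrix-level operations help, here is a minimal sketch of a vectorized Pearson correlation between every pair of columns in two DataFrames, next to a loop-based equivalent. The helper names vectorized_pearson and looped_pearson are hypothetical and written for this example; they are not the functions provided by many, and they assume no missing values and omit the significance estimates.

```python
import numpy as np
import pandas as pd


def vectorized_pearson(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Pearson r between every column of `a` and every column of `b`.

    Hypothetical helper for illustration; not part of the `many` API.
    Assumes both frames share the same rows and contain no missing values.
    """
    a_vals = a.to_numpy(dtype=float)
    b_vals = b.to_numpy(dtype=float)

    # Center each column, then scale it to unit length.
    a_centered = a_vals - a_vals.mean(axis=0)
    b_centered = b_vals - b_vals.mean(axis=0)
    a_unit = a_centered / np.linalg.norm(a_centered, axis=0)
    b_unit = b_centered / np.linalg.norm(b_centered, axis=0)

    # A single matrix product yields all pairwise correlations at once.
    corr = a_unit.T @ b_unit
    return pd.DataFrame(corr, index=a.columns, columns=b.columns)


def looped_pearson(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Loop-based baseline: one np.corrcoef call per column pair."""
    out = pd.DataFrame(index=a.columns, columns=b.columns, dtype=float)
    for i in a.columns:
        for j in b.columns:
            out.loc[i, j] = np.corrcoef(a[i], b[j])[0, 1]
    return out


rng = np.random.default_rng(0)
a = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a1", "a2", "a3"])
b = pd.DataFrame(rng.normal(size=(50, 4)), columns=["b1", "b2", "b3", "b4"])

fast = vectorized_pearson(a, b)
slow = looped_pearson(a, b)

# Largest disagreement between the two approaches across all pairs.
max_diff = np.abs(fast.to_numpy() - slow.to_numpy()).max()
print(f"max absolute difference: {max_diff:.2e}")
```

The loop issues one library call per pair, so its cost grows with the number of pairs, while the vectorized version does the same arithmetic in one matrix product.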

In certain cases, such as the computation of correlation coefficients, these vectorized methods can be numerically unstable. As a compromise, "naive" loop-based implementations are also provided for testing and comparison. Any significant results obtained with the vectorized methods should be verified with these base methods.
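As a generic illustration of how a correlation formula can become numerically unstable, the sketch below compares a raw-moment formula with a centered one on data that has a large offset. This is an assumption-laden example of floating-point cancellation in general; it does not claim to reproduce the formulas used inside many, and both function names are hypothetical.

```python
import numpy as np


def pearson_raw_moments(x, y):
    """Raw-moment formula, cov = E[xy] - E[x]E[y]; prone to cancellation."""
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    var_x = np.mean(x * x) - np.mean(x) ** 2
    var_y = np.mean(y * y) - np.mean(y) ** 2
    # Cancellation can make the variances zero or negative, so silence
    # the resulting floating-point warnings and let NaN/inf propagate.
    with np.errstate(invalid="ignore", divide="ignore"):
        return cov / np.sqrt(var_x * var_y)


def pearson_centered(x, y):
    """Centered formula: subtract the means before multiplying."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))


rng = np.random.default_rng(0)
noise = rng.normal(size=1000)
# A large offset makes the squared values dwarf the variance of interest.
x = 1e8 + noise
y = 1e8 + 0.5 * noise + rng.normal(size=1000)

reference = np.corrcoef(x, y)[0, 1]
stable_error = abs(pearson_centered(x, y) - reference)
raw_error = abs(pearson_raw_moments(x, y) - reference)

print(f"centered formula error:   {stable_error:.2e}")
print(f"raw-moment formula error: {raw_error:.2e}")
```

Checking a fast result against a slower reference, as done here with np.corrcoef, is the same kind of verification the paragraph above recommends.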

The functions currently available are listed below by variable comparison type, along with benchmarks against the equivalent loop-based methods. Every method provides a melt option that controls the output format: either a set of matrices, one per statistic, with rows and columns indexed by the paired variables, or a single DataFrame with one row per variable pair and one column per statistic.
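The two output shapes can be pictured with plain pandas. The sketch below converts a pair of small statistic matrices into a single long-format DataFrame; the helper melt_statistics, the column names a_col and b_col, and the numbers are all made up for illustration and do not reflect the actual signature or output labels used by many.

```python
import pandas as pd

# Matrix form: one DataFrame per statistic, indexed by the paired variables.
corr = pd.DataFrame(
    [[0.9, -0.2], [0.1, 0.5]], index=["a1", "a2"], columns=["b1", "b2"]
)
pval = pd.DataFrame(
    [[0.001, 0.4], [0.7, 0.03]], index=["a1", "a2"], columns=["b1", "b2"]
)


def melt_statistics(**matrices: pd.DataFrame) -> pd.DataFrame:
    """Combine statistic matrices into one row per variable pair.

    Hypothetical helper; the keyword names become the statistic columns.
    """
    # stack() turns each matrix into a Series keyed by (row, column).
    stacked = {name: matrix.stack() for name, matrix in matrices.items()}
    melted = pd.DataFrame(stacked)
    melted.index.names = ["a_col", "b_col"]
    return melted.reset_index()


melted = melt_statistics(corr=corr, pval=pval)
print(melted)
```

The long format is convenient for sorting or filtering all pairs by a statistic, while the matrix form suits heatmaps and other grid-based views.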

Visual methods

Several visual methods are also included for interpreting results from the statistical methods. Like the statistical methods, they are grouped by the types of variables plotted.

Development

  1. Install dependencies with poetry install
  2. Initialize environment with poetry shell
  3. Initialize pre-commit hooks with pre-commit install

Download files

Source Distribution

many-0.7.2.tar.gz (22.4 kB)

Uploaded Source

Built Distribution

many-0.7.2-py3-none-any.whl (27.8 kB)

Uploaded Python 3

File details

Details for the file many-0.7.2.tar.gz.

File metadata

  • Download URL: many-0.7.2.tar.gz
  • Upload date:
  • Size: 22.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.10.6 Darwin/23.4.0

File hashes

Hashes for many-0.7.2.tar.gz
Algorithm Hash digest
SHA256 5002d88c49b6bfcab3476d65675ab5e7557a9ff7ea3d09b6a6aaaf12e6b7a196
MD5 b7f27a583000dd6516fe6581acd9a9de
BLAKE2b-256 7485a4bfeec8dbe9ebec3c56a4324c5e38e3383305f5f9e3fee78e4db6c55aed

File details

Details for the file many-0.7.2-py3-none-any.whl.

File metadata

  • Download URL: many-0.7.2-py3-none-any.whl
  • Upload date:
  • Size: 27.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.10.6 Darwin/23.4.0

File hashes

Hashes for many-0.7.2-py3-none-any.whl
Algorithm Hash digest
SHA256 da14b65489672d1f81ee4aa5f0362efd9cd1a012a1390d753a32a1fef06553f7
MD5 6d86700559d7def05a7bca238afbc69d
BLAKE2b-256 b78fa4eb83bcfa75c289c6ae305676b2d9f93919123dfcc0130665bb2aca4986
