Skip to main content

B-spline based mutual information estimator

Project description

bspline_mutual_information

Utility to bin continuous variables and estimate mutual information based on B-Spline binning.

This is an adaption of Carsten Daub's R implementation[^1] of the algorithm described in Daub et.al 2004[^2].

Dependencies:

  • python >= 3.9
  • numpy >= 1.22.0
  • scipy >= 1.9.0

Installation

Recommended

The easiest way to install the package is directly via pip:

pip install bspline-mutual-information

Alternative

The most up-to-date version of the package can be also be installed by specifying the git repository directly in the pip install command

pip install git+https://github.com/pnnl-predictive-phenomics/bspline_mutual_information.git

or alternatively, cloning the repository to a local folder and installed via pip from source:

git clone https://github.com/pnnl-predictive-phenomics/bspline_mutual_information.git
cd bspline_mutual_information
pip install .

Usage

Once installed the module consits of two functions:

  • bspline_mutual_information.bspline_bin()
  • bspline_mutual_information.mutual_information()

bspline_bin() is both called internally in mutual_information() but can also be used to manually bin continuous data into discrete bins if so desired.

For this example we will focus on mutual_information() since it is the primary use case of this package.

An example can be found below on how to use mutual_information() to estimate the mutual information between two vectors containing continuous data points. It assumes a python environment.

>>> from bspline_mutual_information import mutual_information
>>> x = [1,2,3,4,5]
>>> y = [1,2,1,2,3]
>>> mutual_information(x, y, bins=5, spline_order=3)
0.4740122135541802

If mutual information for pairs of columns in a whole matrix or pandas DataFrame should be calculated this can be done by leveraging pandas DataFrame.corr() function. However, since corr() does not allow for keyword arguments a helper function needs to be defined first that specifies the arguments mutual_information() should be executed with. Consider the example below where the helper function mut_inf(x, y) is defined as returning the mutual information calculated using the parameters bins=5, spline_order=1 and correct=True (which corrects for the finit size effect if spline_order==1). Note that DataFrame.corr() will always fill the diagonal with 1 values independent of the chosen method.

>>> import pandas as pd
>>> import bspline_mutual_information as bsp
>>> data = pd.DataFrame({
...     'a': [0, 1, 2, 3, 4],
...     'b': [5, 6, 7, 8, 9],
...     'c': [0, 1, 2, 1, 0],
...     'd': [3, 2, 1, 3, 2]
...     })
...
>>> def mut_inf(x, y):
...     mi = bsp.mutual_information(
...         x, y,
...         bins=5,
...         spline_order=1,
...         correct=True
...         )
...     return mi
...
>>> data.corr(method=mut_inf)
          a         b         c         d
a  1.000000  1.921928  1.121928  1.121928
b  1.921928  1.000000  1.121928  1.121928
c  1.121928  1.121928  1.000000  0.321928
d  1.121928  1.121928  0.321928  1.000000

References

[^1]: C. Daub's R implementation

[^2]: Daub CO, Steuer R, Selbig J, Kloska S. Estimating mutual information using B-spline functions--an improved similarity measure for analysing gene expression data. BMC Bioinformatics. 2004 Aug 31;5:118. doi: 10.1186/1471-2105-5-118. PMID: 15339346; PMCID: PMC516800.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bspline_mutual_information-1.0.0.tar.gz (9.2 kB view details)

Uploaded Source

Built Distribution

bspline_mutual_information-1.0.0-py3-none-any.whl (9.7 kB view details)

Uploaded Python 3

File details

Details for the file bspline_mutual_information-1.0.0.tar.gz.

File metadata

File hashes

Hashes for bspline_mutual_information-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e9a1420d06bf04f0b451fbbd98e422e8173a8e39c500122cc281856da9684222
MD5 b31e63b8ce4fa52819a8e2dadb8ae6b8
BLAKE2b-256 48602e8a882fc947062fbea17e6bde6f0b42b71080353fe3f04cdddf6ae7c127

See more details on using hashes here.

File details

Details for the file bspline_mutual_information-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for bspline_mutual_information-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 53f59468f9fc3636ef27bd2d89033a5a60786a71d00867c3d48c7c2d718b35f6
MD5 391344f9e3aa08feb3d13fa90564f1f1
BLAKE2b-256 0cc32ab4fe678454d4dcc5f15949759a75083c99cad774322da1a93b250f5e0e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page