Skip to main content

Conditional independence testing and Markov blanket feature selection in Python

Project description

pycit

Framework for independence testing and conditional independence testing, with multiprocessing. Currently uses mutual information (MI) and conditional mutual information (CMI) as test statistics, estimated using k-NN methods. Also supports a routine for Markov blanket feature selection. Reports permutation-based p-values.

Installation

pip install pycit

Available Test Statistic Estimators

Mutual Information Estimators

  • ksg_mi: k-NN estimator for continuous data
  • bi_ksg_mi: "bias-improved" k-NN estimator for continuous data
  • mixed_mi: k-NN estimator for discrete-continuous mixtures

Conditional Mutual Information Estimators

  • ksg_cmi: k-NN estimator for continuous data
  • bi_ksg_cmi: "bias-improved" k-NN estimator for continuous data
  • mixed_cmi: k-NN estimator for discrete-continuous mixtures

Note: Also includes a differential entropy estimator: kl_entropy.

Example Usage

Independence Testing

from pycit import itest

# Test whether or not x and y are independent
pval = itest(x, y, test_args={'statistic': 'ksg_mi', 'n_jobs': 2})
is_independent = (pval >= 1.- confidence_level)

Conditional Independence Testing

from pycit import citest

# Test whether or not x and y are conditionally independent given z
pval = citest(x, y, z, test_args={'statistic': 'ksg_mi', 'n_jobs': 2})
is_conditionally_independent = (pval >= 1.- confidence_level)

Markov Blanket Feature Selection

from pycit.markov_blanket import MarkovBlanket

# specify CI test configuration
cit_funcs = {
    'it_args': {
        'test_args': {
            'statistic': 'ksg_mi',
            'n_jobs': 2
        }
    },
    'cit_args': {
        'test_args': {
            'statistic': 'ksg_cmi',
            'n_jobs': 2
        }
    }
}

# find Markov blanket of Y. x_data contains data from predictor variables, X_1,...,X_m
mb = MarkovBlanket(x_data, y_data, cit_funcs)
markov_blanket = mb.find_markov_blanket()

Dependencies:

  • numpy
  • scipy
  • scikit-learn

References:

  • Kozachenko, L. and Leonenko, N. (1987). Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii, 23(2):9–16.
  • Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.
  • Frenzel, S. and Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20):204101.
  • Gao, W., Kannan, S., Oh, S., and Viswanath, P. (2017). Estimating mutual information for discrete-continuous mixtures. In NIPS'2017.
  • Gao, W., Oh, S., and Viswanath, P. (2018). Demystifying fixed k-nearest neighbor information estimators. IEEE Transactions on Information Theory, 64(8):5629–5661.
  • Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In AISTATS'18.
  • Yang, A., Ghassami, A., Raginsky, M., Kiyavash, N., and Rosenbaum, E. (2020). Model-Augmented Estimation of Conditional Mutual Information for Feature Selection. In UAI'2020.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pycit-0.0.4.tar.gz (11.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pycit-0.0.4-py3-none-any.whl (18.5 kB view details)

Uploaded Python 3

File details

Details for the file pycit-0.0.4.tar.gz.

File metadata

  • Download URL: pycit-0.0.4.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4

File hashes

Hashes for pycit-0.0.4.tar.gz
Algorithm Hash digest
SHA256 49923973edf38d1444d4307c87a44476d444f75f613859c0ebaecf7f24667236
MD5 a94de2ec210bf583bdefd8d7399fb1d9
BLAKE2b-256 82db4007210696454b58256f1162d09ca25077cafd2b69f8ce4ada43e9c06700

See more details on using hashes here.

File details

Details for the file pycit-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: pycit-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4

File hashes

Hashes for pycit-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 7be797f82e0883f532ff728768c90087e6cac35165379d77ee6eb1bd200b5418
MD5 9507f6d23b9e042bc71a3432d3596b1d
BLAKE2b-256 0ac875b0d8b134c18c39a86058c8c4fa588bca6fae89ce9c2d984b5946786b37

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page