Skip to main content

Conditional independence testing and Markov blanket feature selection in Python

Project description

pycit

Framework for independence testing and conditional independence testing, with multiprocessing. Currently uses mutual information (MI) and conditional mutual information (CMI) as test statistics, estimated using k-NN methods. Also supports a routine for Markov blanket feature selection. Reports permutation-based p-values.

Installation

pip install pycit

Available Test Statistic Estimators

Mutual Information Estimators

  • ksg_mi: k-NN estimator for continuous data
  • bi_ksg_mi: "bias-improved" k-NN estimator for continuous data
  • mixed_mi: k-NN estimator for discrete-continuous mixtures

Conditional Mutual Information Estimators

  • ksg_cmi: k-NN estimator for continuous data
  • bi_ksg_cmi: "bias-improved" k-NN estimator for continuous data
  • mixed_cmi: k-NN estimator for discrete-continuous mixtures

Note: Also includes a differential entropy estimator: kl_entropy.

Example Usage

Independence Testing

from pycit import itest

# Test whether or not x and y are independent
pval = itest(x, y, test_args={'statistic': 'ksg_mi', 'n_jobs': 2})
is_independent = (pval >= 1.- confidence_level)

Conditional Independence Testing

from pycit import citest

# Test whether or not x and y are conditionally independent given z
pval = citest(x, y, z, test_args={'statistic': 'ksg_mi', 'n_jobs': 2})
is_conditionally_independent = (pval >= 1.- confidence_level)

Markov Blanket Feature Selection

from pycit.markov_blanket import MarkovBlanket

# specify CI test configuration
cit_funcs = {
    'it_args': {
        'test_args': {
            'statistic': 'ksg_mi',
            'n_jobs': 2
        }
    },
    'cit_args': {
        'test_args': {
            'statistic': 'ksg_cmi',
            'n_jobs': 2
        }
    }
}

# find Markov blanket of Y. x_data contains data from predictor variables, X_1,...,X_m
mb = MarkovBlanket(x_data, y_data, cit_funcs)
markov_blanket = mb.find_markov_blanket()

Dependencies:

  • numpy
  • scipy
  • scikit-learn

References:

  • Kozachenko, L. and Leonenko, N. (1987). Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii, 23(2):9–16.
  • Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.
  • Frenzel, S. and Pompe, B. (2007). Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20):204101.
  • Gao, W., Kannan, S., Oh, S., and Viswanath, P. (2017). Estimating mutual information for discrete-continuous mixtures. In NIPS'2017.
  • Gao, W., Oh, S., and Viswanath, P. (2018). Demystifying fixed k-nearest neighbor information estimators. IEEE Transactions on Information Theory, 64(8):5629–5661.
  • Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In AISTATS'18.
  • Yang, A., Ghassami, A., Raginsky, M., Kiyavash, N., and Rosenbaum, E. (2020). Model-Augmented Estimation of Conditional Mutual Information for Feature Selection. In UAI'2020.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pycit-0.0.7.tar.gz (11.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pycit-0.0.7-py3-none-any.whl (18.6 kB view details)

Uploaded Python 3

File details

Details for the file pycit-0.0.7.tar.gz.

File metadata

  • Download URL: pycit-0.0.7.tar.gz
  • Upload date:
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4

File hashes

Hashes for pycit-0.0.7.tar.gz
Algorithm Hash digest
SHA256 c70f24784262e2c4d586719c0f3f0c6736de06fd7f12d804c995a560253f6155
MD5 44588ca4f47fd3b4588b6d5936f23c9f
BLAKE2b-256 353665d50566dc7caa78bc054d44eb9fb56f0bd8f189bc2c88d42fd1fc75cf7d

See more details on using hashes here.

File details

Details for the file pycit-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: pycit-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 18.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.4

File hashes

Hashes for pycit-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 ec6a55158312cfd10ad865a07e4c5a34833f35f56a9420c85a7c79a751332cdd
MD5 f7f6e30a741789dbf340f516b3e6d12c
BLAKE2b-256 019072f3126d8d9b472b1a04f9ad26dd0968b6ef4486f03e50c558f7b6dad008

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page