
A decision-tree based conditional independence test


.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
   :target: https://opensource.org/licenses/MIT
   :alt: License

*A Decision Tree (Conditional) Independence Test (DTIT).*

Introduction
------------
Let *x, y, z* be random variables. Deciding whether *P(y | x, z) = P(y | z)*
can be difficult, especially if the variables are continuous. This package
implements a simple yet efficient and effective conditional independence test,
described in [link to arXiv when we write it up!]. Important features that
differentiate this test from competing methods:

* It is fast. Worst-case running time scales as ``O(n_data * log(n_data) * dim)``, where ``dim`` is ``max(x_dim + z_dim, y_dim)``. The amortized running time, however, is ``O(n_data * log(n_data) * log(dim))``.

* It applies to cases where some of x, y, z are continuous and some are discrete, or categorical (one-hot-encoded).

* It is very simple to understand and modify.
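
As an illustration of the mixed-type case, the sketch below one-hot-encodes a categorical variable with plain NumPy so that it can sit alongside continuous variables. The helper name ``one_hot`` and the toy data are assumptions made for this example; they are not part of the package.

```python
import numpy as np


def one_hot(labels):
    """Encode category labels as a 0/1 matrix (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    # `codes` maps each label to the index of its category in sorted order.
    categories, codes = np.unique(labels, return_inverse=True)
    encoded = np.zeros((labels.shape[0], categories.shape[0]))
    encoded[np.arange(labels.shape[0]), codes] = 1.0
    return encoded


rng = np.random.RandomState(0)
n_samples = 200
z = rng.randn(n_samples, 3)                                # continuous
x = one_hot(rng.choice(["a", "b", "c"], size=n_samples))   # categorical, one-hot
y = rng.randn(n_samples, 1)                                # continuous
```

Each array has one row per sample, the same ``(n_samples, dim)`` layout used in the Usage section.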

We have applied this test to tens of thousands of samples of thousand-dimensional datapoints in seconds. For smaller dimensionalities and sample sizes, it takes a fraction of a second. The algorithm is described in [arXiv link coming], where we also provide detailed experimental results and a comparison with other methods. For now, you should be able to understand what is going on by reading the code: it is only 90 lines of Python, including detailed comments!
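
Until the write-up is available, the general idea behind a decision-tree-based test can be sketched as follows. This is an illustrative assumption about how such a test may be built, not the code shipped in the package: the function name ``tree_ci_pvalue``, the single train/test split, the ``min_samples_leaf`` setting, and the one-sided paired t-test are all choices made for this example. It fits one tree that predicts *y* from *(x, z)* and one that predicts *y* from *z* alone, then asks whether adding *x* reduces the held-out squared error.

```python
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.tree import DecisionTreeRegressor


def tree_ci_pvalue(x, y, z, test_fraction=0.3, seed=0):
    """Illustrative p-value for H0: x is independent of y given z."""
    rng = np.random.RandomState(seed)
    n_samples = x.shape[0]
    order = rng.permutation(n_samples)
    n_test = max(2, int(test_fraction * n_samples))
    test_idx, train_idx = order[:n_test], order[n_test:]

    xz = np.hstack([x, z])
    errors = []
    # First pass uses (x, z) as features, second pass uses z only.
    for features in (xz, z):
        tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=seed)
        tree.fit(features[train_idx], y[train_idx])
        pred = tree.predict(features[test_idx]).reshape(n_test, -1)
        target = y[test_idx].reshape(n_test, -1)
        # Per-sample squared error, summed over the output dimensions.
        errors.append(((pred - target) ** 2).sum(axis=1))

    # Positive differences mean the tree that sees x predicts y better.
    diff = errors[1] - errors[0]
    if np.allclose(diff, 0.0):
        return 1.0  # x did not change any prediction
    t_stat, p_two_sided = ttest_1samp(diff, 0.0)
    # Convert the two-sided p-value into a one-sided one.
    return float(p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2)


# Toy data in which x and y share only the common cause z.
rng = np.random.RandomState(0)
z = rng.dirichlet(alpha=np.ones(2), size=300)
x = np.vstack([rng.multinomial(20, p) for p in z])
y = np.vstack([rng.multinomial(20, p) for p in z])
pval = tree_ci_pvalue(x, y, z)
```

A small p-value here would suggest that *x* carries information about *y* beyond what *z* provides. A single split is used only to keep the sketch short.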

Usage
-----
Basic usage is simple:

.. code:: python

    import numpy as np
    import dtit

    # Generate some data such that x is independent of y given z.
    n_samples = 300
    z = np.random.dirichlet(alpha=np.ones(2), size=n_samples)
    x = np.vstack([np.random.multinomial(20, p) for p in z])
    y = np.vstack([np.random.multinomial(20, p) for p in z])

    # Run the conditional independence test.
    pval = dtit.test(x, y, z)

Here, we created discrete variables *x* and *y*, d-separated by a "common cause"
*z*. The null hypothesis is that *x* is independent of *y* given *z*. Since the
variables are in fact independent given *z* here, ``pval`` should be uniformly
distributed on [0, 1] across repeated draws of the data.
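
One way to check the uniformity claim empirically is to rerun the test on fresh draws of the data and inspect the resulting p-values. The sketch below is only an illustration: ``sample_null_data``, ``collect_pvalues`` and ``rejection_rate`` are hypothetical helpers written for this example, and ``ci_test`` stands for any callable that takes ``(x, y, z)`` like ``dtit.test`` does above.

```python
import numpy as np


def sample_null_data(rng, n_samples=300):
    """Draw x and y that are conditionally independent given z."""
    z = rng.dirichlet(alpha=np.ones(2), size=n_samples)
    x = np.vstack([rng.multinomial(20, p) for p in z])
    y = np.vstack([rng.multinomial(20, p) for p in z])
    return x, y, z


def collect_pvalues(ci_test, n_trials=50, n_samples=300, seed=0):
    """Run ci_test(x, y, z) on n_trials independent null datasets."""
    rng = np.random.RandomState(seed)
    pvals = [ci_test(*sample_null_data(rng, n_samples)) for _ in range(n_trials)]
    return np.asarray(pvals, dtype=float)


def rejection_rate(pvals, level=0.05):
    """Fraction of p-values below the significance level."""
    return float(np.mean(np.asarray(pvals) < level))
```

With the package installed, ``rejection_rate(collect_pvalues(dtit.test))`` should come out close to 0.05 if the p-values are uniform. A histogram of the p-values or ``scipy.stats.kstest(pvals, "uniform")`` gives a fuller picture.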

Requirements
------------
To use the nn methods:

* numpy >= 1.12
* scikit-learn >= 0.18.1
* scipy >= 0.16.1

.. _pip: http://www.pip-installer.org/en/latest/

