A decision-tree based conditional independence test
.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
    :target: https://opensource.org/licenses/MIT
    :alt: License
*A Decision Tree (Conditional) Independence Test (DTIT).*
Introduction
------------
Let *x, y, z* be random variables. Deciding whether *P(y | x, z) = P(y | z)*
can be difficult, especially if the variables are continuous. This package
implements a simple yet efficient and effective conditional independence test,
described in [link to arXiv when we write it up!]. Important features that differentiate
this test from competing methods:
* It is fast. Worst-case running time scales as O(n_data * log(n_data) * dim), where dim is max(x_dim + z_dim, y_dim). However, amortized running time is O(n_data * log(n_data) * log(dim)).
* It applies to cases where some of x, y, z are continuous and some are discrete or categorical (one-hot-encoded).
* It is very simple to understand and modify.
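As an illustration of the mixed-type case mentioned in the list above, the following is a minimal sketch of how a categorical variable might be one-hot encoded next to continuous variables using plain NumPy. The helper ``one_hot`` and the synthetic data are hypothetical and not part of the package, and the ``(n_samples, dim)`` array layout is an assumption based on the Usage example.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Encode integer labels of shape (n,) as an (n, n_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], n_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


rng = np.random.RandomState(0)
n_samples = 500

# z: continuous and 3-dimensional.
z = rng.normal(size=(n_samples, 3))

# x: categorical with 4 classes, drawn as a noisy function of z only.
scores = np.dot(z, rng.normal(size=(3, 4))) + rng.gumbel(size=(n_samples, 4))
x = one_hot(np.argmax(scores, axis=1), n_classes=4)

# y: continuous scalar, also a noisy function of z only,
# so x is independent of y given z by construction.
y = z.sum(axis=1, keepdims=True) + rng.normal(size=(n_samples, 1))
```

Arrays built this way could presumably be passed to the test as ``dtit.test(x, y, z)``, in the same way as the arrays in the Usage section.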
We have applied this test to tens of thousands of samples of thousand-dimensional datapoints in seconds. For smaller dimensionalities and sample sizes, it takes a fraction of a second. The algorithm is described in [arXiv link coming], where we also provide detailed experimental results and a comparison with other methods. For now, however, you should be able to just look through the code to understand what's going on: it is only 90 lines of Python, including detailed comments!
Usage
-----
Basic usage is simple:
.. code:: python

    import numpy as np
    import dtit

    # Generate some data such that x is independent of y given z.
    n_samples = 300
    z = np.random.dirichlet(alpha=np.ones(2), size=n_samples)
    x = np.vstack([np.random.multinomial(20, p) for p in z])
    y = np.vstack([np.random.multinomial(20, p) for p in z])

    # Run the conditional independence test.
    pval = dtit.test(x, y, z)

Here, we created discrete variables *x* and *y*, d-separated by a "common cause"
*z*. The null hypothesis is that *x* is independent of *y* given *z*. Since in this
case the variables are independent given *z*, ``pval`` should be distributed uniformly on [0, 1] over repeated draws of the data.
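One way to check the uniformity claim empirically is to repeat the experiment on fresh data and count how often the p-value falls below a threshold. The sketch below is only an illustration: the helpers ``sample_common_cause``, ``null_pvalues`` and ``rejection_rate`` are hypothetical names, not part of the package. The test function is passed in as an argument, so the code runs with any callable that has the ``(x, y, z) -> p-value`` signature used above.

```python
import numpy as np


def sample_common_cause(n_samples, rng):
    """Draw (x, y, z) as in the usage example: x and y depend only on z."""
    z = rng.dirichlet(alpha=np.ones(2), size=n_samples)
    x = np.vstack([rng.multinomial(20, p) for p in z])
    y = np.vstack([rng.multinomial(20, p) for p in z])
    return x, y, z


def null_pvalues(ci_test, n_trials=100, n_samples=300, seed=0):
    """Run ci_test(x, y, z) on n_trials independent datasets drawn under the null."""
    rng = np.random.RandomState(seed)
    pvals = [ci_test(*sample_common_cause(n_samples, rng))
             for _ in range(n_trials)]
    return np.array(pvals)


def rejection_rate(pvals, alpha=0.05):
    """Fraction of p-values below alpha."""
    return float(np.mean(np.asarray(pvals) < alpha))


# With the package installed, one could run, for example:
#     import dtit
#     pvals = null_pvalues(dtit.test)
#     print(rejection_rate(pvals, alpha=0.05))
```

If the p-values are uniform on [0, 1], the rejection rate at level ``alpha`` should be close to ``alpha``, up to sampling noise from the finite number of trials.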
Requirements
------------
To use the package:
* numpy >= 1.12
* scikit-learn >= 0.18.1
* scipy >= 0.16.1
.. _pip: http://www.pip-installer.org/en/latest/