
An exact test for coincidence of feature values along a sample set.


This exact test assesses the statistical significance of finding a feature subset in binary feature data such that the number of simultaneously-positive samples is large.

Everything needed to perform the test is located in the standalone module _coincidencetest.py.

Example

Install from PyPI:

pip install coincidencetest

Usage is shown below:

from coincidencetest import coincidencetest

# 2 coincident samples, four features each positive in 3 samples, 10 samples in total
coincidencetest(2, [3, 3, 3, 3], 10)

0.0008877

This example shows that the probability is about 0.09% that four features, each occurring with frequency 3/10, will simultaneously occur in 2 or more samples.
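As an illustration of where the three arguments come from, the sketch below derives them from a small binary feature table. The helper name `coincidence_inputs` and the toy data are hypothetical and not part of the package; the argument meanings are inferred from the example above.

```python
def coincidence_inputs(rows, feature_indices):
    """Summarize a binary table for a chosen feature subset.

    rows: list of samples, each a sequence of 0/1 feature values.
    Returns (incidence, frequencies, number_samples), where incidence is the
    number of samples positive for every selected feature.
    """
    number_samples = len(rows)
    frequencies = [sum(row[j] for row in rows) for j in feature_indices]
    incidence = sum(1 for row in rows if all(row[j] for j in feature_indices))
    return incidence, frequencies, number_samples


# Toy data (hypothetical): 10 samples, 4 features, each feature positive in 3 samples.
rows = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
] + [[0, 0, 0, 0] for _ in range(4)]

print(coincidence_inputs(rows, [0, 1, 2, 3]))  # (2, [3, 3, 3, 3], 10)
```

The resulting triple has the same shape as the arguments of the call shown earlier.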

The example coincidencetest(1, [5, 3, 7], 100) yields p=0.01047, showing that the probability of even just one sample having all features can be very low, provided that enough of the features are individually relatively rare.
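The two p-values quoted above can be checked independently of the package. The following is a minimal sketch, assuming the null model is that each feature's positive samples form a uniformly random subset of the stated size, chosen independently per feature. It uses a plain inclusion-exclusion count and is not necessarily how `_coincidencetest.py` computes the result; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, prod


def count_empty_intersection(universe_size, set_sizes):
    """Count tuples of subsets with the given sizes whose common intersection is empty."""
    total = 0
    for t in range(min(set_sizes) + 1):
        # Inclusion-exclusion over t elements forced into every subset.
        term = comb(universe_size, t) * prod(
            comb(universe_size - t, s - t) for s in set_sizes
        )
        total += (-1) ** t * term
    return total


def exact_coincidence_p_value(incidence, frequencies, number_samples):
    """Probability that at least `incidence` samples are positive for all features."""
    all_configurations = prod(comb(number_samples, f) for f in frequencies)
    favorable = 0
    for j in range(incidence, min(frequencies) + 1):
        # Fix the j coincident samples, then require no further coincidence.
        reduced = [f - j for f in frequencies]
        favorable += comb(number_samples, j) * count_empty_intersection(
            number_samples - j, reduced
        )
    return Fraction(favorable, all_configurations)


p1 = exact_coincidence_p_value(2, [3, 3, 3, 3], 10)
p2 = exact_coincidence_p_value(1, [5, 3, 7], 100)
print(round(float(p1), 7))  # 0.0008877
print(round(float(p2), 5))  # 0.01047
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a float for display.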

CLI application

To make the test immediately useful, this package is distributed together with a lightweight "Formal Concept Analysis" feature set discovery tool.

The installed package exposes the command-line program coincidence-clustering, which incorporates this tool. Use it like so:

coincidence-clustering \
  --input-filename=example_data/bc_cell_data.tsv \
  --output-tsv=signatures.tsv \
  --level-limit=50 \
  --max-recursion=3

Web application

A JavaScript port of the signature discovery and testing program is located in webapp/. To run it locally, use:

cd webapp/
chmod +x build.py
./build.py
python -m http.server 8080

Then open your browser to localhost:8080 or 0.0.0.0:8080.

Note: The JavaScript application only requires a server capable of serving static files, namely the index.html and worker.js files created by the build process. However, most browsers block web workers loaded from the local file system, so this minimal Python server is needed for local deployment. Web workers are used so that feature sets can be displayed in real time as they are identified.

Code testing

The package is tested with

pytest .

The key step is computing the number of covers of a set of a given size by sets of prescribed sizes (equivalently, the number of ways to choose subsets of prescribed sizes with empty common intersection). The most important tests therefore check that several different cover-counting algorithms agree on small cases.
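To illustrate the kind of agreement check described here, the sketch below counts covers in two ways: by direct enumeration and by an inclusion-exclusion formula. Both functions are hypothetical illustrations and are not taken from the package's test suite.

```python
from itertools import combinations, product
from math import comb, prod


def count_covers_brute_force(n, sizes):
    """Enumerate every tuple of subsets with the given sizes and count the covers."""
    universe = set(range(n))
    families = [list(combinations(range(n), s)) for s in sizes]
    return sum(
        1 for choice in product(*families) if set().union(*choice) == universe
    )


def count_covers_inclusion_exclusion(n, sizes):
    """Count covers by inclusion-exclusion over the set of t uncovered elements."""
    return sum(
        (-1) ** t * comb(n, t) * prod(comb(n - t, s) for s in sizes)
        for t in range(n + 1)
    )


# Compare the two counts on a few small cases.
for n, sizes in [(3, [2, 2]), (4, [2, 2]), (4, [2, 3, 1]), (5, [3, 3, 2])]:
    assert count_covers_brute_force(n, sizes) == count_covers_inclusion_exclusion(n, sizes)

print(count_covers_inclusion_exclusion(3, [2, 2]))  # 6
```

Replacing each set by its complement turns a cover into a family of subsets with empty common intersection, which is the equivalence mentioned above.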

Issues

Please report all issues as GitHub issues.

License

© Nadeem Lab - The core module is distributed under the 3-clause BSD license. All other modules are distributed under Apache 2.0 with Commons Clause license, and are available for non-commercial academic purposes.

Reference

If you use this code or parts of it, please cite our paper:

@article{mathews2021coincidencetest,
  title={An exact test for significance of clusters in binary data},
  author={Mathews, James C and Crowe, Cameron and Vanguri, Rami and Callahan, Margaret and Hollmann, Travis J and Nadeem, Saad},
  journal={arXiv},
  year={2021}
}
