Compute and cache Fisher's exact test and Boschloo's test more efficiently!

CachedContingency

Python 3.9+ classes to compute and cache Fisher's exact test and Boschloo's test more efficiently.

Installation

This package requires at least Python 3.9.

pip install cached_contingency

Idea

I have to compute many of these tests and want to speed up the process. Two optimizations came to mind:

  1. My contingency tables often have identical column sums, so many tests can be recycled
  2. Some contingency tables are equivalent and only have to be computed once
    • Fisher's test: abcd, acbd, badc, bdac, cadb, cdab, dbca and dcba are equivalent (same p-value, but not necessarily the same odds ratio)
    • Boschloo's test: abcd, badc, cdab and dcba are equivalent
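
The equivalence lists above can be derived from three table operations: swapping rows, swapping columns and transposing. The following is a minimal sketch of that idea, not the package's actual code. All function names are hypothetical, and picking the smallest tuple as the shared cache key is an assumption made for illustration.

```python
# A 2x2 table [[a, b], [c, d]] is written as the flat tuple (a, b, c, d).

def swap_rows(t):
    a, b, c, d = t
    return (c, d, a, b)

def swap_cols(t):
    a, b, c, d = t
    return (b, a, d, c)

def transpose(t):
    a, b, c, d = t
    return (a, c, b, d)

def boschloo_equivalents(t):
    """Row and column swaps only: abcd, cdab, badc, dcba."""
    return {t, swap_rows(t), swap_cols(t), swap_rows(swap_cols(t))}

def fisher_equivalents(t):
    """Row/column swaps plus the transpose of each result: eight orderings."""
    base = boschloo_equivalents(t)
    return base | {transpose(e) for e in base}

def canonical(equivalents):
    """Assumed convention: use the smallest tuple as the shared cache key."""
    return min(equivalents)

print(sorted("".join(e) for e in fisher_equivalents(tuple("abcd"))))
print(canonical(fisher_equivalents((74, 31, 43, 32))))
```

With a convention like this, every ordering of the same table maps to a single cache entry, so only one test has to be computed per equivalence class.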

Furthermore, tools sometimes have to be re-run. In these cases, all previously computed results can be reused.

An SQLite database serves as the cache.

Execution

  1. Replace equivalent contingency tables with the same contingency table
  2. Find all tests that are not cached yet
  3. Calculate them in parallel, using all CPU cores
  4. Add them to the cache
  5. Return results
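
The five steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: the table schema, the function names and the returned tuple are hypothetical, the Fisher p-value is a plain hypergeometric sum, and a thread pool stands in for the parallel step to keep the sketch short (CPU-bound work would normally call for a process pool).

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from math import comb


def canonical(a, b, c, d):
    """Step 1: map all eight Fisher-equivalent tables to one representative."""
    return min(
        (a, b, c, d), (a, c, b, d), (b, a, d, c), (b, d, a, c),
        (c, d, a, b), (c, a, d, b), (d, c, b, a), (d, b, c, a),
    )


def fisher_pvalue(table):
    """Two-sided Fisher p-value: sum of all table probabilities <= the observed one."""
    a, b, c, d = table
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    probs = [
        comb(row1, x) * comb(n - row1, col1 - x) / total
        for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    ]
    observed = comb(row1, a) * comb(n - row1, col1 - a) / total
    # Small relative tolerance so that floating-point ties are counted.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))


def get_or_create_many(con, tables):
    """Return (p-values in input order, number of newly computed tests)."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS fisher "
        "(a INT, b INT, c INT, d INT, pval REAL, PRIMARY KEY (a, b, c, d))"
    )
    keys = [canonical(*t) for t in tables]                  # step 1
    results = {}
    for key in set(keys):                                   # step 2
        row = con.execute(
            "SELECT pval FROM fisher WHERE a=? AND b=? AND c=? AND d=?", key
        ).fetchone()
        if row:
            results[key] = row[0]
    missing = [k for k in set(keys) if k not in results]
    with ThreadPoolExecutor() as pool:                      # step 3
        computed = dict(zip(missing, pool.map(fisher_pvalue, missing)))
    with con:                                               # step 4
        con.executemany(
            "INSERT INTO fisher VALUES (?, ?, ?, ?, ?)",
            [(*k, p) for k, p in computed.items()],
        )
    results.update(computed)
    return [results[k] for k in keys], len(missing)         # step 5


con = sqlite3.connect(":memory:")
tables = [(3, 0, 0, 3), (0, 3, 3, 0), (3, 0, 0, 3)]
print(get_or_create_many(con, tables))  # all three tables share one computation
print(get_or_create_many(con, tables))  # second call is served from the cache
```

Only the database connection touches SQLite; the worker threads just compute p-values, which keeps the cache access single-threaded.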

Usage

Set the location of the cache database:

export KEY_VALUE_STORE_DB=/custom/path.db  # default: ~/.cache/keyvaluestore.db
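
For illustration, a lookup of this kind could be written as follows. This is a sketch, not the package's code: `resolve_db_path` is a hypothetical helper, and the precedence (explicit argument first, then the environment variable, then the default) is an assumption.

```python
import os
from pathlib import Path

# Default location as stated above.
DEFAULT_DB = Path(os.path.expanduser("~/.cache/keyvaluestore.db"))


def resolve_db_path(explicit=None):
    """Hypothetical helper: explicit argument, then env variable, then default."""
    if explicit is not None:
        return Path(explicit)
    return Path(os.environ.get("KEY_VALUE_STORE_DB", DEFAULT_DB))


print(resolve_db_path())
```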

Calculate single tests:

from cached_contingency import CachedFisher, CachedBoschloo, odds_ratio
from scipy.stats import fisher_exact, boschloo_exact
from numpy import isclose

# Create class (automatically creates database if none exists yet)
cf = CachedFisher()
# Calculate Fisher's test
pval_cache = cf.get_or_create(74, 31, 43, 32)
odds_ratio_cache = odds_ratio(74, 31, 43, 32)
# This is equivalent to:
odds_ratio_calc, pval_calc = fisher_exact([[74, 31], [43, 32]])
assert isclose(pval_cache, pval_calc)
assert isclose(odds_ratio_cache, odds_ratio_calc)

# Create class (automatically creates database if none exists yet)
cb = CachedBoschloo()
# Calculate Boschloo's test
pval_cache = cb.get_or_create(74, 31, 43, 32)
# This is almost* equivalent to:
pval_calc = boschloo_exact([[74, 31], [43, 32]]).pvalue
assert isclose(pval_cache, pval_calc)
  • *: Not exactly equivalent: my function never returns p-values greater than 1 and never returns nan as a p-value. (See scipy issue.)

Calculate multiple tests:

from cached_contingency import CachedFisher, CachedBoschloo
import pandas as pd
import numpy as np

# Create class (automatically creates database if none exists yet)
cb = CachedBoschloo()

# Create test DataFrame, column names are important!
np.random.seed(42)
test_df = pd.DataFrame(
    [(np.random.randint(200) for _ in range(4)) for _ in range(5)],
    columns=['c1r1', 'c2r1', 'c1r2', 'c2r2']
)
print(test_df)
#    c1r1  c2r1  c1r2  c2r2
# 0   102   179    92    14
# 1   106    71   188    20
# 2   102   121    74    87
# 3   116    99   103   151
# 4   130   149    52     1

# Calculate multiple Boschloo's tests
result_df = cb.get_or_create_many(test_df)
print(result_df)
#    c1r1  c2r1  c1r2  c2r2          pval
# 0   102   179    92    14  3.442564e-20
# 1   106    71   188    20  1.144156e-12
# 2   102   121    74    87  9.692791e-01
# 3   116    99   103   151  3.821222e-03
# 4   130   149    52     1  1.831830e-14

# If you run this again, the results will be loaded from cache:
result_df = cb.get_or_create_many(test_df)
print('Like a flash!')

Advanced usage:

Alternatively, specify the path to the database in Python and set the number of CPUs:

from cached_contingency import CachedFisher

cf = CachedFisher(db_path='/custom/path.db', n_cpus=1)
