Skip to main content

Methods for symmetric and asymmetric analysis of categorical co-occurrences

Project description

catcoocc

Build Status codecov Codacy Badge

Library for symmetrical and assymetrical analysis of categorical co-occurrences

Installation and usage

The library can be installed as any standard Python library with pip:

pip install catcoocc

Detailed instructions on how to use the library can be found in the official documentation.

A show-case example is shown here:

import tabulate
import catcoocc
from catcoocc.scorer import CatScorer

mushroom_data = catcoocc.read_sequences("docs/mushroom-small.tsv")
mushroom_cooccs = catcoocc.collect_cooccs(mushroom_data)
scorer = catcoocc.scorer.CatScorer(mushroom_cooccs)

mle = scorer.mle()
pmi = scorer.pmi()
npmi = scorer.pmi(True)
chi2 = scorer.chi2()
chi2_ns = scorer.chi2(False)
cramersv = scorer.cramers_v()
cramersv_ns = scorer.cramers_v(False)
fisher = scorer.fisher()
theil_u = scorer.theil_u()
catcoocc_i = scorer.catcoocc_i()
catcoocc_ii = scorer.catcoocc_ii()

headers = [
    'pair',
    'mle_0',        'mle_1', 
    'pmi_0',        'pmi_1', 
    'npmi_0',       'npmi_1', 
    'chi2_0',       'chi2_1', 
    'chi2ns_0',     'chi2ns_1', 
    'cremersv_0',   'cremersv_1', 
    'cremersvns_0', 'cremersvns_1', 
    'fisher_0',     'fisher_1', 
    'theilu_0',     'theilu_1', 
    'catcoocci_0',  'catcoocci_1', 
    'catcooccii_0', 'catcooccii_1', 
]

table = []
for pair in sorted(scorer.obs):
    buf = [
        pair,
        "%0.4f" % mle[pair][0],         "%0.4f" % mle[pair][1],
        "%0.4f" % pmi[pair][0],         "%0.4f" % pmi[pair][1],
        "%0.4f" % npmi[pair][0],        "%0.4f" % npmi[pair][1],
        "%0.4f" % chi2[pair][0],        "%0.4f" % chi2[pair][1],
        "%0.4f" % chi2_ns[pair][0],     "%0.4f" % chi2_ns[pair][1],
        "%0.4f" % cramersv[pair][0],    "%0.4f" % cramersv[pair][1],
        "%0.4f" % cramersv_ns[pair][0], "%0.4f" % cramersv_ns[pair][1],
        "%0.4f" % fisher[pair][0],      "%0.4f" % fisher[pair][1],
        "%0.4f" % theil_u[pair][0],     "%0.4f" % theil_u[pair][1],
        "%0.4f" % catcoocc_i[pair][0],  "%0.4f" % catcoocc_i[pair][1],
        "%0.4f" % catcoocc_ii[pair][0], "%0.4f" % catcoocc_ii[pair][1],
    ]
    table.append(buf)


print(tabulate.tabulate(table, headers=headers, tablefmt='markdown'))

Which will output:

pair                       mle_0    mle_1    pmi_0    pmi_1    npmi_0    npmi_1    chi2_0    chi2_1    chi2ns_0    chi2ns_1    cremersv_0    cremersv_1    cremersvns_0    cremersvns_1    fisher_0    fisher_1    theilu_0    theilu_1    catcoocci_0    catcoocci_1    catcooccii_0    catcooccii_1
-----------------------  -------  -------  -------  -------  --------  --------  --------  --------  ----------  ----------  ------------  ------------  --------------  --------------  ----------  ----------  ----------  ----------  -------------  -------------  --------------  --------------
('edible', 'bell')        0.3846   1        0.4308   0.4308    0.3107    0.3107    1.8315    1.8315      3.5897      3.5897        0.2027        0.2027          0.1987          0.1987         inf         inf      1           0.3985         0.4308         0.1717          0.789           0.789
('edible', 'convex')      0.4615   0.4615  -0.3424  -0.3424   -0.2844   -0.2844    3.6735    3.6735      5.7988      5.7988        0.3719        0.3719          0.3101          0.3101           0           0      0.2955      0.1823        -0.1012        -0.0624         -1.2578         -1.2578
('edible', 'flat')        0.0769   1        0.4308   0.4308    0.1438    0.1438    0.1041    0.1041      0.5668      0.5668        0             0               0               0              inf         inf      1           1              0.4308         0.4308          0.0448          0.0448
('edible', 'sunken')      0.0769   1        0.4308   0.4308    0.1438    0.1438    0.1041    0.1041      0.5668      0.5668        0             0               0               0              inf         inf      1           1              0.4308         0.4308          0.0448          0.0448
('poisonous', 'bell')     0        0       -3.5553  -3.5553   -0.5934   -0.5934    1.8315    1.8315      3.5897      3.5897        0.2027        0.2027          0.1987          0.1987           0           0      1           1             -3.5553        -3.5553         -6.5116         -6.5116
('poisonous', 'convex')   1        0.5385   0.4308   0.4308    0.4103    0.4103    3.6735    3.6735      5.7988      5.7988        0.3719        0.3719          0.3101          0.3101         inf         inf      0.0105      1              0.0045         0.4308          1.5825          1.5825
('poisonous', 'flat')     0        0       -1.9459  -1.9459   -0.3248   -0.3248    0.1041    0.1041      0.5668      0.5668        0             0               0               0                0           0      1           1             -1.9459        -1.9459         -0.2026         -0.2026
('poisonous', 'sunken')   0        0       -1.9459  -1.9459   -0.3248   -0.3248    0.1041    0.1041      0.5668      0.5668        0             0               0               0                0           0      1           1             -1.9459        -1.9459         -0.2026         -0.2026

Related Projects

https://github.com/pafoster/pyitlib

Griffith, Daniel M.; Veech, Joseph A.; and Marsh, Charles J. (2016) cooccur: Probabilistic Species Co-Occurrence Analysis in R. Journal of Statistical Software (69). doi: 10.18627/jss.v069.c02

Community guidelines

While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.

Author and citation

The library is developed by Tiago Tresoldi (tresoldi@shh.mpg.de).

The author has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. ERC Grant #715618, Computer-Assisted Language Comparison.

If you use catcoocc, please cite it as:

Tresoldi, Tiago (2020). catcoocc, a library for symmetric and asymmetric analysis of categorical co-occurrences. Version 0.1. Jena. Available at: https://github.com/tresoldi/catcoocc

In BibTeX:

@misc{Tresoldi2020catcoocc,
  author = {Tresoldi, Tiago},
  title = {catcoocc, a library for symmetric and asymmetric analysis of categorical co-occurrences. Version 0.1.},
  howpublished = {\url{https://github.com/tresoldi/catcoocc}},
  address = {Jena},
  year = {2020},
}

Project details


Release history Release notifications

This version

0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for catcoocc, version 0.1
Filename, size File type Python version Upload date Hashes
Filename, size catcoocc-0.1-py3-none-any.whl (4.6 MB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size catcoocc-0.1.tar.gz (4.6 MB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page