
CoCoScore: context-aware co-occurrence scores for text mining applications



Text mining of the biomedical literature has been successful in retrieving interactions between proteins, non-coding RNAs, and chemicals as well as in determining tissue-specific expression and subcellular localization. Simple co-occurrence-based scoring schemes can uncover such associations by finding entity pairs that are frequently mentioned together but ignore the textual context of each co-occurrence.

CoCoScore implements an improved context-aware co-occurrence scoring scheme that uses textual context to assess whether an association is described in a given sentence or not. CoCoScore achieves superior performance compared to previous approaches that rely on constant sentence scores, based on datasets of disease-gene, tissue-gene, and protein-protein associations. In our research, we use distant supervision to create an automatic, but noisy, labelling of a large dataset of sentences co-mentioning two entities of interest.
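The difference between constant and context-aware sentence scores can be illustrated with a toy sketch. This is not CoCoScore's implementation or its scoring formula: the function names, the keyword-based stand-in for a trained sentence classifier, and the example sentences are all assumptions made up for illustration.

```python
from collections import defaultdict


def aggregate_pair_scores(sentences, sentence_score):
    """Sum a per-sentence score over all sentences co-mentioning each entity pair."""
    totals = defaultdict(float)
    for entity_a, entity_b, text in sentences:
        # Treat (A, B) and (B, A) as the same unordered pair.
        pair = tuple(sorted((entity_a, entity_b)))
        totals[pair] += sentence_score(text)
    return dict(totals)


def constant_score(text):
    """Baseline: every co-mentioning sentence counts the same."""
    return 1.0


def toy_context_score(text):
    """Hypothetical stand-in for a trained sentence classifier."""
    return 0.9 if "interacts with" in text else 0.1


# Made-up example sentences: (entity, entity, sentence text).
sentences = [
    ("TP53", "MDM2", "TP53 interacts with MDM2 ."),
    ("MDM2", "TP53", "MDM2 and TP53 were both measured ."),
    ("TP53", "BRCA1", "TP53 and BRCA1 were both measured ."),
]

baseline = aggregate_pair_scores(sentences, constant_score)
context_aware = aggregate_pair_scores(sentences, toy_context_score)

for pair in sorted(baseline):
    print(pair, baseline[pair], round(context_aware[pair], 2))
```

With constant scores, a pair's total depends only on how often it is co-mentioned. With context-dependent scores, sentences that merely list two entities together contribute far less than sentences that describe an association.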

Free software: MIT license

Installation

To install CoCoScore via bioconda (for Linux and Mac OS):

conda install -c bioconda cocoscore

To install CoCoScore via pip:

pip install cocoscore

CoCoScore depends on fastText, which must be installed separately if CoCoScore was installed via pip. Installation via bioconda installs fastText automatically.

If you installed CoCoScore via pip, please build v0.1.0 of fastText as described here and make sure the fasttext binary is discoverable via your $PATH environment variable.

fastText v0.1.0 is also available via conda-forge:

conda install -c conda-forge fasttext=0.1.0
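After a pip-based installation, a quick way to confirm that the fasttext binary is discoverable via $PATH is to look it up from Python. The helper name `find_fasttext` is hypothetical and not part of CoCoScore; the sketch only relies on `shutil.which` from the standard library.

```python
import shutil


def find_fasttext(binary_name="fasttext"):
    """Return the absolute path of the binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


path = find_fasttext()
if path is None:
    print("fasttext was not found on PATH; build or install fastText v0.1.0 first.")
else:
    print(f"fasttext found at {path}")
```

This check only confirms that a binary named fasttext can be found; it does not verify which fastText version is installed.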

CoCoScore docker container:

Bioconda automatically builds a Docker container for CoCoScore. See the package documentation for more information.

Quick start

  1. Follow the installation instructions above.

  2. Download the demo.ftz file (see next section) needed to run through the example.

  3. Run through the example to learn how to apply CoCoScore to your own data.

Example usage

Before running the examples, please download the demo.ftz file and save it to doc/example/.

The file is downloaded and placed in the correct directory by executing:

wget -P doc/example/ http://download.jensenlab.org/BLAH4/demo.ftz
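On systems without wget, the same download can be sketched with Python's standard library. The function names below are hypothetical helpers, not part of CoCoScore; the URL and target directory are the ones given above.

```python
import os
import urllib.request

DEMO_URL = "http://download.jensenlab.org/BLAH4/demo.ftz"


def demo_target_path(dest_dir="doc/example"):
    """Return the local path the demo file is saved to."""
    return os.path.join(dest_dir, os.path.basename(DEMO_URL))


def download_demo(dest_dir="doc/example"):
    """Download demo.ftz into dest_dir, creating the directory if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    target = demo_target_path(dest_dir)
    urllib.request.urlretrieve(DEMO_URL, target)
    return target
```

Calling `download_demo()` from the root of the repository mirrors the wget command: the file ends up in doc/example/ under its original name.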

Preprint manuscript

A preprint manuscript describing CoCoScore and its performance on eight datasets, compared to a baseline co-occurrence scoring model, is available via bioRxiv.

Supplementary data described in the manuscript can be downloaded via figshare.

Contributors

CoCoScore is being developed by Alexander Junge and Lars Juhl Jensen at the Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark.

Feedback

Please open an issue here or write to us: {alexander.junge,lars.juhl.jensen} AT cpr DOT ku DOT dk

See also: https://github.com/JungeAlexander/cocoscore/blob/master/CONTRIBUTING.rst

Development

To run all the tests, run:

tox

Note: to combine the coverage data from all the tox environments, run:

Windows

set PYTEST_ADDOPTS=--cov-append
tox

Other

PYTEST_ADDOPTS=--cov-append tox
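The two platform-specific commands above do the same thing: they set the PYTEST_ADDOPTS environment variable before invoking tox. A platform-independent sketch in Python could look as follows; the helper names are hypothetical and the script assumes tox is installed and on PATH.

```python
import os
import subprocess


def tox_env_with_cov_append(base_env=None):
    """Return a copy of the environment with PYTEST_ADDOPTS=--cov-append set."""
    env = dict(os.environ if base_env is None else base_env)
    # Like the shell commands, this overrides any existing PYTEST_ADDOPTS value.
    env["PYTEST_ADDOPTS"] = "--cov-append"
    return env


def run_tox():
    """Run tox with coverage data appended across environments; return its exit code."""
    return subprocess.run(["tox"], env=tox_env_with_cov_append()).returncode
```

Calling `run_tox()` from the repository root then behaves like the commands above on either platform.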

Changelog

1.0.0 (2019-01-25)

  • implement new sentence score cutoff in tagger.co_occurrence_score.co_occurrence_score()

  • remove gensim dependency, retain only scipy as dependency

  • document/facilitate usage of different scoring model

  • fix issues with Travis CI runs

0.2.0 (2018-11-10)

  • Add support for Python 3.5.

0.1.0 (2018-11-10)

  • First release on PyPI.
