Loads NCI-PID data into NDEx

Project description

NDEx NCI-PID content loader

Python application that loads NCI-PID data into NDEx

This tool downloads OWL files containing NCI-PID data from: ftp://ftp.ndexbio.org/NCI_PID_BIOPAX_2016-06-08-PC2v8-API/ and performs the following operations:

1) OWL files are converted to extended SIF format using Paxtools and the SIF file is loaded into a network

2) A node attribute named type is added to each node. Its value is extracted from the PARTICIPANT_TYPE column in the SIF file and is set to one of the following:

  • protein (originally ProteinReference)

  • smallmolecule (originally SmallMoleculeReference)

  • proteinfamily (set if node name has family and was a protein)

  • RnaReference (original value)

  • ProteinReference;SmallMoleculeReference (original value)

3) A node attribute named alias is added to each node. It is loaded from the UNIFICATION_XREF column in the SIF file, which is split by ; into a list. Each element of this list is prefixed with uniprot: and the first element is set as the represents value of the node and removed from the alias attribute. If the alias attribute is empty after this removal, the attribute is removed.
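
The alias handling in step 3 can be sketched as follows. This is a minimal illustration, not the loader's actual code: the function name split_unification_xref is hypothetical, and it assumes the UNIFICATION_XREF cell holds bare identifiers separated by ; characters.

```python
def split_unification_xref(xref_value):
    """Return (represents, aliases) from a raw UNIFICATION_XREF cell.

    Hypothetical helper: assumes the cell contains bare identifiers
    separated by ';'. aliases is None when nothing is left after the
    first identifier has been moved to represents.
    """
    ids = [x.strip() for x in xref_value.split(';') if x.strip()]
    prefixed = ['uniprot:' + x for x in ids]
    if not prefixed:
        return None, None
    represents = prefixed[0]
    aliases = prefixed[1:]
    # An empty alias list means the attribute would be dropped entirely
    return represents, aliases or None


represents, aliases = split_unification_xref('P31749;Q9BWB6')
print(represents, aliases)
```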

4) In the SIF file, INTERACTION_TYPE defines the edge interaction type and INTERACTION_PUBMED_ID defines the value of the citation edge attribute. The values in the citation edge attribute are prefixed with pubmed: . Once loaded, redundant edges are removed following these conventions:

  • neighbor-of edges are removed if they contain no unique citations and an edge of another type exists

  • controls-state-change-of edges are removed if they contain no unique citations and an edge of type other than neighbor-of exists

  • Special case: after the network has been updated following the previous two conditions, if a neighbor-of edge with citations exists together with one other edge that has no citations, the citations from the neighbor-of edge are added to the other edge and the neighbor-of edge is removed
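
The first of these rules can be sketched as below. This is only an illustration under stated assumptions: edges are represented as plain dicts (not the loader's own data structures), the function name drop_redundant_neighbor_of is hypothetical, and "no unique citations" is interpreted as "every citation on the neighbor-of edge also appears on some other edge between the same two nodes".

```python
def drop_redundant_neighbor_of(edges):
    """Return the edges to keep for ONE pair of nodes.

    edges: list of dicts with keys 'type' (str) and 'citations' (set).
    Assumption: a neighbor-of edge is redundant when another edge type
    exists and all of its citations are already found on other edges.
    """
    others = [e for e in edges if e['type'] != 'neighbor-of']
    if not others:
        # No edge of another type exists, so nothing is removed
        return list(edges)
    cited_elsewhere = set().union(*(e['citations'] for e in others))
    kept = []
    for e in edges:
        if e['type'] == 'neighbor-of' and e['citations'] <= cited_elsewhere:
            continue  # carries no unique citations: drop it
        kept.append(e)
    return kept


pair_edges = [
    {'type': 'neighbor-of', 'citations': {'pubmed:1'}},
    {'type': 'controls-state-change-of', 'citations': {'pubmed:1', 'pubmed:2'}},
]
print([e['type'] for e in drop_redundant_neighbor_of(pair_edges)])
```

The second rule and the special case would need analogous passes; they are left out to keep the sketch short.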

5) An edge attribute named directed is set to True if the edge interaction type is one of the following (otherwise it is set to False)

controls-state-change-of
controls-transport-of
controls-phosphorylation-of
controls-expression-of
catalysis-precedes
controls-production-of
controls-transport-of-chemical
chemical-affects
used-to-produce
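
As a small sketch of step 5, the check amounts to a set membership test. The names DIRECTED_INTERACTIONS and is_directed are hypothetical; only the interaction types themselves come from the list above.

```python
# Interaction types treated as directed, copied from the list above
DIRECTED_INTERACTIONS = frozenset([
    'controls-state-change-of',
    'controls-transport-of',
    'controls-phosphorylation-of',
    'controls-expression-of',
    'catalysis-precedes',
    'controls-production-of',
    'controls-transport-of-chemical',
    'chemical-affects',
    'used-to-produce',
])


def is_directed(interaction_type):
    """Return a real boolean (not a string) for the directed attribute."""
    return interaction_type in DIRECTED_INTERACTIONS


print(is_directed('controls-expression-of'), is_directed('neighbor-of'))
```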

6) If node name matches represents value in node (with uniprot: prefix added) then the node name is replaced with gene symbol from gene_symbol_mapping.json

7) If node name starts with CHEBI then node name is replaced with the value of the PARTICIPANT_NAME column from the SIF file

8) If node represents value starts with chebi:CHEBI the chebi: is removed

9) If _HUMAN in SIF file PARTICIPANT_NAME column for a given node then this value is replaced by doing a lookup in gene_symbol_mapping.json, unless value in lookup is - in which case original name is left
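
The lookup in step 9 might look roughly like this. The layout of gene_symbol_mapping.json is not described here, so this sketch assumes a flat name-to-symbol dictionary; the function name resolve_participant_name, the sample entries, and the behavior for names missing from the lookup (keep the original) are all assumptions made for illustration.

```python
def resolve_participant_name(name, symbol_lookup):
    """Replace a *_HUMAN participant name with its gene symbol.

    symbol_lookup: dict mapping participant name -> gene symbol
    (assumed shape). A value of '-' means the original name is kept.
    """
    if '_HUMAN' not in name:
        return name
    symbol = symbol_lookup.get(name)
    if symbol is None or symbol == '-':
        # Assumption: names absent from the lookup are left unchanged too
        return name
    return symbol


# Illustrative entries only, not taken from the real mapping file
lookup = {'AKT1_HUMAN': 'AKT1', 'XYZ_HUMAN': '-'}
print(resolve_participant_name('AKT1_HUMAN', lookup))
print(resolve_participant_name('XYZ_HUMAN', lookup))
```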

10) Any node with family in its node name is changed as follows if a lookup of the node name against gene_symbol_mapping.json returns one or more genes

  • Node attribute named member is added and set to list of genes found in lookup in gene_symbol_mapping.json

  • Node attribute named type is changed to proteinfamily

11) The following network attributes are set

  • name set to name of OWL file with .owl.gz suffix removed

  • author (from Curated By column in networkattributes.tsv)

  • labels (from PID column in networkattributes.tsv)

  • organism is pulled from organism attribute of style.cx

  • prov:wasGeneratedBy is set to html link to this repo with text ndexncipidloader <VERSION> (example: ndexncipidloader 1.2.0)

  • prov:wasDerivedFrom is set to full path to OWL file on ftp site

  • reviewers (from Reviewed By column in networkattributes.tsv)

  • version is set to Abbreviated month-year (example: MAY-2019)

  • description is pulled from description attribute of style.cx

  • type is set to list of string with single entry pathway

  • __normalizationversion is set to 0.1
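
Two of the simpler attributes above, name and version, can be sketched as follows. The helper names network_name and version_label and the sample file name are hypothetical. A fixed month table is used instead of strftime('%b') so the result does not depend on the system locale.

```python
import datetime

MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']


def network_name(owl_filename):
    """Strip the .owl.gz suffix from an OWL file name, if present."""
    suffix = '.owl.gz'
    if owl_filename.endswith(suffix):
        return owl_filename[:-len(suffix)]
    return owl_filename


def version_label(day):
    """Format a date as abbreviated month and year, e.g. MAY-2019."""
    return '{}-{}'.format(MONTHS[day.month - 1], day.year)


print(network_name('Example pathway.owl.gz'))
print(version_label(datetime.date(2019, 5, 16)))
```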

Dependencies

Compatibility

  • Python 3.3+

Installation

git clone https://github.com/coleslaw481/ndexncipidloader
cd ndexncipidloader
make dist
pip install dist/ndexncipidloader*whl

Configuration

loadndexncipidloader.py requires a configuration file in the following format. The default path for this file is ~/.ndexutils.conf, but it can be overridden with the --conf flag.

Format of configuration file

[<value in --profile (default ndexncipidloader)>]

user = <NDEx username>
password = <NDEx password>
server = <NDEx server(omit http) ie public.ndexbio.org>

Example configuration file

[ncipid_dev]

user = joe123
password = somepassword123
server = dev.ndexbio.org
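
A file in this format can be read with Python's standard configparser module. The sketch below is not the loader's own code; load_profile is a hypothetical helper that shows how the profile name selects a section and how the three keys are retrieved.

```python
import configparser
import os


def load_profile(conf_path, profile='ndexncipidloader'):
    """Return (user, password, server) for one profile section.

    Hypothetical helper: raises FileNotFoundError if the file cannot
    be read and KeyError if the section or a key is missing.
    """
    parser = configparser.ConfigParser()
    if not parser.read(os.path.expanduser(conf_path)):
        raise FileNotFoundError(conf_path)
    section = parser[profile]
    return section['user'], section['password'], section['server']
```

With the example file above saved as conf, load_profile('conf', 'ncipid_dev') would return the joe123 credentials and dev.ndexbio.org.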

Required external tool

Paxtools is needed to convert the OWL files to SIF format.

Please download paxtools.jar (http://www.biopax.org/Paxtools/) (requires Java 8+) and put it in the current working directory, or specify the path to paxtools.jar with the --paxtools flag on loadndexncipidloader.py

Usage

For more information invoke loadndexncipidloader.py -h

Example usage

This example assumes a valid configuration file with paxtools.jar in the working directory.

loadndexncipidloader.py sif

Example usage with sif files already downloaded

This example assumes a valid configuration file and the SIF files are located in sif/ directory

loadndexncipidloader.py --skipdownload sif

Via Docker

Example usage

This example assumes paxtools.jar is in the current working directory, along with a configuration file named conf

docker run -v `pwd`:`pwd` -w `pwd` coleslawndex/ndexncipidloader:1.0.0 loadndexncipidloader.py --paxtools `pwd`/paxtools.jar --conf conf sif

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

1.2.0 (2019-06-11)

  • Code now adds a citation attribute to every edge even if there is no value in which case an empty list is set (JIRA ticket UD-360)

  • Added type network attribute and set it to ['pathway'] following normalization guidelines

1.1.0 (2019-06-10)

  • Adjusted network layout to be more compact by reducing number of iterations in spring layout algorithm as well as lowering the value of scale (JIRA ticket UD-360)

1.0.2 (2019-05-24)

  • Removed view references from cyVisualProperties aspect of style.cx file because they were causing issues with loading in Cytoscape

  • Set directed edge attribute type to boolean because it was incorrectly defaulting to a string

1.0.1 (2019-05-18)

  • Renamed incorrect attribute name prov:wasDerivedBy to prov:wasDerivedFrom to adhere to normalization document requirements

1.0.0 (2019-05-16)

  • Massive refactoring and first release where code attempts to behave as defined in README.rst

0.1.1 (2019-02-15)

  • Updated data/style.cx by renaming Protein to protein and SmallMolecule to smallmolecule to match the new normalization conventions

0.1.0 (2019-02-15)

  • First release
