Loads NCI-PID data into NDEx
NDEx NCI-PID content loader
Python application that loads NCI-PID data into NDEx
This tool downloads OWL files containing NCI-PID data from: ftp://ftp.ndexbio.org/NCI_PID_BIOPAX_2016-06-08-PC2v8-API/ and performs the following operations:
1) OWL files are converted to extended SIF format using Paxtools and the SIF file is loaded into a network
2) A node attribute named type is added to each node and is set to one of the following by extracting its value from the PARTICIPANT_TYPE column in the SIF file:
protein (originally ProteinReference)
smallmolecule (originally SmallMoleculeReference)
proteinfamily (set if node name has family and was a protein)
RnaReference (original value)
ProteinReference;SmallMoleculeReference (original value)
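The type rules listed above can be sketched as a small lookup. This is only an illustration, not the loader's actual code: the function name normalize_node_type is hypothetical, and treating "has family" as a plain substring test on the node name is an assumption.

```python
# Renames taken from the list above; any other value is kept as-is.
TYPE_MAP = {
    "ProteinReference": "protein",
    "SmallMoleculeReference": "smallmolecule",
}


def normalize_node_type(participant_type, node_name):
    """Return the value for the 'type' node attribute (hypothetical helper)."""
    node_type = TYPE_MAP.get(participant_type, participant_type)
    # Assumption: "has family" means the substring 'family' occurs in the name.
    if node_type == "protein" and "family" in node_name:
        return "proteinfamily"
    return node_type
```

Values such as RnaReference fall through the lookup unchanged, which matches the "original value" entries in the list.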
3) A node attribute named alias is added to each node and is loaded from the UNIFICATION_XREF column in the SIF file, which is split by ; into a list. Each element of this list is prefixed with uniprot: and the first element is set as the represents value of the node and removed from the alias attribute. If the alias attribute is empty after this removal, the attribute itself is removed.
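A minimal sketch of the alias and represents handling in step 3, assuming the raw UNIFICATION_XREF cell is available as a string. The function name split_xrefs and the use of None to signal a removed attribute are illustrative choices, not part of the tool.

```python
def split_xrefs(unification_xref):
    """Return (represents, alias) for one UNIFICATION_XREF cell.

    alias is None when nothing is left after taking the first element,
    mirroring the rule that an empty alias attribute is removed.
    """
    ids = [part.strip() for part in unification_xref.split(";") if part.strip()]
    prefixed = ["uniprot:" + identifier for identifier in ids]
    if not prefixed:
        return None, None
    represents, alias = prefixed[0], prefixed[1:]
    return represents, (alias or None)
```

For instance, a cell containing two identifiers separated by ; yields the first as represents and a one-element alias list, while a cell with a single identifier yields no alias at all.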
4) In the SIF file, INTERACTION_TYPE defines the edge interaction type and INTERACTION_PUBMED_ID defines the value of the citation edge attribute. The values in the citation edge attribute are prefixed with pubmed:. Once loaded, redundant edges are removed following these conventions:
neighbor-of edges are removed
controls-state-of edges are removed if another edge connecting same nodes has one of the following interactions: controls-state-change-of, controls-transport-of, controls-phosphorylation-of, controls-expression-of
NOTE: If above results in orphaned nodes, those nodes are removed as well
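The edge-removal conventions above can be sketched as follows. Edges are modeled as (source, target, interaction) tuples purely for illustration. Treating "connecting same nodes" as direction-agnostic is an assumption, and both function names are hypothetical.

```python
# Interactions that make a controls-state-of edge redundant (from the list above).
MORE_SPECIFIC = {
    "controls-state-change-of",
    "controls-transport-of",
    "controls-phosphorylation-of",
    "controls-expression-of",
}


def prune_edges(edges):
    """Drop neighbor-of edges and redundant controls-state-of edges."""
    # Collect every interaction seen between each unordered pair of nodes.
    by_pair = {}
    for source, target, interaction in edges:
        by_pair.setdefault(frozenset((source, target)), set()).add(interaction)

    kept = []
    for source, target, interaction in edges:
        if interaction == "neighbor-of":
            continue
        pair = frozenset((source, target))
        if interaction == "controls-state-of" and by_pair[pair] & MORE_SPECIFIC:
            continue
        kept.append((source, target, interaction))
    return kept


def remaining_nodes(edges):
    """Nodes still attached to an edge; any node outside this set is orphaned."""
    return {node for source, target, _ in edges for node in (source, target)}
```

Nodes that appear in the original edge list but not in remaining_nodes(prune_edges(edges)) are the orphans that the NOTE says are removed.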
5) An edge attribute named directed is set to True if the edge interaction type is one of the following (otherwise it is set to False):
controls-state-change-of
controls-transport-of
controls-phosphorylation-of
controls-expression-of
catalysis-precedes
controls-production-of
controls-transport-of-chemical
chemical-affects
used-to-produce
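As a sketch, the directed rule reduces to a set membership test over the interaction types listed above. The constant and function names are hypothetical; the result is a real boolean rather than a string, in line with the 1.0.2 history entry.

```python
DIRECTED_INTERACTIONS = frozenset({
    "controls-state-change-of",
    "controls-transport-of",
    "controls-phosphorylation-of",
    "controls-expression-of",
    "catalysis-precedes",
    "controls-production-of",
    "controls-transport-of-chemical",
    "chemical-affects",
    "used-to-produce",
})


def is_directed(interaction):
    """Return the boolean value for the 'directed' edge attribute."""
    return interaction in DIRECTED_INTERACTIONS
```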
6) If the node name matches the represents value of the node (with the uniprot: prefix added), then the node name is replaced with the gene symbol from gene_symbol_mapping.json
7) If the node name starts with CHEBI, then the node name is replaced with the value of the PARTICIPANT_NAME column from the SIF file
8) If the node represents value starts with chebi:CHEBI, the chebi: prefix is removed
9) If _HUMAN appears in the PARTICIPANT_NAME column of the SIF file for a given node, then this value is replaced by doing a lookup in gene_symbol_mapping.json, unless the value in the lookup is -, in which case the original name is left
10) Any node with family in its node name is changed as follows if a lookup of the node name against gene_symbol_mapping.json returns one or more genes:
Node attribute named member is added and set to list of genes found in lookup in gene_symbol_mapping.json
Node attribute named type is changed to proteinfamily
11) Changed in 5.0.0. For each network, all proteinfamily nodes are examined and, if any members exist as separate nodes, those nodes are removed and their edges are shifted to the corresponding proteinfamily node. Duplicate edges are removed and other edges are merged if their interaction and directed values are the same. In the case of a merge, citation field values are merged into a new unique list.
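One way to picture step 11 is the sketch below. It is not the loader's implementation: edges are modeled as plain dicts, families is a hypothetical mapping from each proteinfamily node name to its member list, and a member that belongs to several families is assigned to the first one encountered, which is an assumption. Self-loops created when a member was connected to its own family are not treated specially here.

```python
def merge_members_into_families(edges, families):
    """Shift edges from member nodes to their family node and merge duplicates.

    edges: list of dicts with keys source, target, interaction, directed, citation.
    families: dict mapping a family node name to a list of member node names.
    """
    member_to_family = {}
    for family, members in families.items():
        for member in members:
            member_to_family.setdefault(member, family)

    merged = {}
    for edge in edges:
        source = member_to_family.get(edge["source"], edge["source"])
        target = member_to_family.get(edge["target"], edge["target"])
        # Edges with the same endpoints, interaction and directed value are merged.
        key = (source, target, edge["interaction"], edge["directed"])
        entry = merged.setdefault(key, {
            "source": source,
            "target": target,
            "interaction": edge["interaction"],
            "directed": edge["directed"],
            "citation": [],
        })
        # Citations are combined into one list without repeats.
        for citation in edge["citation"]:
            if citation not in entry["citation"]:
                entry["citation"].append(citation)
    return list(merged.values())
```

After this pass, member nodes no longer appear as edge endpoints, so they can be dropped from the node list.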
12) The following network attributes are set
name set to name of OWL file with .owl.gz suffix removed except for PathwayCommons.8.NCI_PID.BIOPAX which is renamed to NCI PID - Complete Interactions
author (from Curated By column in networkattributes.tsv)
labels (from PID column in networkattributes.tsv)
organism is pulled from organism attribute of style.cx
prov:wasGeneratedBy is set to html link to this repo with text ndexncipidloader <VERSION> (example: ndexncipidloader 1.2.0)
prov:wasDerivedFrom is set to full path to OWL file on ftp site
reviewers (from Reviewed By column in networkattributes.tsv)
version is set to the abbreviated month and year (example: MAY-2019)
description is pulled from description attribute of style.cx except for NCI PID - Complete Interactions which has a hardcoded description set to This network includes all interactions of the individual NCI-PID pathways.
networkType is set to a list of strings with the single entry pathway, except for NCI PID - Complete Interactions, which also includes interactome
__iconurl is set to the value of the --iconurl flag (currently defaulting to http://search.ndexbio.org/static/media/ndex-logo.04d7bf44.svg)
__normalizationversion is set to 0.1
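A few of the network attributes above follow simple rules that can be sketched directly. The helper names are hypothetical, the order of entries in the networkType list is an assumption, and a fixed month table is used so the result does not depend on the system locale.

```python
from datetime import date

COMPLETE_SOURCE = "PathwayCommons.8.NCI_PID.BIOPAX"
COMPLETE_NAME = "NCI PID - Complete Interactions"
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def network_name(owl_filename):
    """Strip the .owl.gz suffix and rename the complete-interactions network."""
    suffix = ".owl.gz"
    name = owl_filename[:-len(suffix)] if owl_filename.endswith(suffix) else owl_filename
    return COMPLETE_NAME if name == COMPLETE_SOURCE else name


def network_type(name):
    """Value of the networkType attribute for a network with this name."""
    return ["pathway", "interactome"] if name == COMPLETE_NAME else ["pathway"]


def version_label(day):
    """Abbreviated month and year, e.g. MAY-2019."""
    return f"{MONTHS[day.month - 1]}-{day.year}"
```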
13) By default each network is made public with full indexing and is showcased (visible in the user’s home network list page)
NOTE: gene_symbol_mapping.json was originally extracted from here but the gene families were updated by calling ndexloadncipid.py --getfamilies sifdir/ which calls https://mygene.info via the biothings Python client
Dependencies
Compatibility
Python 3.6+
Installation
git clone https://github.com/ndexcontent/ndexncipidloader
cd ndexncipidloader
make dist
pip install dist/ndexncipidloader*whl
Configuration
ndexloadncipid.py requires a configuration file in the following format. The default path for this configuration is ~/.ndexutils.conf, but it can be overridden with the --conf flag.
Format of configuration file
[<value in --profile (default ndexncipidloader)>]
user = <NDEx username>
password = <NDEx password>
server = <NDEx server (omit http), e.g. public.ndexbio.org>
Example configuration file
[ncipid_dev]
user = joe123
password = somepassword123
server = dev.ndexbio.org
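The file uses standard INI syntax, so it can be read with Python's configparser module. The sketch below shows how such a profile could be parsed; the function name read_ndex_credentials is hypothetical and this is not necessarily how ndexloadncipid.py reads the file internally.

```python
import configparser


def read_ndex_credentials(config_text, profile="ndexncipidloader"):
    """Return (user, password, server) for the given profile section."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser[profile]
    return section["user"], section["password"], section["server"]


example = """
[ncipid_dev]
user = joe123
password = somepassword123
server = dev.ndexbio.org
"""
user, password, server = read_ndex_credentials(example, profile="ncipid_dev")
```

To read the real file instead of a string, parser.read(os.path.expanduser("~/.ndexutils.conf")) would take the place of read_string.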
Required external tool
Paxtools is needed to convert the OWL files to SIF format.
Please download paxtools.jar (http://www.biopax.org/Paxtools/) (requires Java 8+) and put it in the current working directory, or specify the path to paxtools.jar with the --paxtools flag on ndexloadncipid.py.
Usage
For more information invoke ndexloadncipid.py -h
Example usage
This example assumes a valid configuration file with paxtools.jar in the working directory.
ndexloadncipid.py sif
Example usage with sif files already downloaded
This example assumes a valid configuration file and that the SIF files are located in the sif/ directory.
ndexloadncipid.py --skipdownload sif
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
5.0.1 (2021-05-25)
Switched default layout (can be overridden with --layout flag) to force-directed since force-directed-cl may not work on all machines.
5.0.0 (2021-05-20)
Fixed duplicate node issue by removing nodes and their edges from a network if a family node contains the node in its member list. Any edges are shifted to the family node, with duplicates merged where possible.
4.0.0 (2020-11-04)
New default behavior: force-directed-cl layout is now applied on networks via the py4cytoscape library and a running instance of Cytoscape. Alternate Cytoscape layouts and the networkx “spring” layout can be run by setting the appropriate value via the new --layout flag.
3.1.1 (2020-10-16)
Removed NODE_LABEL_POSITION discrete mapping from style since it is not compatible with CX 2.0
3.1.0 (2019-09-11)
Added --disableshowcase flag that lets caller disable showcasing of NEWLY added networks, which is enabled by default.
Added --indexlevel flag that lets caller set type of indexing performed on NEWLY added networks. Default is full indexing (all).
3.0.0 (2019-08-02)
Renamed command line tool from loadndexncipidloader.py to ndexloadncipid.py to be more consistent with other loaders. Since this is a breaking change, the version was bumped to 3.0.0
Added --visibility flag which lets caller dictate whether newly added networks are set to PUBLIC (default) or PRIVATE
Removed parameter --disablcitededgemerge since the changes in 2.0.0 cause it to no longer have any effect
Set default for --paxtools flag to be paxtools.jar, which assumes the tool is in the current working directory
2.0.0 (2019-07-16)
Spring layout adjusted by increasing iterations
Code now removes all neighbor-of edges with NO data migration. controls-state-change-of edges are removed if more informative edges exist. Any orphaned nodes resulting from the removal of these edges are also removed
1.6.0 (2019-07-09)
Added __iconurl network attribute to all networks
Added interactome to networkType network attribute for ‘NCI PID - Complete Interactions’ network
1.5.1 (2019-07-09)
Renamed network attribute type to networkType to adhere to normalization specification
1.5.0 (2019-06-28)
Fixed style.cx by removing view aspects that were causing networks to not render properly in Cytoscape
1.4.0 (2019-06-13)
Network PathwayCommons.8.NCI_PID.BIOPAX is now renamed to ‘NCI PID - Complete Interactions’ with alternate description.
1.3.0 (2019-06-12)
Improved description in style.cx file (JIRA ticket UD-362)
1.2.0 (2019-06-11)
Code now adds a citation attribute to every edge even if there is no value in which case an empty list is set (JIRA ticket UD-360)
Added type network attribute and set it to [‘pathway’] following normalization guidelines
1.1.0 (2019-06-10)
Adjusted network layout to be more compact by reducing number of iterations in spring layout algorithm as well as lowering the value of scale (JIRA ticket UD-360)
1.0.2 (2019-05-24)
Removed view references from cyVisualProperties aspect of style.cx file because they were causing issues with loading in Cytoscape
Set directed edge attribute type to boolean because it was incorrectly defaulting to a string
1.0.1 (2019-05-18)
Renamed incorrect attribute name prov:wasDerivedBy to prov:wasDerivedFrom to adhere to normalization document requirements
1.0.0 (2019-05-16)
Massive refactoring and first release where code attempts to behave as defined in README.rst
0.1.1 (2019-02-15)
Updated data/style.cx by renaming Protein to protein and SmallMolecule to smallmolecule to match the new normalization conventions
0.1.0 (2019-02-15)
First release