
Compute diffusion scores over networks

Introduction

DiffuPath is an analytic tool for biological networks that connects the generic label propagation algorithms from DiffuPy to biological networks encoded in several formats such as Simple Interaction Format (SIF) or Biological Expression Language (BEL). For example, in the application scenario presented in the paper, we use three pathway databases (i.e., KEGG, Reactome and WikiPathways) and their integrated network retrieved from PathMe [1] to analyze three multi-omics datasets. However, other biological networks can be imported from the Bio2BEL ecosystem [2].

Installation

The latest stable code can be installed from PyPI with:

$ python3 -m pip install diffupath

The most recent code can be installed from the source on GitHub with:

$ python3 -m pip install git+https://github.com/multipaths/diffupath.git

For developers, the repository can be cloned from GitHub and installed in editable mode with:

$ git clone https://github.com/multipaths/diffupath.git
$ cd diffupath
$ python3 -m pip install -e .

Requirements

diffupath requires the following libraries:

networkx (>=2.1)
pybel (0.13.2)
biokeen (0.0.14)
click (7.0)
tqdm (4.31.1)
numpy (1.16.3)
scipy (1.2.1)
scikit-learn (0.21.3)
pandas (0.24.2)
openpyxl (3.0.2)
plotly (4.5.3)
matplotlib (3.1.2)
matplotlib_venn (0.11.5)
bio2bel (0.2.1)
pathme
diffupy

Command Line Interface

The following commands can be used directly from your terminal:

  1. Download a database for network analysis.

The following command generates a BEL file representing the network of the given database.

$ python3 -m diffupath database get-database --database=<database-name>

To check the available databases, run the following command:

$ python3 -m diffupath database ls

  2. Run a diffusion analysis.

The following command runs a diffusion method on a given network with the given data:

$ python3 -m diffupath diffusion diffuse --network=<path-to-network-file> --data=<path-to-data-file> --method=<method>

  3. Evaluate a diffusion analysis.

$ python3 -m diffupath diffusion evaluate -i=<input_data> -n=<path_network>
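
When scripting these commands, it can help to assemble the argument list in Python before handing it to `subprocess.run`. The sketch below only builds the list and does not call DiffuPath. The helper name `build_diffuse_command` and the file paths are hypothetical, the method value is left as a placeholder, and the flags are the ones shown in the commands above.

```python
import sys


def build_diffuse_command(network, data, method):
    """Return the argv list for `diffupath diffusion diffuse` (flags as documented above)."""
    return [
        sys.executable, "-m", "diffupath", "diffusion", "diffuse",
        f"--network={network}",
        f"--data={data}",
        f"--method={method}",
    ]


# Hypothetical paths and a placeholder method name, for illustration only.
command = build_diffuse_command("network.csv", "dataset.csv", "<method>")
print(" ".join(command[1:]))
```

Once diffupath is installed and the placeholders are replaced with real values, the list can be passed to `subprocess.run(command, check=True)`.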

Input Data

You can submit your dataset in any of the following formats:

  • CSV (.csv)

  • TSV (.tsv)

Please ensure that the dataset has a ‘Node’ column containing node IDs. If you provide only node IDs, you can also include a ‘NodeType’ column indicating the entity type of each node. You can optionally add the following columns to your dataset:

  • LogFC [*]

  • p-value

Input dataset examples

DiffuPath accepts several input formats which can be codified in different ways. See the diffusion scores summary for more details.

1. You can provide a dataset with a column ‘Node’ containing node IDs along with a column ‘NodeType’ indicating the entity type.

Node    NodeType
A       Gene
B       Gene
C       Metabolite
D       Gene

2. You can also provide a dataset with a column ‘Node’ containing node IDs and a column ‘LogFC’ with their log2 fold changes.

Node            LogFC
Gene A          4
Gene B          -1
Metabolite C    1.5
Gene D          3

3. Finally, you can provide a dataset with a column ‘Node’ containing node IDs, a column ‘LogFC’ with their log2 fold changes and a column ‘p-value’ with adjusted p-values.

Node            LogFC    p-value
Gene A          4        0.03
Gene B          -1       0.05
Metabolite C    1.5      0.001
Gene D          3        0.07

You can also take a look at our sample datasets folder for some example files.
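
Before submitting a dataset, it may be worth checking that it follows the column layout described above. The following is a minimal sketch using only the Python standard library. It is not DiffuPath's own parser, and the helper name `read_dataset` and the inline sample are assumptions made for illustration.

```python
import csv
import io

# Inline stand-in for a dataset file, mirroring the third example above.
SAMPLE = """Node,LogFC,p-value
Gene A,4,0.03
Gene B,-1,0.05
Metabolite C,1.5,0.001
Gene D,3,0.07
"""


def read_dataset(handle, delimiter=","):
    """Parse a dataset and check that the required 'Node' column is present."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if "Node" not in (reader.fieldnames or []):
        raise ValueError("dataset must contain a 'Node' column")
    rows = []
    for row in reader:
        # Convert the optional numeric columns when they are present.
        for column in ("LogFC", "p-value"):
            if row.get(column) not in (None, ""):
                row[column] = float(row[column])
        rows.append(row)
    return rows


rows = read_dataset(io.StringIO(SAMPLE))
print(len(rows), "nodes read")
```

For a TSV file, the same helper can be called with `delimiter="\t"`.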

Networks

This section describes the types of networks over which you can run diffusion methods:

  • Select a network representing an individual biological database

  • Select multiple databases to generate a harmonized network

  • Select from one of four predefined collections of biological databases representing a harmonized network

  • Submit your own network in one of the accepted formats

Network Dumps

Because of the high computational cost of generating the kernel, we provide links to pre-calculated kernels for a set of networks representing biological databases.

Database        Description                                               Reference   Download
DDR             Disease-disease associations                              [3]         ddr.json
DrugBank        Drug and drug target interactions                         [4]         drugbank.json
Gene Ontology   Hierarchy of tens of thousands of biological processes    [5]         go.json
HSDN            Associations between diseases and symptoms                [6]         hsdn.json
KEGG            Multi-omics interactions in biological pathways           [7]         kegg.json
miRTarBase      Interactions between miRNA and their targets              [8]         mirtarbase.json
Reactome        Multi-omics interactions in biological pathways           [9]         reactome.json
SIDER           Associations between drugs and side effects               [10]        sider.json
WikiPathways    Multi-omics interactions in biological pathways           [11]        wikipathways.json

If you would like to use one of our predefined collections, you can similarly download pre-calculated kernels for sets of networks representing integrated biological databases.

Collection   Database                                      Description                                                                                  Download
#1           KEGG, Reactome and WikiPathways               -omics and biological processes/pathways                                                     pathme.json
#2           KEGG, Reactome, WikiPathways and DrugBank     -omics and biological processes/pathways with a strong focus on drug/chemical interactions   pathme_drugbank.json
#3           KEGG, Reactome, WikiPathways and miRTarBase   -omics and biological processes/pathways enriched with miRNAs                                pathme_mirtarbase.json

Custom-network formats

You can also submit your own networks in any of the following formats:

  • BEL (.bel)

  • CSV (.csv)

  • Edge list (.lst)

  • GML (.gml or .xml)

  • GraphML (.graphml or .xml)

  • Pickle (.pickle)

  • TSV (.tsv)

  • TXT (.txt)

At a minimum, please ensure that both of the following columns are included in the network file you submit:

  • Source

  • Target

Optionally, you can add a third column, “Relation”, to your network (as in the example below). If the relation between the Source and Target nodes is omitted, or if the directionality is ambiguous, either node can be assigned as the Source or Target.

Custom-network example

Source    Target          Relation
Gene A    Gene B          Increase
Gene B    Metabolite C    Association
Gene A    Pathology D     Association

You can also take a look at our sample networks folder for some examples.
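
To check a custom network file against the column requirements above, a small standard-library sketch such as the following may be useful. The helper name `read_edges` is hypothetical, the inline data mirrors the example table, and treating edges as undirected when building the neighbour map is an assumption made for illustration, not a statement of how DiffuPath handles direction.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for a CSV network file, mirroring the example above.
NETWORK = """Source,Target,Relation
Gene A,Gene B,Increase
Gene B,Metabolite C,Association
Gene A,Pathology D,Association
"""


def read_edges(handle, delimiter=","):
    """Return (source, target, relation) tuples after checking required columns."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = {"Source", "Target"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # 'Relation' is optional, so it is None when the column is absent.
    return [(row["Source"], row["Target"], row.get("Relation")) for row in reader]


edges = read_edges(io.StringIO(NETWORK))

# Build a neighbour map, treating every edge as undirected (an assumption).
neighbours = defaultdict(set)
for source, target, _relation in edges:
    neighbours[source].add(target)
    neighbours[target].add(source)

print(sorted(neighbours["Gene A"]))
```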

Disclaimer

DiffuPath is scientific software that has been developed in an academic capacity, and thus comes with no warranty or guarantee of maintenance, support, or back-up of data.

References
