Skip to main content

Identify significant connections between biological processes using gene interaction networks.

Project description

Python Biological Process Networks (PyBPN) provides programs to detect connections between biological processes (called “links”) based on gene interaction, expression, and annotation data. A collection of significant links and the participating processes forms a biological process network, or BPN.

PyBPN provides three related programs for finding BPNs, each with different objectives:

bpln

Determines if processes are generally connected; an implementation of the algorithm described by Dotan-Cohen et al. [1].

cbpn

Determines whether, under a particular comparison of conditions, connections between processes are perturbed; an implementation of the algorithm described by Lasher et al. [2].

mcmcbpn

Similar to cbpn, but attempts to discover the smallest set of connections which describes as much of the perturbation of interacting genes as possible.

Availability

PyBPN releases are available from the Python Package Index (PyPI) at http://pypi.python.org/pypi/BiologicalProcessNetworks

PyBPN’s source code is hosted on GitHub at https://github.com/gotgenes/BiologicalProcessNetworks

Installation

The recommended way to install PyBPN is through the Python package installer pip, as it helps automagically manage dependencies, however, this document also provides instructions for manual installation.

PyBPN has several third-party dependencies, described below.

Dependencies

PyBPN depends on the following Python versions and external Python Packages (all available from PyPI):

If you are installing PyBPN via pip, you only need to ensure that you have an appropriate version of Python installed on your system. If you are manually installing PyBPN, you will need to obtain and install all dependencies through your own means (e.g., via apt, yum, .dmg installs, or from source, following the package’s instructions).

Installation by pip

pip will download and install PyBPN, as well as any Python package dependencies that are not yet installed on your system or which require upgrading.

System-wide installation for users with administrative access

If you have administrative (e.g., sudo) access on your system, you may install PyBPN system-wide with

sudo pip install BiologicalProcessNetworks

If you have not installed NumPy before hand, you may encounter an error [3]. In this case, try

pip install numpy
pip install BiologicalProcessNetworks

Local installation for non-privileged users

If you do not have administrative, or do not wish to make a system-wide installation of PyBPN, you can still install PyBPN and all its dependencies using the user site-packages installation.

pip install --user BiologicalProcessNetworks

If you have not installed NumPy before hand, you may encounter an error [3]. In this case, try

pip install --user numpy
pip install --user BiologicalProcessNetworks

Manual Installation

Once you have installed all dependencies and have obtained and unpacked the source for PyBPN (e.g., by using tar), move into the top level directory of the unpacked source and run

python setup.py install

If you do not have administrative permissions for your computer, you can install into the user-specific site-packages location with

python setup.py install --user

Usage

All programs accept the -h/--help option. Provide this option to get a full usage string from the program, including all available options. Below is a summary of the usage for each program and details of common options.

BPLN

TODO

CBPLN

TODO

MCMCBPN

mcmcbpn calculates a BPN which explains as much gene expression perturbation an underlying gene-gene (or protein-protein) response network as possible, using as few process-process links as possible. mcmcbpn performs Markov chain Monte Carlo (MCMC) in order to effectively consider all possible links simultaneously and select an optimal subset of them.

Basic Usage

The basic usage of mcmcbpn is as follows:

mcmcbpn [OPTIONS] INTERACTIONS_FILE ANNOTATIONS_FILE EXPRESSION_FILE

Each of the files is described below:

  • INTERACTIONS_FILE: a CSV file containing interactions. The file should have two columns with headings “interactor1” and “interactor2”. It may have an optional column with the heading “weight”, whose values will be used as the weight or confidence of the interaction. The file may have additional columns, which will be ignored.

  • ANNOTATIONS_FILE: a file containing annotations. The annotations file may be in one of two formats:

    • GMT format: if the file ends with the extension “.gmt”, it is automatically parsed as a GMT-format file. The file is a tab-separated (TSV) format with no headers. The first column contains the annotation term. The second column contains a description. All following columns contain gene IDs for genes annotated by that term. Full GMT format specification is available from the MSigDB and GSEA website.

    • Two-column format: The file should have a column titled “gene_id” which has the gene/gene product ID, and a column titled “term” which contains the term with which the gene/product is annotated. The file may have additional columns, which will be ignored.

  • EXPRESSION_FILE: a CSV file of gene (or gene product) expression values. The file should have a column titled “id” which has the gene (or gene product) ID, and a column titled “expression” which gives a value for the expression level, or difference in expression levels.

mcmcbpn has a large number of options which can change its behavior, either in terms of the algorithm and parameters used, or in terms of its output. To get a full list of options, run

mcmcbpn --help

Below are the most important options.

Algorithm and Parameter Options

These are options which affect the algorithmic behavior or starting state of mcmcbpn.

  • --burn-in=BURN_IN: the number of steps to take before recording states in the Markov chain [default: 1000000]

  • --steps=STEPS: the number of steps through the Markov chain to observe [default: 10000000]

  • --activity-threshold=ACTIVITY_THRESHOLD: set the (differential) expression threshold at which a gene is considered active [default: -log10(0.05)]

  • --transition-ratio=TRANSITION_RATIO: The target ratio of proposed link transitions to proposed parameter transitions [default: 0.9]

  • --fixed-distributions: use fixed distributions for link (and term) prior [implies --free-parameters] (highly recommended)

  • --free-parameters: parameters will be adjusted randomly, rather than incrementally (recommended)

  • --disable-swaps: disables swapping links as an option for transitions (highly recommended; will become the default option in future releases)

Output Options

These are options which affect the output file paths and file formats for mcmcbpn.

  • --links-outfile=LINKS_OUTFILE: the file to which the links results should be written [default: links_results.tsv]

  • --parameters-outfile=PARAMETERS_OUTFILE: the file to which the parameters results should be written [default: parameter_results.tsv]

  • --terms-outfile=TERMS_OUTFILE: the file to which the terms results should be written [default: terms_results.tsv]

  • --transitions-outfile=TRANSITIONS_OUTFILE: the file to which the transitions data should be written [default: transitions.tsv]

  • --detailed-transitions: transitions file includes full information about each step’s state (see also --bzip2 below, as this can drastically increase the file size of the transitions outfile)

  • --bzip2: compress transitions file using bzip2 (highly recommended, the transitions file can consume a large amount of disk space, in proportion to the number of steps)

  • --record-frequencies: record the frequency of each state

  • --frequencies-outfile=FREQUENCIES_OUTFILE: the file to which frequency information should be written [default: state_frequencies.tsv]

  • --logfile=LOGFILE: the file to which information for the run will be logged [default: mcmcbpn-TIMESTAMP.log]

Output

The two principal files output by mcmcbpn are the links outfile and the parameters outfile.

Links File

This TSV file contains three columns: term1, term2, and probability. term1 and term2 represent the two biological processes of a given link, and probability represents the probability that link should exist in the final biological process network (BPN) as determined by a given run of mcmcbpn.

Parameters File

This TSV file contains three columns: the first column, parameter, represents the name of the given parameter. Names include the following:

  • link_false_neg: proportion of interactions not explained by the BPN that should be

  • link_false_pos: propotion of interactions explained by the BPN that should not be

  • link_prior: the prior probability a link would be included in the BPN at all

The second column, value, shows a particular value for a given parameter. The third column, probability, gives the estimated probability that the given parameter should assume the respective value in order to maximize the likelihood of the BPN.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

BiologicalProcessNetworks-1.0a5.zip (110.3 kB view hashes)

Uploaded Source

BiologicalProcessNetworks-1.0a5.tar.gz (87.7 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page