Identify significant connections between biological processes using gene interaction networks.
Python Biological Process Networks (PyBPN) provides programs to detect connections between biological processes (called “links”) based on gene interaction, expression, and annotation data. A collection of significant links and the participating processes forms a biological process network, or BPN.
PyBPN provides three related programs for finding BPNs, each with different objectives:
- Determines whether processes are generally connected; an implementation of the algorithm described by Dotan-Cohen et al.
- cbpn: determines whether, under a particular comparison of conditions, connections between processes are perturbed; an implementation of the algorithm described by Lasher et al.
- mcmcbpn: similar to cbpn, but attempts to discover the smallest set of connections which describes as much of the perturbation of interacting genes as possible.
PyBPN releases are available from the Python Package Index (PyPI) at http://pypi.python.org/pypi/BiologicalProcessNetworks
PyBPN’s source code is hosted on GitHub at https://github.com/gotgenes/BiologicalProcessNetworks
The recommended way to install PyBPN is through the Python package installer pip, which automatically manages dependencies. This document also provides instructions for manual installation.
PyBPN has several third-party dependencies, described below.
PyBPN depends on the following Python versions and external Python Packages (all available from PyPI):
- Python 2.6 or 2.7. Python 3 is not currently supported; Python 2.5 and lower are unsupported. Check your Python version with python --version. Obtain newer releases of Python from http://python.org/download/
- ConvUtils (v.1.1; currently v.2.0 is unsupported)
- NetworkX (v.1.0 or greater)
- SciPy (which depends on NumPy)
If you are installing PyBPN via pip, you only need to ensure that you have an appropriate version of Python installed on your system. If you are manually installing PyBPN, you will need to obtain and install all dependencies through your own means (e.g., via apt, yum, .dmg installs, or from source, following the package’s instructions).
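The version requirement can also be checked from within Python. The following is a minimal sketch; `is_supported_python` is a hypothetical helper written for illustration and is not part of PyBPN.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the interpreter is Python 2.6 or 2.7.

    Hypothetical helper for illustration; not part of PyBPN.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 2 and minor in (6, 7)


if __name__ == "__main__":
    # Prints True only when run under a supported interpreter.
    print(is_supported_python())
```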
Installation by pip
pip will download and install PyBPN, as well as any Python package dependencies that are not yet installed on your system or which require upgrading.
System-wide installation for users with administrative access
If you have administrative (e.g., sudo) access on your system, you may install PyBPN system-wide with
sudo pip install BiologicalProcessNetworks
If you have not installed NumPy beforehand, you may encounter an error during the installation of SciPy. In this case, try
pip install numpy
pip install BiologicalProcessNetworks
Local installation for non-privileged users
If you do not have administrative access, or do not wish to make a system-wide installation of PyBPN, you can still install PyBPN and all its dependencies using the user site-packages installation.
pip install --user BiologicalProcessNetworks
If you have not installed NumPy beforehand, you may encounter an error during the installation of SciPy. In this case, try
pip install --user numpy
pip install --user BiologicalProcessNetworks
Manual installation
Once you have installed all dependencies and have obtained and unpacked the source for PyBPN (e.g., by using tar), move into the top level directory of the unpacked source and run
python setup.py install
If you do not have administrative permissions for your computer, you can install into the user-specific site-packages location with
python setup.py install --user
All programs accept the -h/--help option. Provide this option to get a full usage string from the program, including all available options. Below is a summary of the usage for each program and details of common options.
mcmcbpn calculates a BPN which explains as much gene expression perturbation in an underlying gene-gene (or protein-protein) response network as possible, using as few process-process links as possible. mcmcbpn performs Markov chain Monte Carlo (MCMC) in order to effectively consider all possible links simultaneously and select an optimal subset of them.
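To give a feel for how an MCMC sampler can weigh many candidate links at once, here is a toy Metropolis sketch. It is not PyBPN's algorithm or likelihood: the function name, the link names, and the per-link scores are all invented for illustration. It toggles one link per step, discards an initial burn-in period, and reports how often each link was selected in the recorded states.

```python
import math
import random


def sample_link_frequencies(link_scores, burn_in, steps, seed=0):
    """Toy Metropolis sampler over subsets of links.

    link_scores maps a link name to a made-up log-score contribution.
    This is an illustration only, NOT the model used by mcmcbpn.
    """
    rng = random.Random(seed)
    links = sorted(link_scores)
    selected = set()  # start with no links selected
    counts = dict((link, 0) for link in links)
    for step in range(burn_in + steps):
        # Propose toggling one link chosen uniformly at random.
        link = rng.choice(links)
        if link in selected:
            delta = -link_scores[link]  # removing the link
        else:
            delta = link_scores[link]   # adding the link
        # Metropolis rule: accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            selected.symmetric_difference_update([link])
        # Record states only after the burn-in period.
        if step >= burn_in:
            for chosen in selected:
                counts[chosen] += 1
    return dict((link, counts[link] / float(steps)) for link in links)


frequencies = sample_link_frequencies(
    {"process A -- process B": 2.0, "process A -- process C": -2.0},
    burn_in=1000,
    steps=20000,
)
for name in sorted(frequencies):
    print(name, round(frequencies[name], 2))
```

In this toy, the link with the positive score is selected in most recorded states and the link with the negative score in few of them.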
The basic usage of mcmcbpn is as follows:
mcmcbpn [OPTIONS] INTERACTIONS_FILE ANNOTATIONS_FILE EXPRESSION_FILE
Each of the files is described below:
- INTERACTIONS_FILE: a CSV file containing interactions. The file should have two columns with headings “interactor1” and “interactor2”. It may have an optional column with the heading “weight”, whose values will be used as the weight or confidence of the interaction. The file may have additional columns, which will be ignored.
- ANNOTATIONS_FILE: a file containing annotations. The annotations file may be in one of two formats:
  - GMT format: if the file name ends with the extension “.gmt”, it is automatically parsed as a GMT-format file. The file is tab-separated (TSV) with no headers. The first column contains the annotation term. The second column contains a description. All following columns contain gene IDs for genes annotated by that term. The full GMT format specification is available from the MSigDB and GSEA website.
  - Two-column format: the file should have a column titled “gene_id” which has the gene/gene product ID, and a column titled “term” which contains the term with which the gene/product is annotated. The file may have additional columns, which will be ignored.
- EXPRESSION_FILE: a CSV file of gene (or gene product) expression values. The file should have a column titled “id” which has the gene (or gene product) ID, and a column titled “expression” which gives a value for the expression level, or difference in expression levels.
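The two annotation formats carry the same information, which the following sketch illustrates by converting GMT lines into the two-column layout. The helper names and the example terms and genes are hypothetical, and writing the two-column file as comma-separated is an assumption, since the delimiter for that format is not stated here.

```python
import csv
import io


def gmt_to_pairs(gmt_lines):
    """Yield (gene_id, term) pairs from lines of GMT-format text."""
    for line in gmt_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and terms without genes
        term = fields[0]  # fields[1] is the description, unused here
        for gene_id in fields[2:]:
            if gene_id:
                yield gene_id, term


def write_two_column(pairs, handle):
    """Write pairs under the headings gene_id and term.

    Assumption: comma-separated output.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["gene_id", "term"])
    writer.writerows(pairs)


# Hypothetical GMT content: term, description, then gene IDs.
example_gmt = [
    "GO:0006915\tapoptotic process\tTP53\tBAX\n",
    "GO:0007049\tcell cycle\tTP53\tCDK1\n",
]
out = io.StringIO()
write_two_column(gmt_to_pairs(example_gmt), out)
print(out.getvalue())
```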
mcmcbpn has a large number of options which can change its behavior, either in terms of the algorithm and parameters used, or in terms of its output. To get a full list of options, run
mcmcbpn --help
Below are the most important options.
Algorithm and Parameter Options
These are options which affect the algorithmic behavior or starting state of mcmcbpn.
- --burn-in=BURN_IN: the number of steps to take before recording states in the Markov chain [default: 1000000]
- --steps=STEPS: the number of steps through the Markov chain to observe [default: 10000000]
- --activity-threshold=ACTIVITY_THRESHOLD: set the (differential) expression threshold at which a gene is considered active [default: -log10(0.05)]
- --transition-ratio=TRANSITION_RATIO: the target ratio of proposed link transitions to proposed parameter transitions [default: 0.9]
- --fixed-distributions: use fixed distributions for link (and term) prior [implies --free-parameters] (highly recommended)
- --free-parameters: parameters will be adjusted randomly, rather than incrementally (recommended)
- --disable-swaps: disables swapping links as an option for transitions (highly recommended; will become the default option in future releases)
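The default activity threshold, -log10(0.05), is about 1.301. The sketch below shows the arithmetic under two assumptions that are not confirmed here: that expression values are supplied as -log10 of p-values, and that a value at or above the threshold counts as active. The gene names, p-values, and the `active_genes` helper are hypothetical.

```python
import math

# Default from the option list above: -log10(0.05).
DEFAULT_ACTIVITY_THRESHOLD = -math.log10(0.05)


def active_genes(expression, threshold=DEFAULT_ACTIVITY_THRESHOLD):
    """Return sorted IDs whose value meets the threshold.

    Assumption: values at or above the threshold count as active.
    """
    return sorted(g for g, v in expression.items() if v >= threshold)


# Hypothetical p-values from a differential expression test.
p_values = {"geneA": 0.001, "geneB": 0.2, "geneC": 0.04}
scores = dict((g, -math.log10(p)) for g, p in p_values.items())

print(round(DEFAULT_ACTIVITY_THRESHOLD, 3))  # 1.301
print(active_genes(scores))  # ['geneA', 'geneC']
```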
Output Options
These are options which affect the output file paths and file formats for mcmcbpn.
- --links-outfile=LINKS_OUTFILE: the file to which the links results should be written [default: links_results.tsv]
- --parameters-outfile=PARAMETERS_OUTFILE: the file to which the parameters results should be written [default: parameter_results.tsv]
- --terms-outfile=TERMS_OUTFILE: the file to which the terms results should be written [default: terms_results.tsv]
- --transitions-outfile=TRANSITIONS_OUTFILE: the file to which the transitions data should be written [default: transitions.tsv]
- --detailed-transitions: transitions file includes full information about each step’s state (see also --bzip2 below, as this can drastically increase the file size of the transitions outfile)
- --bzip2: compress transitions file using bzip2 (highly recommended; the transitions file can consume a large amount of disk space, in proportion to the number of steps)
- --record-frequencies: record the frequency of each state
- --frequencies-outfile=FREQUENCIES_OUTFILE: the file to which frequency information should be written [default: state_frequencies.tsv]
- --logfile=LOGFILE: the file to which information for the run will be logged [default: mcmcbpn-TIMESTAMP.log]
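When scripting runs, it can help to assemble the command line in one place. The sketch below builds an argument list from options named in this document only; the input file names and the `build_mcmcbpn_command` helper are hypothetical, and the selection of flags simply follows the recommendations listed above.

```python
import shlex


def build_mcmcbpn_command(interactions, annotations, expression,
                          burn_in=1000000, steps=10000000,
                          links_outfile="links_results.tsv"):
    """Return an argument list for mcmcbpn (hypothetical helper)."""
    return [
        "mcmcbpn",
        "--burn-in=%d" % burn_in,
        "--steps=%d" % steps,
        "--fixed-distributions",
        "--disable-swaps",
        "--bzip2",
        "--links-outfile=%s" % links_outfile,
        interactions,
        annotations,
        expression,
    ]


# Hypothetical input file names.
command = build_mcmcbpn_command(
    "interactions.csv", "annotations.gmt", "expression.csv")
print(" ".join(shlex.quote(arg) for arg in command))
# To execute, one could pass `command` to subprocess.check_call,
# assuming mcmcbpn is installed and on the PATH.
```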
The two principal files output by mcmcbpn are the links outfile and the parameters outfile.
- Links File: a TSV file containing three columns: term1, term2, and probability. term1 and term2 are the two biological processes of a given link, and probability is the probability that the link should exist in the final biological process network (BPN), as determined by a given run of mcmcbpn.
- Parameters File: a TSV file containing three columns: parameter, value, and probability. The first column, parameter, gives the name of the parameter. Names include the following:
  - link_false_neg: proportion of interactions not explained by the BPN that should be
  - link_false_pos: proportion of interactions explained by the BPN that should not be
  - link_prior: the prior probability that a link would be included in the BPN at all
  The second column, value, shows a particular value for the parameter. The third column, probability, gives the estimated probability that the parameter should assume that value in order to maximize the likelihood of the BPN.
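The two principal output files can be post-processed with the standard library. The following sketch assumes each file starts with a header row carrying the column names described above; the 0.5 cutoff, the helper names, and the example rows are illustrative choices, not PyBPN recommendations or real results.

```python
import csv
import io


def likely_links(handle, cutoff=0.5):
    """Return (term1, term2, probability) rows at or above cutoff,
    sorted from most to least probable."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [(r["term1"], r["term2"], float(r["probability"]))
            for r in reader]
    kept = [row for row in rows if row[2] >= cutoff]
    return sorted(kept, key=lambda row: -row[2])


def most_probable_parameters(handle):
    """Map each parameter name to its most probable value."""
    best = {}
    for r in csv.DictReader(handle, delimiter="\t"):
        name, prob = r["parameter"], float(r["probability"])
        if name not in best or prob > best[name][1]:
            best[name] = (float(r["value"]), prob)
    return dict((name, pair[0]) for name, pair in best.items())


# Made-up file contents for illustration.
links_text = ("term1\tterm2\tprobability\n"
              "GO:1\tGO:2\t0.9\n"
              "GO:1\tGO:3\t0.1\n"
              "GO:2\tGO:3\t0.6\n")
params_text = ("parameter\tvalue\tprobability\n"
               "link_prior\t0.1\t0.7\n"
               "link_prior\t0.2\t0.3\n")

print(likely_links(io.StringIO(links_text)))
print(most_probable_parameters(io.StringIO(params_text)))
```

For real runs, replace the `io.StringIO` objects with files opened on the links and parameters outfiles.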
Dotan-Cohen, D., et al. “Biological Process Linkage Networks.” PLoS One. 2009.
Lasher, C., et al. “Discovering Networks of Perturbed Biological Processes in Hepatocyte Cultures.” PLoS One. 2010.
Note: If your install fails during the installation of SciPy, try running pip install numpy (or the local-install equivalent) prior to installing PyBPN.
Release files (uploaded May 4, 2013):
- BiologicalProcessNetworks-1.0a5.tar.gz (87.7 kB), source
- BiologicalProcessNetworks-1.0a5.zip (110.3 kB), source