A bioinformatic pipeline for proteome annotation to predict whether a protein is exposed on the surface of a bacterium.
inmembrane is a pipeline for proteome annotation to predict whether a protein is exposed on the surface of a bacterium. It orchestrates the analysis of protein sequences to provide a summary of which targets may be surface exposed, based on predicted subcellular localization signals and membrane topology. Currently, protocols have been implemented for gram+ and gram- bacterial proteomes.
Typical usage is via the script inmembrane_scan, e.g.:
$ inmembrane_scan mysequences.fasta
The provided sequences (in FASTA format) are subjected to a number of sequence analyses using external programs (see below), and the results are summarized like this:
SPy_0008 CYTOPLASM(non-PSE) . SPy_0008 from AE004092
SPy_0010 PSE-Membrane tmhmm(1) SPy_0010 from AE004092
SPy_0012 PSE-Cellwall hmm(GW2|GW3|GW1);signalp SPy_0012 from AE004092
SPy_0016 MEMBRANE(non-PSE) tmhmm(12) SPy_0016 from AE004092
SPy_0019 SECRETED signalp SPy_0019 from AE004092
As well as printing to stdout, this will generate a summary CSV file, mysequences.csv, and a directory, mysequences, containing the output files generated during the run.
Although inmembrane is primarily designed to be used as a standalone program, it can also be used as a library, for example:
import inmembrane
params = inmembrane.get_params()
params['fasta'] = "input.fasta"
annotations = inmembrane.process(params)
where annotations is a dictionary of the results, with protein sequence IDs as keys.
You can also test the functionality of the analysis plugins that are part of inmembrane by typing:
$ inmembrane_scan --test
This can be useful for determining which binary dependencies are correctly installed, or for exposing any broken or offline web services required for a particular analysis.
The latest stable release of inmembrane can be installed via pip, or the bleeding-edge version from GitHub.
$ sudo pip install inmembrane
Or from GitHub:
$ git clone http://github.com/boscoh/inmembrane.git
$ cd inmembrane
$ sudo python setup.py install
The package includes tests, examples, data files and documentation. HMMER3 is the only required external dependency; however, for large analyses (multiple proteomes) we suggest installing local versions of the other analysis programs rather than relying on web services (see Installing dependencies below).
The editable parameters of inmembrane are stored in inmembrane.config, which is always located in the same directory as the main script. If this file does not exist, a default inmembrane.config is generated. In most cases, you won’t need to change anything.
The parameters are:
The output of the inmembrane gram_pos protocol consists of four columns. It is printed to stdout and written to a CSV file, which can be opened in spreadsheet software such as Excel. The standard text output can be parsed using spaces as delimiters (an empty third column is indicated with a “.”); note that the fourth column, the sequence description, can itself contain spaces, so split each line into at most four fields. Logging information is prefixed with a ‘#’ character and sent to stderr.
Here’s an example:
SPy_0008 CYTOPLASM(non-PSE) . SPy_0008 from AE004092
SPy_0009 CYTOPLASM(non-PSE) . SPy_0009 from AE004092
SPy_0010 PSE-Membrane tmhmm(1) SPy_0010 from AE004092
SPy_0012 PSE-Cellwall hmm(GW2|GW3|GW1);signalp SPy_0012 from AE004092
SPy_0013 PSE-Membrane tmhmm(1) SPy_0013 from AE004092
SPy_0015 PSE-Membrane tmhmm(2) SPy_0015 from AE004092
SPy_0016 MEMBRANE(non-PSE) tmhmm(12) SPy_0016 from AE004092
SPy_0019 SECRETED signalp SPy_0019 from AE004092
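Because the third column uses “.” as a placeholder and the fourth column can contain spaces, a robust way to read this output is to split each line at most three times. The following is a minimal sketch, assuming the four-column layout shown above; Prediction and parse_summary are hypothetical names, not part of inmembrane’s API.

```python
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    """One row of the space-delimited summary output (assumed layout)."""
    seqid: str
    category: str
    evidence: str     # "" when the column held the "." placeholder
    description: str  # free text; may contain spaces


def parse_summary(lines: Iterable[str]) -> List[Prediction]:
    """Parse summary lines, skipping blank lines and '#'-prefixed log lines."""
    predictions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The first three columns are single tokens; split at most three
        # times so the description keeps its internal spaces.
        fields = line.split(None, 3)
        fields += [""] * (4 - len(fields))  # tolerate a missing description
        seqid, category, evidence, description = fields
        if evidence == ".":
            evidence = ""
        predictions.append(Prediction(seqid, category, evidence, description))
    return predictions
```

For example, if the stdout of inmembrane_scan is saved to a file, passing the open file object to parse_summary yields one Prediction per protein; the evidence field can then be split on “;” to list the individual supporting predictions, such as hmm(GW2|GW3|GW1) and signalp.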
While inmembrane only requires a local installation of HMMER 3.0 and can use web services for TMHMM, SignalP, LipoP and various OMP beta-barrel predictors, for large-scale analyses (5000+ sequences) we suggest using locally installed versions, both for speed and to be polite to publicly available web services.
For each dependency, it is important to install the exact version that inmembrane was written to interoperate with; otherwise, inmembrane will likely be unable to interpret the output of the downstream analysis program.
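Before committing to local installs, it can help to confirm which executables are actually on your PATH. The sketch below uses Python’s shutil.which; the executable names in EXPECTED_BINARIES (hmmsearch, tmhmm, signalp, LipoP, runmemsat) are assumptions about a typical installation, not values read from inmembrane.config. It does not check versions, and inmembrane_scan --test remains the authoritative check.

```python
import shutil
from typing import Dict, Optional

# Hypothetical analysis -> executable mapping; adjust the names to match
# your own installation. These are NOT read from inmembrane.config.
EXPECTED_BINARIES: Dict[str, str] = {
    "HMMER 3.0": "hmmsearch",
    "TMHMM": "tmhmm",
    "SignalP": "signalp",
    "LipoP": "LipoP",
    "MEMSAT3": "runmemsat",
}


def find_local_tools(expected: Dict[str, str] = EXPECTED_BINARIES) -> Dict[str, Optional[str]]:
    """Return {analysis: full path, or None if the executable is not on PATH}."""
    return {name: shutil.which(exe) for name, exe in expected.items()}


if __name__ == "__main__":
    for name, path in find_local_tools().items():
        print(f"{name:10} {path or 'not on PATH'}")
```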
Required dependencies, and their versions:
These instructions have been tailored for Debian-based systems, in particular Ubuntu 11.10+. Each of these dependencies is licensed free of charge to academic users.
On Ubuntu (and other Debian-derived) Linux distributions:
$ sudo apt-get install hmmer
should be enough.
Only one of TMHMM or MEMSAT3 is required, but users who want to compare transmembrane segment predictions can install both.
(Note that the ‘runmemsat’ script refers to PSIPRED v2, but it means MEMSAT3; PSIPRED is NOT required.)
It is a fact of life in bioinformatics that new versions of basic tools change output formats and APIs. We believe that rewriting parsers to handle the subtle but significant changes between versions is an essential skill, so we have written inmembrane to be easily modifiable and extensible. Protocols, which embody a particular high-level workflow, are found in inmembrane/protocols.
All interaction with a specific external program or web service has been wrapped into a single Python plugin module in the inmembrane/plugins directory. Each module contains the code both to run the program and to parse its output. We have tried to keep the parsing code as concise as possible: by using native Python dictionaries, which allow an enormous amount of flexibility, we can collate the results of various analyses with very little code.
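To illustrate how a protocol can sit on top of the plugins, here is a minimal sketch of the pattern: plugins annotate a shared dictionary keyed by protein ID, and the protocol then maps each protein’s annotations to a category. The function names, the annotation keys (cellwall_hmm_hits, tm_helices, signal_peptide) and the decision rules in toy_classify are all hypothetical simplifications; they are not the criteria used by inmembrane’s gram_pos protocol, which, for example, distinguishes PSE-Membrane from MEMBRANE(non-PSE).

```python
from typing import Callable, Dict, List

Annotations = Dict[str, dict]  # protein ID -> that protein's collated results


def toy_classify(ann: dict) -> str:
    """Toy decision rules for illustration only (NOT inmembrane's gram_pos logic)."""
    if ann.get("cellwall_hmm_hits"):
        return "PSE-Cellwall"
    if ann.get("tm_helices", 0) > 0:
        return "MEMBRANE"
    if ann.get("signal_peptide"):
        return "SECRETED"
    return "CYTOPLASM"


def run_protocol(proteins: Annotations,
                 plugins: List[Callable[[Annotations], None]],
                 classify: Callable[[dict], str]) -> Dict[str, str]:
    """Run each plugin over the shared dict, then classify every protein."""
    for annotate in plugins:
        annotate(proteins)  # each plugin adds its own keys in place
    return {seqid: classify(ann) for seqid, ann in proteins.items()}
```

Keeping the decision rules in a separate function means a new protocol can reuse the same plugins and swap in only its own classify step.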
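As an illustration of the parsing half of such a plugin, the sketch below reads TMHMM results and folds them into the shared dictionary. It assumes TMHMM 2.0’s one-line-per-protein “short” output, where the sequence ID is followed by whitespace-separated key=value fields such as len=, PredHel= and Topology=. annotate_tmhmm and the tmhmm_helices/tmhmm_topology keys are hypothetical; inmembrane’s actual TMHMM plugin may parse different fields or store them under different names.

```python
import re
from typing import Dict, Iterable

# Matches "key=value" pairs in one line of TMHMM short output (assumed format).
_FIELD = re.compile(r"(\w+)=(\S+)")


def annotate_tmhmm(proteins: Dict[str, dict], tmhmm_lines: Iterable[str]) -> Dict[str, dict]:
    """Hypothetical plugin parser: add TMHMM results to the shared dict in place."""
    for line in tmhmm_lines:
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        seqid = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        fields = dict(_FIELD.findall(rest))
        # setdefault keeps results that other plugins already stored.
        ann = proteins.setdefault(seqid, {})
        ann["tmhmm_helices"] = int(fields.get("PredHel", 0))
        ann["tmhmm_topology"] = fields.get("Topology", "")
    return proteins
```

Because setdefault creates an entry only when a protein ID is missing, results from several plugins accumulate side by side in the same per-protein dictionary, which is the collation pattern described above.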
A more comprehensive overview can be found at http://boscoh.github.com/inmembrane/api.html.
Here are some guidelines for understanding and extending the code.