Bialignment of RNAs and proteins
BiAlign - Bialignment of RNAs and proteins
The tool BiAlign computes optimal bi-alignments of RNAs and proteins. Such bi-alignments support evolutionary 'shift' events between sequence and structure. In this way, bi-alignments extend alignments based on sequence and structure similarity to the case of potential incongruence between sequence and structure evolution.
The current version extends the capabilities from RNA alignments to the alignment of protein sequence and secondary structure, supporting realistic 'affine' gap costs with gap opening and extension scores.
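To illustrate what 'affine' gap cost means, the following minimal sketch scores a single gap from an opening and an extension score. The function name `affine_gap_score` is hypothetical, and the convention used here (opening charged once per gap, extension charged once per gapped position) is an assumption; BiAlign's internal convention may differ. The default values are taken from the protein example further down.

```python
def affine_gap_score(length, gap_opening_cost=-150, gap_cost=-50):
    """Score contribution of one contiguous gap of the given length.

    Assumption: the opening score is charged once per gap and gap_cost
    once per gapped position. BiAlign's exact convention may differ.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_opening_cost + length * gap_cost


# One gap of length 3 versus three separate gaps of length 1
one_long = affine_gap_score(3)
three_short = 3 * affine_gap_score(1)
print(one_long, three_short)  # -300 -600
```

Under this convention a single long gap is penalized less than several short ones of the same total length, which is the usual motivation for affine over linear gap cost.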
The first version of this tool has been described in
Waldl M., Will S., Wolfinger M.T., Hofacker I.L., Stadler P.F. (2020) Bi-alignments as Models of Incongruent Evolution of RNA Sequence and Secondary Structure. In: Cazzaniga P., Besozzi D., Merelli I., Manzoni L. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2019. Lecture Notes in Computer Science, vol 12313. Springer, Cham. https://doi.org/10.1007/978-3-030-63061-4_15
Installation
The software runs with full functionality only on Linux and macOS. On Windows, installation via conda is not supported, and the prediction of RNA structures (which relies on the Vienna RNA package) is unavailable.
The software can be installed via conda (Linux/Mac only) or pip (Mac/Linux/Windows/...) by
conda install -c bioconda bialign
or
pip install bialign
Conda installation is recommended, since it will automatically install dependencies like the Vienna RNA package. When installing via pip (or from source, see below), additionally install numpy, matplotlib, and (optionally) the Vienna RNA package.
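After a pip installation, one way to see which of the additional packages are present is to probe for their modules. This is a minimal sketch using only the standard library; the helper name `check_dependencies` is hypothetical. It assumes the Vienna RNA Python bindings are exposed as the module `RNA`, which is how the Vienna RNA package normally installs them.

```python
import importlib.util


def check_dependencies(modules=("numpy", "matplotlib", "RNA")):
    """Map each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}


for name, available in check_dependencies().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing `RNA` module only matters when RNA structures have to be predicted rather than passed in explicitly.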
Installation from source
Installation from source, e.g. from a clone of the git repository, relies on the Python setup system.
Cython is required to compile performance-critical code, so Cython and Python (including pip/setuptools) need to be installed. Install from source by
pip install .
Moreover, for aligning RNAs, the tool requires the Vienna RNA package with Python bindings. We recommend using the tool under Linux or macOS and installing the prerequisites via conda / bioconda.
Usage examples
The tool can be used from the command line or via its Python interface (e.g. from a Jupyter notebook).
Command line interface
For an overview of all command line parameters that configure modes and alignment parameters, refer to the help output of the tool, obtained by
bialign.py --help
RNA bi-alignment examples
This 'toy' example demonstrates a simple helix shift:
bialign.py GCGGGGGAUAUCCCCAUCG GGGGAUAUCCCCAUCG \
--strA "...(((.....)))....." --strB ".(((.....)))...." \
--structure 400 \
--gap_opening_cost -200 --gap_cost -50 \
--max_shift 1 --shift_cost -150
Using default text output mode, this produces
Input:
seqA GCGGGGGAUAUCCCCAUCG
seqB GGGGAUAUCCCCAUCG
strA ...(((.....))).....
strB .(((.....)))....
SCORE: 6800
A GCGGGGGAUAUCCCC-AUCG
B G---GGGAUAUCCCC-AUCG
A ss ...-(((.....))).....
B ss .---(((.....)))-....
A shifts ...<...........>....
B shifts ....................
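To make the helix shift in this toy input explicit, the dot-bracket strings can be turned into lists of base pairs. The following is a minimal sketch, independent of BiAlign; the function name `parse_dot_bracket` is hypothetical, positions are 0-based, and only the characters `(`, `)` and `.` are handled.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs of a dot-bracket string."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


seqA, strA = "GCGGGGGAUAUCCCCAUCG", "...(((.....)))....."
seqB, strB = "GGGGAUAUCCCCAUCG", ".(((.....)))...."

# Each structure must annotate every position of its sequence
assert len(seqA) == len(strA) and len(seqB) == len(strB)

print(parse_dot_bracket(strA))  # [(3, 13), (4, 12), (5, 11)]
print(parse_dot_bracket(strB))  # [(1, 11), (2, 10), (3, 9)]
```

Both inputs contain one three-base-pair helix enclosing a five-nucleotide loop, but at different offsets relative to the conserved sequence.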
Structures will be predicted (using the Vienna RNA package) if they are not explicitly given, e.g.
bialign.py UGUAAACAUCCUCGACUGGAAGCUGUGAAGCCACAAAUGGGCUUUCAGUCGGAUGUUUGCA UGUAAACAUCCUACACUCAGCUGUCAUACAUGCGUUGGCUGGGAUGUGGAUGUUUACG
Note that this fails if the Vienna RNA package with Python bindings is not available.
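To check beforehand whether a structure can be predicted, the Vienna RNA bindings can be called directly. This is a minimal sketch that assumes plain minimum free energy folding via `RNA.fold`; how BiAlign itself invokes the Vienna RNA package is not documented here and may differ. The helper name `predict_structure` is hypothetical.

```python
def predict_structure(seq):
    """Return (dot_bracket, mfe) for an RNA sequence, or None if the
    Vienna RNA Python bindings are not installed."""
    try:
        import RNA  # Python bindings of the Vienna RNA package
    except ImportError:
        return None
    structure, mfe = RNA.fold(seq)
    return structure, mfe


seq = "UGUAAACAUCCUCGACUGGAAGCUGUGAAGCCACAAAUGGGCUUUCAGUCGGAUGUUUGCA"
result = predict_structure(seq)
if result is None:
    print("Vienna RNA bindings not available; pass --strA/--strB explicitly")
else:
    print(result[0], result[1])
```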
Bi-alignments of proteins with affine gap cost
bialign.py RAKLPLKEKKLTATANYHPGIRYIMTGYSAKYIYSSTYARFR KAKLPLKEKKLTRTANYHPGIRYIMTGYSAKRIYSSTYAYFR \
--strA "CHHHHHHHHHHHHHCCCCTCEEEEEEECCTCEEEEEEEECCC" --strB "HHHHHHHHHHHHCCCCCCTCEEEEEEECCCCCEEEEEEEECC" \
--type Protein --shift_cost -150 --structure_weight 800 --simmatrix BLOSUM62 --gap_opening_cost -150 \
--gap_cost -50 --max_shift 1 --outmode sorted
Due to the requested output mode sorted, this produces text output with BLAST-like annotation of each alignment component (sequence and structure) by its respective consensus sequence and structure.
Input:
seqA RAKLPLKEKKLTATANYHPGIRYIMTGYSAKYIYSSTYARFR
seqB KAKLPLKEKKLTRTANYHPGIRYIMTGYSAKRIYSSTYAYFR
strA CHHHHHHHHHHHHHCCCCTCEEEEEEECCTCEEEEEEEECCC
strB HHHHHHHHHHHHCCCCCCTCEEEEEEECCCCCEEEEEEEECC
SCORE: 48500
A ss -CHHHHHHHHHHHHHCCCCTCEEEEEEECCTCEEEEEEEEC-CC
A -RAKLPLKEKKLTATANYHPGIRYIMTGYSAKYIYSSTYAR-FR
consensus -.AKLPLKEKKLT.TANYHPGIRYIMTGYSAK.IYSSTYA.-FR
B -KAKLPLKEKKLTRTANYHPGIRYIMTGYSAKRIYSSTYAY-FR
B ss -HHHHHHHHHHHHCCCCCCTCEEEEEEECCCCCEEEEEEEE-CC
consensus ss -.HHHHHHHHHHH..CCCCTCEEEEEEECC.C.EEEEEEE.-CC
A RAKLPLKEKKLTA-TANYHPGIRYIMTGYSAK-YIYSSTYARFR
A ss CHHHHHHHHHHHH-HCCCCTCEEEEEEECCTC-EEEEEEEECCC
consensus ss .HHHHHHHHHHHH..CCCCTCEEEEEEECC.C.EEEEEEEE.CC
B ss -HHHHHHHHHHHHCCCCCCTCEEEEEEECCCCCEEEEEEEE-CC
B -KAKLPLKEKKLTRTANYHPGIRYIMTGYSAKRIYSSTYAY-FR
consensus .........K....TANYHPGIRYIMTGYSAK....S.....FR
A shifts >............<..................<........>..
B shifts ............................................
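The consensus rows in this output appear to follow a simple column rule: a column shows the shared character where both rows agree (including a shared gap) and `.` where they differ. The following minimal sketch reproduces the first consensus row under that assumption; the rule is inferred from the output above and is not taken from the BiAlign source, and the function name `consensus` is hypothetical.

```python
def consensus(row_a, row_b, mismatch="."):
    """Column-wise consensus of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return "".join(a if a == b else mismatch
                   for a, b in zip(row_a, row_b))


row_a = "-RAKLPLKEKKLTATANYHPGIRYIMTGYSAKYIYSSTYAR-FR"
row_b = "-KAKLPLKEKKLTRTANYHPGIRYIMTGYSAKRIYSSTYAY-FR"
print(consensus(row_a, row_b))
# -.AKLPLKEKKLT.TANYHPGIRYIMTGYSAK.IYSSTYA.-FR
```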
Input can also be read from files as written by the secondary structure prediction web server CFSSP (Kumar et al., 2013; http://www.biogem.org/tool/chou-fasman).
bialign.py --filein Examples/DNAPolymerase1_Escherichia.cfssp Examples/DNAPolymerase1_Xanthomonas.cfssp \
--type Protein --shift_cost -150 --structure_weight 800 --simmatrix BLOSUM62 --gap_opening_cost -150 \
--gap_cost -50 --max_shift 1
Python interface
The following code generates a bi-alignment of two toy proteins and shows the resulting alignment in a graphical representation.
import bialignment as ba

# Alignment parameters, mirroring the command line options used above
args = {'type': 'Protein',
        'gap_cost': -50,
        'gap_opening_cost': -150,
        'shift_cost': -150,
        'structure_weight': 800,
        'max_shift': 1,
        'simmatrix': 'BLOSUM62'
        }
args['nameA'] = 'A'
args['nameB'] = 'B'

# Sequences and secondary structures of the two toy proteins
seqA = "RAKLPLKEKKLTATANYHPGIRYIMTGYSAKYIYSSTYARFR"
strA = "CHHHHHHHHHHHHHCCCCTCEEEEEEECCTCEEEEEEEECCC"
seqB = "KAKLPLKEKKLTRTANYHPGIRYIMTGYSAKRIYSSTYAYFR"
strB = "HHHHHHHHHHHHCCCCCCTCEEEEEEECCCCCEEEEEEEECC"

bialigner = ba.BiAligner(seqA, seqB, strA, strB,
                         **args)

# Compute the optimal bi-alignment score
score = bialigner.optimize()
print('SCORE', score)
print()

# Print the text representation of the bi-alignment
for line in bialigner.decode_trace():
    print(line)

# Show the bi-alignment graphically
ba.plot_alignment(bialigner.decode_trace_full(),
                  width=80,
                  show_position_numbers=False,
                  name_offset=3,
                  # outname="example.svg"  # optionally write plot to file
                  )
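The keys of the parameter dictionary above coincide with the command line options used in the earlier examples, so the same settings can be turned into a command line call. The following minimal sketch builds such an argument list; the helper `build_command` is hypothetical and not part of BiAlign, and it only assumes the options shown in the examples of this document.

```python
import shlex


def build_command(seq_a, seq_b, str_a=None, str_b=None, **options):
    """Assemble a bialign.py argument list from sequences and options."""
    cmd = ["bialign.py", seq_a, seq_b]
    if str_a is not None:
        cmd += ["--strA", str_a]
    if str_b is not None:
        cmd += ["--strB", str_b]
    for key, value in options.items():
        cmd += [f"--{key}", str(value)]
    return cmd


options = {"type": "Protein", "gap_cost": -50, "gap_opening_cost": -150,
           "shift_cost": -150, "structure_weight": 800, "max_shift": 1,
           "simmatrix": "BLOSUM62"}
cmd = build_command("RAKLPLKEKK", "KAKLPLKEKK",
                    str_a="CHHHHHHHHH", str_b="HHHHHHHHHH", **options)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on systems where `bialign.py` is on the PATH.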