pyabpoa: SIMD-based partial order alignment using adaptive band
Project description
pyabpoa: abPOA Python interface
Introduction
pyabpoa provides an easy-to-use interface to abPOA, it contains all the APIs that can be used to perform MSA for a set of sequences and consensus calling from the final alignment graph.
Installation
Install pyabpoa with pip
pyabpoa can be installed with pip:
pip install pyabpoa
Install pyabpoa from source
Alternatively, you can install pyabpoa from source (cython is required):
git clone https://github.com/yangao07/abPOA.git
cd abPOA
make install_py
Examples
The following code illustrates how to use pyabpoa.
import pyabpoa as pa
a = pa.msa_aligner()
seqs=[
'CCGAAGA',
'CCGAACTCGA',
'CCCGGAAGA',
'CCGAAGA'
]
res=a.msa(seqs, out_cons=True, out_msa=True, out_pog='pog.png', incr_fn='') # perform multiple sequence alignment
# generate a figure of alignment graph to pog.png
for seq in res.cons_seq:
print(seq) # print consensus sequence
res.print_msa() # print row-column multiple sequence alignment in PIR format
You can also try the example script provided in the source folder:
python ./python/example.py
APIs
Class pyabpoa.msa_aligner
pyabpoa.msa_aligner(aln_mode='g', ...)
This constructs a multiple sequence alignment handler of pyabpoa, it accepts the following arguments:
- aln_mode: alignment mode. 'g': global, 'l': local, 'e': extension; default: 'g'
- match: match score; default: 2
- gap_open1: first gap opening penalty; default: 4
- gap_ext1: first gap extension penalty; default: 2
- gap_open2: second gap opening penalty; default: 24
- gap_ext2: second gap extension penalty; default: 1
- extra_b: first adaptive banding paremeter; set as < 0 to disable adaptive banded DP; default: 10
- extra_f: second adaptive banding paremete; the number of extra bases added on both sites of the band is b+f*L, where L is the length of the aligned sequence; default : 0.01
- is_diploid: set as 1 if input is diploid datal default: 0
- min_freq: minimum frequency of each consensus to output for diploid datal default: 0.3
The msa_aligner
handler provides one method which performs multiple sequence alignment and takes four arguments:
pyabpoa.msa_aligner.msa(seqs, out_cons, out_msa, out_pog='', incr_fn='')
- seqs: a list variable containing a set of input sequences; positional
- out_cons: a bool variable to ask pyabpoa to generate consensus sequence; positional
- out_msa: a bool variable to ask pyabpoa to generate RC-MSA; positional
- out_pog: name of a file (
.png
or.pdf
) to store the plot of the final alignment graph; optional, default: '' - incr_fn: name of an existing graph (GFA) or MSA (FASTA) file, incrementally align sequence to this graph/MSA; optional, default: ''
Class pyabpoa.msa_result
pyabpoa.msa_result(seq_n, cons_n, cons_len, ...)
This class describes the information of the generated consensus sequence and the RC-MSA. The returned result of pyabpoa.msa_aligner.msa()
is an object of this class that has the following properties:
- seq_n: number of input aligned sequences
- cons_n: number of generated consensus sequences (generally 1, could be 2 if input is specified as diploid)
- cons_len: an array of consensus sequence length(s)
- cons_seq: an array of consensus sequence(s)
- msa_len: size of each row in the RC-MSA
- msa_seq: an array containing
seq_n
strings that demonstrates the RC-MSA, each consisting of one input sequence and several-
indicating the alignment gaps.
pyabpoa.msa_result()
has a function of print_msa
which prints the RC-MSA to screen.
pyabpoa.msa_result().print_msa()
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for pyabpoa-1.2.1-py3.8-linux-x86_64.egg
Algorithm | Hash digest | |
---|---|---|
SHA256 | c6de2778024f2577ec4ca549f86becdd2732f2fc846b0c0ce4aebb302776a9ac |
|
MD5 | 882ad5ac3cf65f51b21ca95cb3ea7774 |
|
BLAKE2b-256 | 9eaa98b0c7ac7285dd7c1dac289df0c8c13fa89572f03e1224caf60fccbc8704 |