pyabpoa: SIMD-based partial order alignment using adaptive band

pyabpoa: abPOA Python interface

Introduction

pyabpoa provides an easy-to-use Python interface to abPOA. It contains all the APIs needed to perform multiple sequence alignment (MSA) on a set of sequences and to call consensus sequences from the final alignment graph.

Installation

Install pyabpoa with pip

pyabpoa can be installed with pip:

pip install pyabpoa

Install pyabpoa from source

Alternatively, you can install pyabpoa from source (Cython is required):

git clone --recursive https://github.com/yangao07/abPOA.git
cd abPOA
make install_py

Examples

The following code illustrates how to use pyabpoa.

import pyabpoa as pa

a = pa.msa_aligner()
seqs = [
    'CCGAAGA',
    'CCGAACTCGA',
    'CCCGGAAGA',
    'CCGAAGA',
]
res = a.msa(seqs, out_cons=True, out_msa=True)  # perform multiple sequence alignment

for seq in res.cons_seq:
    print(seq)  # print consensus sequence

res.print_msa()  # print row-column multiple sequence alignment in PIR format

You can also try the example script provided in the source folder:

python ./python/example.py
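
In practice the input sequences often come from a FASTA file rather than a hard-coded list. The following is a minimal sketch of that workflow, not part of pyabpoa itself: read_fasta_sequences and consensus_from_fasta are hypothetical helper names, and the only pyabpoa calls used are the ones shown in the example above (msa_aligner(), msa() and cons_seq).

```python
from typing import Iterable, List


def read_fasta_sequences(lines: Iterable[str]) -> List[str]:
    """Collect sequences from FASTA-formatted lines, joining wrapped lines."""
    seqs: List[str] = []
    current: List[str] = []
    started = False  # becomes True once the first '>' header is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if started:
                seqs.append("".join(current))
            current = []
            started = True
        elif started:
            current.append(line)
    if started:
        seqs.append("".join(current))
    return seqs


def consensus_from_fasta(path: str) -> List[str]:
    """Align every record in a FASTA file and return the consensus sequence(s)."""
    # Hypothetical helper: pyabpoa is imported lazily so that the parser
    # above can be used (and tested) without pyabpoa installed.
    import pyabpoa as pa

    with open(path) as handle:
        seqs = read_fasta_sequences(handle)
    aligner = pa.msa_aligner()
    res = aligner.msa(seqs, out_cons=True, out_msa=False)
    return list(res.cons_seq)
```

Calling consensus_from_fasta('reads.fa') (the file name is a placeholder) would then return the same kind of list that the loop over res.cons_seq prints above.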

APIs

Class pyabpoa.msa_aligner

pyabpoa.msa_aligner(aln_mode='g', ...)

This constructs a pyabpoa multiple sequence alignment handler. It accepts the following arguments:

aln_mode: alignment mode. 'g': global, 'l': local, 'e': extension; default: 'g'
is_aa: input is amino acid sequence; default: False
match: match score; default: 2
mismatch: mismatch penalty; default: 4
score_matrix: scoring matrix file; match and mismatch are ignored when score_matrix is set; default: ''
gap_open1: first gap opening penalty; default: 4
gap_ext1: first gap extension penalty; default: 2
gap_open2: second gap opening penalty; default: 24
gap_ext2: second gap extension penalty; default: 1
extra_b: first adaptive banding parameter; set to < 0 to disable adaptive banded DP; default: 10
extra_f: second adaptive banding parameter; the number of extra bases added on both sides of the band is b+f*L, where b is extra_b, f is extra_f, and L is the length of the aligned sequence; default: 0.01
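
The default values above are easier to interpret with a few numbers. The sketch below evaluates the band formula b+f*L from the extra_f description, and it also shows how the two gap penalty pairs could interact. The second part rests on an assumption not stated in this document: that the two pairs are combined as a convex gap penalty, min(gap_open1 + g*gap_ext1, gap_open2 + g*gap_ext2) for a gap of length g, as described in the abPOA documentation. Both function names are hypothetical, and any rounding that abPOA applies internally is not modeled.

```python
def extra_band_bases(seq_len: int, extra_b: int = 10, extra_f: float = 0.01) -> float:
    """Extra bases added on both sides of the band: b + f*L (rounding not modeled)."""
    return extra_b + extra_f * seq_len


def convex_gap_cost(gap_len: int,
                    gap_open1: int = 4, gap_ext1: int = 2,
                    gap_open2: int = 24, gap_ext2: int = 1) -> int:
    """Cost of a gap of gap_len bases, ASSUMING a convex (two-piece) penalty."""
    first = gap_open1 + gap_len * gap_ext1    # cheaper for short gaps
    second = gap_open2 + gap_len * gap_ext2   # cheaper for long gaps
    return min(first, second)


if __name__ == "__main__":
    for length in (100, 1000, 10000):
        print(length, extra_band_bases(length))
    for gap in (1, 20, 21, 30):
        print(gap, convex_gap_cost(gap))
```

Under that assumption and the default penalties, the first pair determines the cost for gaps of up to 20 bases and the second pair takes over for longer gaps, so long gaps are penalized less steeply per base.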

The msa_aligner handler provides one method, which performs multiple sequence alignment and takes the following arguments:

pyabpoa.msa_aligner.msa(seqs, out_cons, out_msa, max_n_cons=1, min_freq=0.3, out_pog='', incr_fn='')

seqs: a list containing the set of input sequences; positional
out_cons: a bool that asks pyabpoa to generate consensus sequence(s); positional
out_msa: a bool that asks pyabpoa to generate the row-column MSA (RC-MSA); positional
max_n_cons: maximum number of consensus sequences to generate; default: 1
min_freq: minimum frequency of each consensus to output (effective when max_n_cons > 1); default: 0.3
out_pog: name of a file (.png or .pdf) to store the plot of the final alignment graph; default: ''
incr_fn: name of an existing graph (GFA) or MSA (FASTA) file; the input sequences are incrementally aligned to this graph/MSA; default: ''
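
As an illustration of the max_n_cons and min_freq arguments, the sketch below requests up to two consensus sequences, which may be useful when the input is expected to contain two distinct sequence groups. The wrapper name call_consensus and its input checks are hypothetical additions, not part of pyabpoa. It also assumes that max_n_cons and min_freq can be passed as keyword arguments, in the same way as out_cons and out_msa in the earlier example.

```python
from typing import List, Sequence


def call_consensus(seqs: Sequence[str],
                   max_n_cons: int = 2,
                   min_freq: float = 0.3) -> List[str]:
    """Return up to max_n_cons consensus sequences for seqs (hypothetical wrapper)."""
    # These checks belong to this sketch only; pyabpoa may validate differently.
    if max_n_cons < 1:
        raise ValueError("max_n_cons must be at least 1")
    if not 0.0 <= min_freq <= 1.0:
        raise ValueError("min_freq must be between 0 and 1")

    import pyabpoa as pa  # imported lazily, after the arguments are checked

    aligner = pa.msa_aligner()
    res = aligner.msa(list(seqs), out_cons=True, out_msa=False,
                      max_n_cons=max_n_cons, min_freq=min_freq)
    return list(res.cons_seq)
```

According to the description of min_freq, a second consensus is only reported when it meets the minimum frequency, so the returned list may still contain a single sequence.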

Class pyabpoa.msa_result

pyabpoa.msa_result(seq_n, cons_n, cons_len, ...)

This class describes the information of the generated consensus sequence and the RC-MSA. The returned result of pyabpoa.msa_aligner.msa() is an object of this class that has the following properties:

n_seq: number of aligned input sequences
n_cons: number of generated consensus sequences (generally 1; can be 2 or more if max_n_cons is set to a value > 1)
clu_n_seq: an array of sequence cluster sizes
cons_len: an array of consensus sequence length(s)
cons_seq: an array of consensus sequence(s)
cons_cov: an array of consensus sequence coverage for each base
msa_len: size of each row in the RC-MSA
msa_seq: an array of n_seq+n_cons strings that represent the RC-MSA, each consisting of one sequence (input or consensus) with - characters marking the alignment gaps
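
The properties above can be combined into a per-consensus summary. The sketch below is a hypothetical helper, not part of pyabpoa, and it makes two assumptions about the layout: that clu_n_seq, cons_len and cons_seq each hold one entry per consensus, and that cons_cov holds one list of per-base coverage values per consensus. It only reads attributes, so it works on any object that exposes them.

```python
from typing import Any, Dict, List


def summarize_result(res: Any) -> List[Dict[str, Any]]:
    """Build one summary record per consensus from an msa_result-like object."""
    summary = []
    for i in range(res.n_cons):
        coverage = list(res.cons_cov[i])  # ASSUMED: per-base coverage of consensus i
        mean_cov = sum(coverage) / len(coverage) if coverage else 0.0
        summary.append({
            "consensus": res.cons_seq[i],
            "length": res.cons_len[i],
            "cluster_size": res.clu_n_seq[i],  # ASSUMED: one cluster size per consensus
            "mean_coverage": mean_cov,
        })
    return summary


if __name__ == "__main__":
    from types import SimpleNamespace

    # Hand-written stand-in values for illustration, not real pyabpoa output.
    fake = SimpleNamespace(n_cons=1, cons_seq=["CCGAAGA"], cons_len=[7],
                           clu_n_seq=[4], cons_cov=[[4, 4, 4, 4, 4, 4, 4]])
    print(summarize_result(fake))
```

With a real result, summarize_result(res) would be called directly on the object returned by msa().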

pyabpoa.msa_result provides a print_msa method, which prints the RC-MSA to the screen.

pyabpoa.msa_result().print_msa()
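
Because print_msa only writes to the screen, further analysis of the alignment has to read msa_seq directly. The sketch below computes the fraction of gaps in each alignment column. gap_fraction_per_column is a hypothetical helper, and it assumes that msa_seq holds equal-length Python strings in which - marks a gap, as described above.

```python
from typing import List, Sequence


def gap_fraction_per_column(rows: Sequence[str]) -> List[float]:
    """Return, for each alignment column, the fraction of rows that have a gap."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all MSA rows must have the same length")
    return [sum(row[col] == "-" for row in rows) / len(rows)
            for col in range(width)]


if __name__ == "__main__":
    # Hand-written rows for illustration, not real pyabpoa output.
    example_rows = ["CC-GAAGA", "CCCGAAGA"]
    print(gap_fraction_per_column(example_rows))
```

With a real result, the call would be gap_fraction_per_column(res.msa_seq); columns with a high gap fraction point to insertions that are present in only a few sequences.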

Release history

1.5.1 (this version): Jan 16, 2024
1.5.0: Jan 11, 2024
1.4.3: Dec 18, 2023
1.4.2: Dec 13, 2023
1.4.0: Mar 15, 2022
1.3.0.0: Jan 12, 2022
1.2.5: Jul 21, 2021
1.2.4: Jun 2, 2021
1.2.3: Jun 1, 2021
1.2.1: May 18, 2021
1.2.0: May 15, 2021
1.1.4: May 4, 2021
1.0.6: Dec 28, 2020
1.0.5: Oct 21, 2020
1.0.4: Oct 6, 2020
1.0.3: May 25, 2020
1.0.2: May 10, 2020
1.0.1a0 (pre-release): May 2, 2020
1.0.0a1 (pre-release): Apr 16, 2020
1.0.0a0 (pre-release): Apr 15, 2020

Download files

Source Distribution

pyabpoa-1.5.1.tar.gz (689.4 kB)

Uploaded Jan 16, 2024 Source

Hashes for pyabpoa-1.5.1.tar.gz
Algorithm	Hash digest
SHA256	`878f981e890a421d92a0d7606705d0ad9813ae6086239460dfe4b0cfc7476174`
MD5	`e5bee6c86fadef6056b98524ef091a4d`
BLAKE2b-256	`afa8b3207f263322610398bdda967a74193886f43c5b352a58c80828aae8a717`