Python implementation for Gblocks
PyGblocks
A pure Python implementation of the algorithm behind Gblocks.
Differences from the original Gblocks
- The characters treated as gaps can be customized
- Any character that is not defined as a gap counts toward conservation
- The number of gaps allowed per position can be set as a count or as a percentage
- Positions that contain only gaps are not removed pre-emptively
- The conservation threshold may be set below 50%
- Similarity matrices are not supported
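To make the first two points concrete, here is a minimal sketch of how a single alignment column might be profiled when the gap characters are configurable. It is not PyGblocks code: `profile_column` is a hypothetical helper that only illustrates that a character such as `N` counts toward conservation unless it is declared a gap.

```python
from collections import Counter


def profile_column(column: str, gap_chars: str = "-") -> tuple[int, int]:
    """Return (number of gaps, count of the most frequent non-gap character).

    Hypothetical helper for illustration only; not part of PyGblocks.
    """
    gaps = sum(1 for char in column if char in gap_chars)
    # Every character not listed in gap_chars is eligible for conservation.
    residues = Counter(char for char in column if char not in gap_chars)
    top_count = max(residues.values(), default=0)
    return gaps, top_count


print(profile_column("NTT"))        # (0, 2): N is an ordinary character
print(profile_column("NNN"))        # (0, 3)
print(profile_column("NNN", "-N"))  # (3, 0): N is now declared a gap
```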
Installation
PyGblocks is available on PyPI. You can install it with pip:
```shell
pip install itaxotools-pygblocks
```
Usage
First create a mask from your sequences, then apply that mask to each sequence:
```python
from itaxotools.pygblocks import compute_mask, trim_sequences

sequences = [
    "--TTTNNTACTTTTTTT-ATT",
    "--TTTNTTACGTTTTTG-ATT",
    "--TTTNTTAGTTTTTTC-ATT",
]

mask = compute_mask(sequences)
trimmed_sequences = trim_sequences(sequences, mask)
```
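Conceptually, trimming keeps only the alignment columns that the mask marks as retained. The sketch below illustrates that idea with a plain list of booleans. This representation is an assumption made for illustration and does not describe the concrete type returned by `compute_mask`; `apply_boolean_mask` is a hypothetical helper, so use `trim_sequences` with real PyGblocks masks.

```python
from typing import Sequence


def apply_boolean_mask(sequence: str, keep: Sequence[bool]) -> str:
    """Keep the characters whose column is flagged True (hypothetical helper)."""
    if len(sequence) != len(keep):
        raise ValueError("sequence and mask must have the same length")
    return "".join(char for char, retained in zip(sequence, keep) if retained)


alignment = ["AC-GT", "ACTGT", "AC-GA"]
keep = [True, True, False, True, True]  # drop the gappy third column
print([apply_boolean_mask(seq, keep) for seq in alignment])  # ['ACGT', 'ACGT', 'ACGA']
```

Because every sequence is filtered with the same mask, the trimmed sequences stay aligned column by column.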
You may customize the trimming parameters and enable logging when creating the mask:
```python
from itaxotools.pygblocks import Options, compute_mask

options = Options(
    IS=2,  # Minimum Number Of Sequences For A Conserved Position
    FS=3,  # Minimum Number Of Sequences For A Flank Position
    CP=2,  # Maximum Number Of Contiguous Nonconserved Positions
    BL1=3,  # Minimum Length Of A Block, 1st iteration
    BL2=3,  # Minimum Length Of A Block, 2nd iteration
    GT=2,  # Maximum Number Of Allowed Gaps For Any Position
    GC="-",  # Definition Of Gap Characters
)
mask = compute_mask(sequences, options, log=True)
```
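The parameter names follow the original Gblocks (Castresana, 2000). As a rough illustration of how IS, FS and GT interact, the sketch below classifies each column in the spirit of the first Gblocks step: a column is highly conserved (a valid flank) when its most frequent residue occurs in at least FS sequences, conserved when it occurs in at least IS sequences, and nonconserved otherwise. Columns with more than GT gaps are flagged separately. `classify_columns` is a hypothetical, simplified helper, not a PyGblocks function; its uppercase arguments merely mirror the `Options` names. It ignores CP, BL1 and BL2, which govern the later block-building steps, and PyGblocks may handle details differently.

```python
from collections import Counter


def classify_columns(sequences, IS, FS, GT, GC="-"):
    """Return one code per column (hypothetical helper, not PyGblocks API).

    'g': more than GT gaps; 'h': highly conserved (>= FS identical residues);
    'c': conserved (>= IS identical residues); 'n': nonconserved.
    Assumes all sequences have the same length.
    """
    codes = []
    for column in zip(*sequences):
        gaps = sum(1 for char in column if char in GC)
        residues = Counter(char for char in column if char not in GC)
        top = max(residues.values(), default=0)
        if gaps > GT:
            codes.append("g")
        elif top >= FS:
            codes.append("h")
        elif top >= IS:
            codes.append("c")
        else:
            codes.append("n")
    return "".join(codes)


sequences = [
    "--TTTNNTACTTTTTTT-ATT",
    "--TTTNTTACGTTTTTG-ATT",
    "--TTTNTTAGTTTTTTC-ATT",
]
print(classify_columns(sequences, IS=2, FS=3, GT=2))
# gghhhhchhcchhhhhnghhh
```

In the real algorithm, these per-column codes are only the starting point: runs of nonconserved columns longer than CP are rejected, block ends must be highly conserved, and blocks shorter than BL1 or BL2 are discarded.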
Alternatively, IS, FS and GT can be expressed relative to the number of sequences: set them to zero, then adjust IS_percent, FS_percent and GT_percent (given as fractions, e.g. 0.50 for 50%) if the defaults do not suit your data:
```python
from itaxotools.pygblocks import Options, compute_mask

options = Options(
    IS=0, FS=0, CP=2, BL1=3, BL2=3, GT=0, GC="-",
    IS_percent=0.50,
    FS_percent=0.85,
    GT_percent=0.00,
)
mask = compute_mask(sequences, options, log=True)
```
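The zero-means-relative convention can be pictured with the sketch below. `resolve_threshold` is a hypothetical helper, and its rounding rule (round up for the minimum counts IS and FS, round down for the maximum gap count GT) is an assumption chosen for illustration, not taken from PyGblocks.

```python
import math


def resolve_threshold(value: int, fraction: float, n_sequences: int,
                      *, maximum: bool = False) -> int:
    """Turn a zero-valued option into a count derived from a fraction.

    Hypothetical helper: a nonzero value is used as-is; otherwise the fraction
    of n_sequences is rounded up for minimums and down for maximums.
    The rounding rule is an assumption, not PyGblocks behavior.
    """
    if value:
        return value
    raw = fraction * n_sequences
    return math.floor(raw) if maximum else math.ceil(raw)


n = 3  # three sequences, as in the usage example
print(resolve_threshold(0, 0.50, n))                # IS -> 2
print(resolve_threshold(0, 0.85, n))                # FS -> 3
print(resolve_threshold(0, 0.00, n, maximum=True))  # GT -> 0
```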
The examples above, along with examples of using PyGblocks with Biopython alignments, can be found in the scripts folder.
Citations
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000 Apr;17(4):540-52. doi: 10.1093/oxfordjournals.molbev.a026334. PMID: 10742046.
File details
Details for the file itaxotools-pygblocks-0.1.0.tar.gz.
File metadata
- Download URL: itaxotools-pygblocks-0.1.0.tar.gz
- Upload date:
- Size: 50.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 955cfb69b701956eefbcf6ef5d80ce4a13d0975a741df7e338c4fe15336aa564
MD5 | a49fc65a05fcb5588a270f7990df5ff4
BLAKE2b-256 | d627694444173987374427cc571d10cf308e8e4907119a0275b00a77dad4c775
File details
Details for the file itaxotools_pygblocks-0.1.0-py3-none-any.whl.
File metadata
- Download URL: itaxotools_pygblocks-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | faccf8321cecced13cf56f6c5d4bfdfb3c45d893c9adf98379f8b6ff2911340a
MD5 | 7ff58f9a0129e975ee856de01ef99ed6
BLAKE2b-256 | cc11b0accfeb0c97914fcaf1b4e18dad8000c56c6c1b7a9176a2511bc777b341