
PyGblocks

A pure Python implementation of the algorithm behind Gblocks.

Differences from Gblocks

  • Can define which characters are considered as gaps
  • Any character that is not defined as a gap is considered for conservation
  • Can define how many gaps are allowed per position (number or percentage)
  • Positions that only contain gaps are not pre-emptively removed
  • Conservation threshold may be set below 50%
  • No support for similarity matrices
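
To make the first three points concrete, here is a minimal standalone sketch of per-position statistics. It does not use PyGblocks and is not its implementation; the function name `column_stats` and its return shape are hypothetical, chosen only for illustration. For each alignment column it counts the characters listed as gaps, and the occurrences of the most common character among everything else.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def column_stats(sequences: Sequence[str], gap_chars: str = "-") -> List[Tuple[int, int]]:
    """Return (gap_count, top_count) for every alignment column.

    Hypothetical helper for illustration only, not part of PyGblocks.
    gap_count: how many sequences have a gap character at this position.
    top_count: occurrences of the most common non-gap character.
    """
    stats = []
    for column in zip(*sequences):
        gap_count = sum(ch in gap_chars for ch in column)
        residues = Counter(ch for ch in column if ch not in gap_chars)
        top_count = residues.most_common(1)[0][1] if residues else 0
        stats.append((gap_count, top_count))
    return stats


sequences = [
    "--TTTNNTACTTTTTTT-ATT",
    "--TTTNTTACGTTTTTG-ATT",
    "--TTTNTTAGTTTTTTC-ATT",
]

# Only "-" counts as a gap, so "N" takes part in conservation.
default_stats = column_stats(sequences)

# Treating "N" as a gap as well changes the counts at the "N" positions.
n_as_gap_stats = column_stats(sequences, gap_chars="-N")
```

A position could then be compared against a gap limit and a conservation minimum, which is roughly the role the GT and IS options play in the usage examples.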

Installation

PyGblocks is available on PyPI. You can install it through pip:

pip install itaxotools-pygblocks

Usage

First create a mask from your sequences, then apply that mask to each sequence:

from itaxotools.pygblocks import compute_mask, trim_sequences

sequences = [
    "--TTTNNTACTTTTTTT-ATT",
    "--TTTNTTACGTTTTTG-ATT",
    "--TTTNTTAGTTTTTTC-ATT",
]

mask = compute_mask(sequences)
trimmed_sequences = trim_sequences(sequences, mask)
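
Conceptually, applying a mask means keeping the same set of columns in every sequence. The sketch below illustrates that idea without PyGblocks. It assumes the mask is a list of booleans, one per column; the actual type returned by `compute_mask` is not described here and may differ. The helper `apply_column_mask` is hypothetical, and the mask used is a simple stand-in (columns where all sequences share one non-gap character), not the Gblocks algorithm.

```python
from typing import List, Sequence


def apply_column_mask(sequences: Sequence[str], keep: Sequence[bool]) -> List[str]:
    """Keep only the columns flagged True in `keep`.

    Hypothetical helper: the boolean-list mask format is an assumption
    made for this sketch, not the PyGblocks mask format.
    """
    for seq in sequences:
        if len(seq) != len(keep):
            raise ValueError("mask and sequence lengths differ")
    return ["".join(ch for ch, k in zip(seq, keep) if k) for seq in sequences]


sequences = [
    "--TTTNNTACTTTTTTT-ATT",
    "--TTTNTTACGTTTTTG-ATT",
    "--TTTNTTAGTTTTTTC-ATT",
]

# Stand-in mask: keep columns where every sequence has the same non-gap character.
keep = [len(set(column)) == 1 and column[0] != "-" for column in zip(*sequences)]

trimmed = apply_column_mask(sequences, keep)
```

Because the same mask is applied to every sequence, the trimmed sequences stay aligned with each other.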

You may customize the trimming parameters and enable logging when creating the mask:

from itaxotools.pygblocks import Options, compute_mask

options = Options(
    IS=2,    # Minimum Number Of Sequences For A Conserved Position
    FS=3,    # Minimum Number Of Sequences For A Flank Position
    CP=2,    # Maximum Number Of Contiguous Nonconserved Positions
    BL1=3,   # Minimum Length Of A Block, 1st iteration
    BL2=3,   # Minimum Length Of A Block, 2nd iteration
    GT=2,    # Maximum Number of Allowed Gaps For Any Position
    GC="-",  # Definition of Gap Characters
)

mask = compute_mask(sequences, options, log=True)

You may instead express IS, FS and GT as a fraction of the number of sequences. To do so, set IS, FS and GT to zero, then override the default percentages if desired:

options = Options(
    IS=0, FS=0, CP=2, BL1=3, BL2=3, GT=0, GC="-",

    IS_percent=0.50,
    FS_percent=0.85,
    GT_percent=0.00,
)

mask = compute_mask(sequences, options, log=True)
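
As a rough illustration of what a percentage-based threshold means for the three-sequence example, the sketch below converts a fraction into a number of sequences. The helper `count_from_fraction` is hypothetical, and rounding up is an assumption made here; PyGblocks may round differently, so treat the numbers as indicative only.

```python
import math


def count_from_fraction(n_sequences: int, fraction: float) -> int:
    """Turn a fraction of the number of sequences into a whole count.

    Hypothetical helper. Rounding up is an assumption of this sketch,
    not necessarily the rule PyGblocks applies.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return math.ceil(n_sequences * fraction)


n_sequences = 3
is_count = count_from_fraction(n_sequences, 0.50)  # conserved positions
fs_count = count_from_fraction(n_sequences, 0.85)  # flank positions
gt_count = count_from_fraction(n_sequences, 0.00)  # allowed gaps
```

Under this assumption, a 0.00 gap percentage allows no gaps at any position, regardless of how many sequences the alignment contains.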

The examples above, along with examples of using PyGblocks with Biopython alignments, are available in the scripts folder.

Citations

Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000 Apr;17(4):540-52. doi: 10.1093/oxfordjournals.molbev.a026334. PMID: 10742046.
