Tool to merge genes that were broken by assembly errors, based on their alignment

Project description

Broken2merge

Description

The broken2merge software is designed to concatenate the genes that belong to the same species but appear to be broken due to assembly issues. It provides a solution for merging fragmented gene sequences into a single, complete sequence.

Features

  • Gene concatenation: broken2merge identifies and merges fragmented gene sequences into a single, continuous sequence.
  • Assembly error detection: broken2merge includes error detection mechanisms to identify and handle assembly errors, ensuring accurate gene concatenation.

Installation

To install broken2merge, follow these steps:

Install with pip from PyPI:

pip install broken2merge

Install in a conda/mamba env:

mamba create -n broken2merge python=3.12 tqdm biopython numpy
mamba activate broken2merge
pip install broken2merge

Usage

To use broken2merge:

usage: broken2merge [-h] [-v] [-V] -i FASTA_FILE [-o OUTPUT] [-s SEPARATOR]

Takes a alignement and find genes that seems broken and merge them together

options:
  -h, --help            show this help message and exit

General input dataset options:
  -v, --verbose         Show verbose output. (For debugging purposes)
  -V, --version         Show the version number and exit.
  -i FASTA_FILE, --input FASTA_FILE
                        Path to an input fasta file (Required)
  -o OUTPUT, --output OUTPUT
                        Path of the output folder (Default: merge_broken_res)
  -s SEPARATOR, --separator SEPARATOR
                        Separator to use to split the gene name (Default: ';')
  --force_merge         Force the merge of the genes even if they might be paralogs, will give a unaligned file as main input

Example

Here's an example of how to use broken2merge to merge gene sequences:

broken2merge -i ftsK.aln.fas -o test -s ';'

Here the input file is ftsK.aln.fas, the output folder is test, and ; is the separator between the species and the gene in each sequence name (species_name;gene_name).
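
To make the naming convention concrete, here is a minimal sketch of how sequence names could be split on the separator and grouped by species. It is an illustration only, not the actual broken2merge code: the function name group_by_species and the example names are hypothetical.

```python
from collections import defaultdict


def group_by_species(names, separator=";"):
    """Group sequence names by the text found before the first separator.

    Hypothetical helper: it only illustrates the species_name;gene_name
    convention and is not part of broken2merge.
    """
    groups = defaultdict(list)
    for name in names:
        species, _, gene = name.partition(separator)
        groups[species].append(gene)
    return dict(groups)


# Made-up sequence names following the species_name;gene_name convention.
names = ["speciesA;geneA", "speciesA;geneB", "speciesB;geneA"]

# A species with more than one gene in the alignment is a merge candidate.
candidates = {
    species: genes
    for species, genes in group_by_species(names).items()
    if len(genes) > 1
}
print(candidates)  # {'speciesA': ['geneA', 'geneB']}
```

Only species that appear more than once in the alignment are candidates for merging under this reading of the naming scheme.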

Explanation of the concatenation

Cases that are handled:

  1. Two genes that do not overlap in the alignment
speciesA_geneA  ------------------atgattgaactcgccc
speciesA_geneB  atgattgaactcgccc------------------

will become

speciesA_merge atgattgaactcgccc--atgattgaactcgccc
  2. Two genes that overlap perfectly in a single region of the alignment
speciesA_geneA  ----------------actcgcccatgattgaactcgccc
                                ||||||||
speciesA_geneB  atgattgaactcgcccactcgccc----------------

will become

speciesA_merge atgattgaactcgcccactcgcccatgattgaactcgccc
  3. Two genes with an overlap that is not at the extremity of the gene
speciesA_geneA  ---------------------actcgcccatgattgaactcgccc
                                     ||||||||
speciesA_geneB  atgattgaactcgccc-----actcgccc----------------

will become

speciesA_merge atgattgaactcgccc-----actcgcccatgattgaactcgccc
  4. Two genes with an overlap that is not 100% perfect
speciesA_geneA  ---------------------actcgcccatgattgaactcgccc
                                     || ||| |
speciesA_geneB  atgattgaactcgccc-----acccgcgc----------------

By default, these sequences are discarded from the alignment.

If --force_merge is used, the sequences are instead concatenated one after the other, in the order in which they occur in the alignment. The output fasta file will then be unaligned.

With --force_merge, the example above will become

speciesA_merge atgattgaactcgcccacccgcgcactcgcccatgattgaactcgccc
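
The four cases above can be summarised with a short sketch. This is not the broken2merge implementation, only a minimal illustration under stated assumptions: the rows are already aligned and of equal length, "-" is the only gap character, and in the forced case the fragments are ordered by the column of their first residue (an assumption chosen to match the example above). The function name merge_fragments is hypothetical.

```python
GAP = "-"


def merge_fragments(rows, force_merge=False):
    """Merge aligned fragments belonging to one species (illustrative sketch).

    Returns the merged aligned row when every column holds at most one
    distinct residue (cases 1 to 3). When overlapping columns disagree
    (case 4), returns None, or an unaligned concatenation if force_merge
    is True.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")

    merged = []
    conflict = False
    for column in zip(*rows):
        residues = {char for char in column if char != GAP}
        if len(residues) > 1:
            conflict = True  # the fragments disagree at this column
            break
        merged.append(residues.pop() if residues else GAP)

    if not conflict:
        return "".join(merged)
    if not force_merge:
        return None  # the fragments would be discarded

    # Assumption: order fragments by the column of their first residue,
    # then strip the gaps and concatenate them.
    def first_residue(row):
        return len(row) - len(row.lstrip(GAP))

    ordered = sorted(rows, key=first_residue)
    return "".join(row.replace(GAP, "") for row in ordered)


# Case 1: no overlap between the two fragments.
case1 = ["-" * 18 + "atgattgaactcgccc", "atgattgaactcgccc" + "-" * 18]
print(merge_fragments(case1))  # atgattgaactcgccc--atgattgaactcgccc

# Case 4: imperfect overlap, with and without forcing the merge.
case4 = [
    "-" * 21 + "actcgcccatgattgaactcgccc",
    "atgattgaactcgccc" + "-" * 5 + "acccgcgc" + "-" * 16,
]
print(merge_fragments(case4))                    # None
print(merge_fragments(case4, force_merge=True))
```

Checking the alignment column by column keeps the first three cases in a single code path: a column is safe to merge whenever at most one distinct residue is present, whether or not the fragments overlap there.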

Download files

Source Distribution

broken2merge-0.3.0.tar.gz (8.3 kB)

Uploaded Source

Built Distribution

broken2merge-0.3.0-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file broken2merge-0.3.0.tar.gz.

File metadata

  • Download URL: broken2merge-0.3.0.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.5.0-1025-azure

File hashes

Hashes for broken2merge-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       f88d9d5707687e3c7ac36921c14045ec392dfd8f318e3a2f11aef82f1e58baf6
MD5          46b0c68ab90aba90d992b957c88fa4cb
BLAKE2b-256  4dc76b34953dcba8f6e6cbe5c080ffe7feaca0270e5bfc8fa0735e273afdc98d
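
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library. The helper name sha256_of is hypothetical, and the archive is assumed to sit in the current directory under its original file name.

```python
import hashlib
import os

# SHA256 digest listed above for broken2merge-0.3.0.tar.gz
EXPECTED_SHA256 = "f88d9d5707687e3c7ac36921c14045ec392dfd8f318e3a2f11aef82f1e58baf6"


def sha256_of(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumption: the archive was downloaded to the current directory.
archive = "broken2merge-0.3.0.tar.gz"
if os.path.exists(archive):
    print("OK" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
```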

File details

Details for the file broken2merge-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: broken2merge-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.5.0-1025-azure

File hashes

Hashes for broken2merge-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       ac3754a6959f0bd8fd5db92516b0f259e37adea567704d29300af308af9e2882
MD5          9ef7beb04132c2b77f22b2f545781a5b
BLAKE2b-256  687e56be800d602437a8a203957e6819d4afc050214465d0ebf0334f56e267a4
