Tool to merge genes that were broken by assembly errors, based on their alignment

Broken2merge

Description

The broken2merge software merges gene fragments that belong to the same species and appear to have been split by assembly errors. Working from an alignment, it combines the fragmented gene sequences into a single, complete sequence.

Features

  • Gene concatenation: broken2merge identifies and merges fragmented gene sequences into a single, continuous sequence.
  • Assembly error detection: broken2merge includes error detection mechanisms to identify and handle assembly errors, ensuring accurate gene concatenation.

Installation

To install broken2merge, use one of the following methods.

Install with pip from PyPI:

pip install broken2merge

Install in a conda/mamba env:

mamba create -n broken2merge python=3.12 tqdm biopython numpy
mamba activate broken2merge
pip install broken2merge

Usage

To use broken2merge:

usage: broken2merge [-h] [-v] [-V] -i FASTA_FILE [-o OUTPUT] [-s SEPARATOR]

Takes a alignement and find genes that seems broken and merge them together

options:
  -h, --help            show this help message and exit

General input dataset options:
  -v, --verbose         Show verbose output. (For debugging purposes)
  -V, --version         Show the version number and exit.
  -i FASTA_FILE, --input FASTA_FILE
                        Path to an input fasta file (Required)
  -o OUTPUT, --output OUTPUT
                        Path of the output folder (Default: merge_broken_res)
  -s SEPARATOR, --separator SEPARATOR
                        Separator to use to split the gene name (Default: ';')
  --force_merge         Force the merge of the genes even if they might be paralogs, will give a unaligned file as main input

Example

Here's an example of how to use broken2merge to merge gene sequences:

broken2merge -i ftsK.aln.fas -o test -s ';'

This command reads the input file ftsK.aln.fas, writes the results to the output folder test, and uses ; as the separator in the sequence names (species_name;gene_name).
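To illustrate the naming convention, here is a minimal sketch of how sequence names could be split on the separator to find species that have more than one fragment. It is not taken from the broken2merge source: the helper names read_fasta and group_by_species and the toy sequences are hypothetical, and only the species_name;gene_name layout comes from the description above.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: keep the header without the leading ">"
            records.append([line[1:], ""])
        elif records:
            # Sequence line: append to the current record
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def group_by_species(records, separator=";"):
    """Group records by the part of the name before the separator."""
    groups = defaultdict(list)
    for name, seq in records:
        species, _, gene = name.partition(separator)
        groups[species].append((gene, seq))
    return dict(groups)


# Toy alignment (hypothetical data, for illustration only)
example = """>speciesA;geneA
----atgc
>speciesA;geneB
atgc----
>speciesB;geneA
atgcatgc
"""

groups = group_by_species(read_fasta(example))
# Species with several sequences are the candidates for merging
candidates = {sp: genes for sp, genes in groups.items() if len(genes) > 1}
print(candidates)
```

Only speciesA has two sequences here, so it is the only candidate for merging in this toy input.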

Explanation of the concatenation

Cases that are handled:

  1. Two genes that do not overlap in the alignment
speciesA_geneA  ------------------atgattgaactcgccc
speciesA_geneB  atgattgaactcgccc------------------

will become

speciesA_merge atgattgaactcgccc--atgattgaactcgccc
  2. Two genes that overlap perfectly in one part of the alignment
speciesA_geneA  ----------------actcgcccatgattgaactcgccc
                                ||||||||
speciesA_geneB  atgattgaactcgcccactcgccc----------------

will become

speciesA_merge atgattgaactcgcccactcgcccatgattgaactcgccc
  3. Two genes with an overlap that is not at the extremity of the gene
speciesA_geneA  ---------------------actcgcccatgattgaactcgccc
                                     ||||||||
speciesA_geneB  atgattgaactcgccc-----actcgccc----------------

will become

speciesA_merge atgattgaactcgccc-----actcgcccatgattgaactcgccc
  4. Two genes with an overlap that is not 100% perfect
speciesA_geneA  ---------------------actcgcccatgattgaactcgccc
                                     || ||| |
speciesA_geneB  atgattgaactcgccc-----acccgcgc----------------

The sequences will be discarded from the alignment.

If --force_merge is used, the sequences will instead be concatenated one after the other, in their order in the alignment. The output fasta file will then be unaligned.

In that case, it will become

speciesA_merge atgattgaactcgcccacccgcgcactcgcccatgattgaactcgccc
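The four cases above can be sketched as a single column-by-column merge. This is a minimal illustration written from the examples above, not the actual broken2merge implementation: the function name merge_fragments and its force parameter are hypothetical, and ordering the fragments by their first non-gap column in the forced case is an assumption inferred from the last example.

```python
GAP = "-"


def merge_fragments(seq_a, seq_b, force=False):
    """Merge two aligned fragments of equal length.

    Returns the merged aligned sequence when every overlapping column
    agrees (cases 1 to 3). When a column disagrees (case 4), returns
    None, or the unaligned concatenation if force is True.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    merged = []
    conflict = False
    for a, b in zip(seq_a, seq_b):
        if a == GAP:
            merged.append(b)      # only b has a base (or both are gaps)
        elif b == GAP or a == b:
            merged.append(a)      # only a has a base, or both agree
        else:
            conflict = True       # imperfect overlap
            break

    if not conflict:
        return "".join(merged)
    if not force:
        return None               # the pair would be discarded

    # Assumption: fragments are ordered by their first non-gap column
    def start(seq):
        return len(seq) - len(seq.lstrip(GAP))

    first, second = sorted((seq_a, seq_b), key=start)
    return first.replace(GAP, "") + second.replace(GAP, "")


frag = "atgattgaactcgccc"

# Case 1: no overlap, the gap between the fragments is kept
case1 = merge_fragments(GAP * 18 + frag, frag + GAP * 18)

# Case 4: imperfect overlap
gene_a = GAP * 21 + "actcgccc" + frag
gene_b = frag + GAP * 5 + "acccgcgc" + GAP * 16
case4 = merge_fragments(gene_a, gene_b)
case4_forced = merge_fragments(gene_a, gene_b, force=True)

print(case1)
print(case4)
print(case4_forced)
```

Without force, the imperfect overlap yields None, mirroring the sequences being discarded. With force=True the result is unaligned, which matches the note that the output fasta file is unaligned when --force_merge is used.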
