Tool to merge genes broken by assembly errors, based on the alignment
Broken2merge
Description
The broken2merge software is designed to concatenate genes that belong to the same species but appear to be broken due to assembly issues. It merges fragmented gene sequences into a single, complete sequence.
Features
- Gene concatenation: broken2merge identifies and merges fragmented gene sequences into a single, continuous sequence.
- Assembly error detection: broken2merge includes error detection mechanisms to identify and handle assembly errors, ensuring accurate gene concatenation.
Installation
To install broken2merge, follow these steps:
Install with pip from PyPI:
pip install broken2merge
Install in a conda/mamba env:
mamba create -n broken2merge python=3.12 tqdm biopython numpy
mamba activate broken2merge
pip install broken2merge
Usage
To use broken2merge:
usage: broken2merge [-h] [-v] [-V] -i FASTA_FILE [-o OUTPUT] [-s SEPARATOR]

Takes a alignement and find genes that seems broken and merge them together

options:
  -h, --help            show this help message and exit

General input dataset options:
  -v, --verbose         Show verbose output. (For debugging purposes)
  -V, --version         Show the version number and exit.
  -i FASTA_FILE, --input FASTA_FILE
                        Path to an input fasta file (Required)
  -o OUTPUT, --output OUTPUT
                        Path of the output folder (Default: merge_broken_res)
  -s SEPARATOR, --separator SEPARATOR
                        Separator to use to split the gene name (Default: ';')
  --force_merge         Force the merge of the genes even if they might be paralogs, will give a unaligned file as main input
Example
Here's an example of how to use broken2merge to merge gene sequences:
broken2merge -i ftsK.aln.fas -o test -s ';'
Here the input file is ftsK.aln.fas, the output folder is test, and the separator used in the gene name is ; (species_name;gene_name).
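To illustrate how a separator such as ; could be used to find candidate genes, here is a minimal Python sketch that groups record names by the species part before the separator. The function name group_by_species is hypothetical and not part of broken2merge; the tool's actual implementation may differ.

```python
from collections import defaultdict


def group_by_species(headers, separator=";"):
    """Group record names by the species part found before the separator."""
    groups = defaultdict(list)
    for header in headers:
        # Split on the first separator only: species_name;gene_name
        species, _, gene = header.partition(separator)
        groups[species].append(gene)
    return dict(groups)


headers = ["speciesA;geneA", "speciesA;geneB", "speciesB;geneA"]
groups = group_by_species(headers)
# Only species with more than one record are candidates for merging.
candidates = {sp: genes for sp, genes in groups.items() if len(genes) > 1}
print(candidates)  # {'speciesA': ['geneA', 'geneB']}
```

Splitting on the first separator only keeps gene names intact if they happen to contain the separator themselves.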
Explanation of the concatenation
Cases that will be handled:
- Two genes that do not overlap in the alignment
speciesA_geneA ------------------atgattgaactcgccc
speciesA_geneB atgattgaactcgccc------------------
will become
speciesA_merge atgattgaactcgccc--atgattgaactcgccc
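The non-overlapping case can be sketched in Python as a column-by-column merge of two aligned rows. This is only an illustration under the assumption that - is the gap character; merge_disjoint is a hypothetical helper, not a function exposed by broken2merge.

```python
def merge_disjoint(seq_a, seq_b, gap="-"):
    """Merge two aligned rows that never have a residue in the same column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    if any(a != gap and b != gap for a, b in zip(seq_a, seq_b)):
        raise ValueError("rows overlap; this sketch only handles disjoint rows")
    # In each column keep whichever row has a residue (or a gap if neither does).
    return "".join(b if a == gap else a for a, b in zip(seq_a, seq_b))


fragment = "atgattgaactcgccc"
gene_a = "-" * 18 + fragment   # fragment at the end of the alignment
gene_b = fragment + "-" * 18   # fragment at the start of the alignment
print(merge_disjoint(gene_a, gene_b))
# atgattgaactcgccc--atgattgaactcgccc
```

Columns where neither fragment has a residue stay as gaps, so the merged row keeps the alignment length.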
- Two genes that overlap perfectly in a single region of the alignment
speciesA_geneA ----------------actcgcccatgattgaactcgccc
                               ||||||||
speciesA_geneB atgattgaactcgcccactcgccc----------------
will become
speciesA_merge atgattgaactcgcccactcgcccatgattgaactcgccc
- Two genes with an overlap that is not at the extremity of the gene
speciesA_geneA ---------------------actcgcccatgattgaactcgccc
                                    ||||||||
speciesA_geneB atgattgaactcgccc-----actcgccc----------------
will become
speciesA_merge atgattgaactcgccc-----actcgcccatgattgaactcgccc
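The two overlapping cases above can be pictured with the following Python sketch, which merges two aligned rows only when every column shared by both carries the same residue. The helper merge_overlapping is hypothetical, and returning None on a mismatch is an assumption made for this illustration; it is not taken from the broken2merge source.

```python
def merge_overlapping(seq_a, seq_b, gap="-"):
    """Merge two aligned rows; return None if any shared column disagrees."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    merged = []
    for a, b in zip(seq_a, seq_b):
        if a != gap and b != gap and a != b:
            return None  # the overlap is not identical, so no merge is made
        merged.append(b if a == gap else a)
    return "".join(merged)


gene_a = "-" * 21 + "actcgcccatgattgaactcgccc"
gene_b = "atgattgaactcgccc" + "-" * 5 + "actcgccc" + "-" * 16
print(merge_overlapping(gene_a, gene_b))
# atgattgaactcgccc-----actcgcccatgattgaactcgccc
```

Because the comparison is done per column, the same check covers an overlap at the end of a fragment and an overlap in the middle of the alignment.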
- Two genes with an overlap that is not 100% perfect
speciesA_geneA ---------------------actcgcccatgattgaactcgccc
                                    || ||| |
speciesA_geneB atgattgaactcgccc-----acccgcgc----------------
By default, the sequences will be discarded from the alignment.
If --force_merge is used, the sequences will instead be concatenated one after the other, following their order in the alignment. The output fasta file will then be unaligned.
With --force_merge, it will become
speciesA_merge atgattgaactcgcccacccgcgcactcgcccatgattgaactcgccc
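As a rough Python sketch of what a forced merge amounts to, the rows can be stripped of gaps and concatenated. Ordering the rows by the column of their first residue is an assumption chosen because it reproduces the example above; the helper names are hypothetical and the real tool may order sequences differently.

```python
def first_residue_column(row, gap="-"):
    """Index of the first non-gap character in an aligned row."""
    return len(row) - len(row.lstrip(gap))


def force_merge(rows, gap="-"):
    """Remove gaps and concatenate rows, ordered by where each one starts."""
    # Assumption: "order in the alignment" means order of the first residue.
    ordered = sorted(rows, key=lambda row: first_residue_column(row, gap))
    return "".join(row.replace(gap, "") for row in ordered)


gene_a = "-" * 21 + "actcgcccatgattgaactcgccc"
gene_b = "atgattgaactcgccc" + "-" * 5 + "acccgcgc" + "-" * 16
print(force_merge([gene_a, gene_b]))
# atgattgaactcgcccacccgcgcactcgcccatgattgaactcgccc
```

The result no longer has the alignment length, which is why the output file is unaligned in this mode.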