Aligning a large number of protein sequences to a genome (using genBlastA and GeneWise2)
HugeP2G
An integration tool for aligning large numbers of protein sequences to a genome, using genBlastA and GeneWise2.
GeneWise2 produces very precise protein-to-genome alignments through dynamic programming, but it is computationally expensive. genBlastA first maps each protein sequence to an approximate region of the genome, and GeneWise2 then computes a precise alignment within that region. HugeP2G automates this two-step process.
Installation
HugeP2G requires genBlastA and GeneWise2. genBlastA is bundled with the HugeP2G package; GeneWise2 must be downloaded and installed separately.
Install HugeP2G from PyPI:
pip install HugeP2G
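Before a long run it can help to confirm that the required executables are reachable. The following is a minimal sketch, not part of HugeP2G itself. It assumes the command is installed as `HugeP2G` (as in the usage line) and that GeneWise2 provides an executable named `genewise`; adjust the names if your installation differs.

```python
import shutil

# Executable names are assumptions: "HugeP2G" is taken from the usage line,
# "genewise" is the name a GeneWise2 installation commonly provides.
REQUIRED_TOOLS = ("HugeP2G", "genewise")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```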
Usage
usage: HugeP2G [-h] [-s SKIP_RANGE_FILE] [-d WORK_DIR] [-t NUM_THREADS] [-c GENE_COVERAGE] [-n GENBLASTA_HIT_NUM] [-sc SKIP_COVERAGE] [-split SEQ_NUM_IN_SUBDIR] [-r] query_protein_table target_genome_fasta
Aligning a large number of protein sequences to a genome (use genblasta and genewise)

positional arguments:
  query_protein_table   Path of query genome table in tsv format, must have column "sp_id" and "pt_file", "sp_id" is the species id, "pt_file" is the path of protein fasta file
  target_genome_fasta   Path of target genome fasta file

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP_RANGE_FILE, --skip_range_file SKIP_RANGE_FILE
                        Path of skip_range_file, tsv file, should have column name "chr", "start", and "end", the range of the genome that need to be skipped
  -d WORK_DIR, --work_dir WORK_DIR
                        Path of work dir (default as ./hugep2g_out)
  -t NUM_THREADS, --num_threads NUM_THREADS
                        threads number (default as 56)
  -c GENE_COVERAGE, --gene_coverage GENE_COVERAGE
                        gene coverage (default as 0.2)
  -n GENBLASTA_HIT_NUM, --genblasta_hit_num GENBLASTA_HIT_NUM
                        genblasta hit num (default as 50)
  -sc SKIP_COVERAGE, --skip_coverage SKIP_COVERAGE
                        annotated_coverage (default as 0.8)
  -split SEQ_NUM_IN_SUBDIR, --seq_num_in_subdir SEQ_NUM_IN_SUBDIR
                        split big fasta file to run (default as 1000)
  -r, --force_redo      force redo all job
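The two tab-separated input files described in the help text can be written with Python's standard `csv` module. This is only a sketch: the column names come from the help text, while the species IDs, file paths, chromosome names, and coordinates are hypothetical placeholders. The help text does not state the coordinate convention for "start" and "end", so check that before relying on a skip range file.

```python
import csv


def write_tsv(path, header, rows):
    """Write rows to a tab-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical species IDs and protein FASTA paths, for illustration only.
query_rows = [
    ("species_a", "/data/proteins/species_a.pep.fasta"),
    ("species_b", "/data/proteins/species_b.pep.fasta"),
]
write_tsv("query.tsv", ["sp_id", "pt_file"], query_rows)

# Hypothetical genome ranges to skip, for illustration only.
skip_rows = [("chr1", 10000, 25000), ("chr2", 500, 1800)]
write_tsv("skip_range.tsv", ["chr", "start", "end"], skip_rows)
```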
Example
HugeP2G -t 80 query.tsv genome.fasta
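When HugeP2G is driven from a script, the same command line can be assembled in Python. The sketch below uses only flags listed in the usage text; the helper names `build_command` and `run_hugep2g` are hypothetical and not part of the package.

```python
import subprocess


def build_command(query_table, genome_fasta, threads=None, work_dir=None,
                  skip_range_file=None, force_redo=False):
    """Assemble a HugeP2G argument list from flags in the usage text."""
    cmd = ["HugeP2G"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if work_dir is not None:
        cmd += ["-d", work_dir]
    if skip_range_file is not None:
        cmd += ["-s", skip_range_file]
    if force_redo:
        cmd.append("-r")
    # Positional arguments go last: query table, then target genome.
    cmd += [query_table, genome_fasta]
    return cmd


def run_hugep2g(*args, **kwargs):
    """Run HugeP2G; subprocess raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(*args, **kwargs), check=True)


# Reproduces the example command above as an argument list.
print(build_command("query.tsv", "genome.fasta", threads=80))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.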
File details
Details for the file hugep2g-1.0.4.tar.gz.
File metadata
- Download URL: hugep2g-1.0.4.tar.gz
- Upload date:
- Size: 8.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8aa7bde4902acf0f54493b14c4004535d2bc6c1fc75030a9e4e578b0777d52f4
MD5 | 348ac03cde43db7420a567bc2d77e7e4
BLAKE2b-256 | cb07f406c4f0452c2e1544e999b4c6e81ce9fc867547d04c375bcc67a2aaf90f
File details
Details for the file HugeP2G-1.0.4-py3-none-any.whl.
File metadata
- Download URL: HugeP2G-1.0.4-py3-none-any.whl
- Upload date:
- Size: 8.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | c86dbcbf47e0bb7b4896d8fd77751f408f1910b6b115a1fe4a37379d0bb8b404
MD5 | 4f72d83f09b5c65f9bbeddc2423f03a5
BLAKE2b-256 | 796f72db5b04751a5c80c5e53b5a5c8c67feeed21506be273cf4f60b1ef0ef9c