Aligning a large number of protein sequences to a genome (using genBlastA and GeneWise)

HugeP2G

An integration tool for aligning large numbers of protein sequences to a genome (using genBlastA and GeneWise2)

GeneWise2 uses dynamic programming to produce very precise protein-to-genome alignments, but it is computationally expensive. genBlastA first maps each protein sequence to its approximate region of the genome, and GeneWise2 then computes a precise alignment within that region. HugeP2G automates this two-step process.

Installation

HugeP2G requires genBlastA and GeneWise2. genBlastA is bundled with the HugeP2G package; GeneWise2 must be downloaded and installed separately.

Install HugeP2G from PyPI:

pip install HugeP2G
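
Because GeneWise2 is installed separately, it can help to confirm that it is reachable before starting a long run. The following is a minimal sketch, not part of HugeP2G. It assumes the GeneWise2 executable is named `genewise` and is expected on `PATH`; both are assumptions, so adjust the name if your installation differs. The helper name `find_missing_tools` is hypothetical.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumption: the GeneWise2 binary is called "genewise".
missing = find_missing_tools(["genewise"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```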

Usage

usage: HugeP2G [-h] [-s SKIP_RANGE_FILE] [-d WORK_DIR] [-t NUM_THREADS] [-c GENE_COVERAGE] [-n GENBLASTA_HIT_NUM] [-sc SKIP_COVERAGE] [-split SEQ_NUM_IN_SUBDIR] [-r] query_protein_table target_genome_fasta

Aligning a large number of protein sequences to a genome (use genblasta and genewise)

positional arguments:
  query_protein_table   Path of query genome table in tsv format, must have column "sp_id" and "pt_file", "sp_id" is the species id, "pt_file" is the path of protein fasta file
  target_genome_fasta   Path of target genome fasta file

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP_RANGE_FILE, --skip_range_file SKIP_RANGE_FILE
                        Path of skip_range_file, tsv file, should have column name "chr", "start", and "end", the range of the genome that need to be skipped
  -d WORK_DIR, --work_dir WORK_DIR
                        Path of work dir (default as ./hugep2g_out)
  -t NUM_THREADS, --num_threads NUM_THREADS
                        threads number (default as 56)
  -c GENE_COVERAGE, --gene_coverage GENE_COVERAGE
                        gene coverage (default as 0.2)
  -n GENBLASTA_HIT_NUM, --genblasta_hit_num GENBLASTA_HIT_NUM
                        genblasta hit num (default as 50)
  -sc SKIP_COVERAGE, --skip_coverage SKIP_COVERAGE
                        annotated_coverage (default as 0.8)
  -split SEQ_NUM_IN_SUBDIR, --seq_num_in_subdir SEQ_NUM_IN_SUBDIR
                        split big fasta file to run (default as 1000)
  -r, --force_redo      force redo all job
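
The two tab-separated input files described in the help text can be generated with a few lines of Python. This is a sketch under stated assumptions: the species IDs, FASTA paths, and genome range are hypothetical placeholders, and the helper `to_tsv` is not part of HugeP2G. Only the column names ("sp_id", "pt_file", "chr", "start", "end") come from the help text. The help text does not state the coordinate convention for "start" and "end", so check it before relying on exact boundaries.

```python
import csv
import io


def to_tsv(header, rows):
    """Render a header row and data rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Query protein table: one row per species (hypothetical IDs and paths).
query_table = to_tsv(
    ["sp_id", "pt_file"],
    [("Ath", "proteins/Ath.pep.fasta"), ("Osa", "proteins/Osa.pep.fasta")],
)

# Skip range table: genome ranges to skip (hypothetical values).
skip_ranges = to_tsv(["chr", "start", "end"], [("Chr1", 1000, 5000)])

print(query_table, end="")
print(skip_ranges, end="")
```

Writing each string to a file (for example `query.tsv`) gives inputs in the shape that `query_protein_table` and `-s` describe.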

Example

HugeP2G -t 80 query.tsv genome.fasta
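
When HugeP2G is launched from a pipeline script, the command line can be assembled programmatically. The sketch below is one possible way to do it; the function `build_hugep2g_command` is hypothetical and only uses flags listed in the usage text above (`-s`, `-d`, `-t`, `-r`). It builds the argument list without running anything.

```python
import shlex


def build_hugep2g_command(query_table, genome_fasta, threads=None,
                          work_dir=None, skip_range_file=None,
                          force_redo=False):
    """Build the HugeP2G argument list from the documented options."""
    cmd = ["HugeP2G"]
    if skip_range_file is not None:
        cmd += ["-s", str(skip_range_file)]
    if work_dir is not None:
        cmd += ["-d", str(work_dir)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if force_redo:
        cmd.append("-r")
    # Positional arguments come last: query table, then target genome.
    cmd += [str(query_table), str(genome_fasta)]
    return cmd


cmd = build_hugep2g_command("query.tsv", "genome.fasta", threads=80)
print(shlex.join(cmd))  # prints: HugeP2G -t 80 query.tsv genome.fasta
```

To actually start the run, the list can be passed to `subprocess.run(cmd, check=True)`. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.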

Download files

Source Distribution

hugep2g-1.0.4.tar.gz (8.2 MB view details)

Uploaded Source

Built Distribution

HugeP2G-1.0.4-py3-none-any.whl (8.3 MB view details)

Uploaded Python 3

File details

Details for the file hugep2g-1.0.4.tar.gz.

File metadata

  • Download URL: hugep2g-1.0.4.tar.gz
  • Upload date:
  • Size: 8.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for hugep2g-1.0.4.tar.gz
Algorithm    Hash digest
SHA256       8aa7bde4902acf0f54493b14c4004535d2bc6c1fc75030a9e4e578b0777d52f4
MD5          348ac03cde43db7420a567bc2d77e7e4
BLAKE2b-256  cb07f406c4f0452c2e1544e999b4c6e81ce9fc867547d04c375bcc67a2aaf90f

File details

Details for the file HugeP2G-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: HugeP2G-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 8.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for HugeP2G-1.0.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       c86dbcbf47e0bb7b4896d8fd77751f408f1910b6b115a1fe4a37379d0bb8b404
MD5          4f72d83f09b5c65f9bbeddc2423f03a5
BLAKE2b-256  796f72db5b04751a5c80c5e53b5a5c8c67feeed21506be273cf4f60b1ef0ef9c
