Aligning a large number of protein sequences to a genome (use genblasta and genewise)

HugeP2G

A pipeline tool for aligning large numbers of protein sequences to a genome using genBlastA and GeneWise2.

GeneWise2 uses dynamic programming to produce very precise amino-acid-to-genome alignments, but it is slow. genBlastA first aligns each amino acid sequence to an approximate region of the genome, and GeneWise2 then computes a precise alignment within that region. HugeP2G automates this two-step process.

Installation

HugeP2G requires both genBlastA and GeneWise2. genBlastA is included in the HugeP2G package; GeneWise2 must be downloaded and installed separately.

Install HugeP2G from PyPI:

pip install HugeP2G
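
Because GeneWise2 is installed separately, it can help to confirm that the required executables are reachable before starting a long run. The following is a minimal sketch. It assumes the GeneWise2 executable is named `genewise` and is found through `PATH`; how HugeP2G actually locates GeneWise2 is not stated here. The helper name `find_missing_tools` is hypothetical.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "genewise" is an assumed executable name for GeneWise2.
    missing = find_missing_tools(["HugeP2G", "genewise"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```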

Usage

usage: HugeP2G [-h] [-s SKIP_RANGE_FILE] [-d WORK_DIR] [-t NUM_THREADS] [-c GENE_COVERAGE] [-n GENBLASTA_HIT_NUM] [-sc SKIP_COVERAGE] [-split SEQ_NUM_IN_SUBDIR] [-r] query_protein_table target_genome_fasta

Aligning a large number of protein sequences to a genome (use genblasta and genewise)

positional arguments:
  query_protein_table   Path of query genome table in tsv format, must have column "sp_id" and "pt_file", "sp_id" is the species id, "pt_file" is the path of protein fasta file
  target_genome_fasta   Path of target genome fasta file

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP_RANGE_FILE, --skip_range_file SKIP_RANGE_FILE
                        Path of skip_range_file, tsv file, should have column name "chr", "start", and "end", the range of the genome that need to be skipped
  -d WORK_DIR, --work_dir WORK_DIR
                        Path of work dir (default as ./hugep2g_out)
  -t NUM_THREADS, --num_threads NUM_THREADS
                        threads number (default as 56)
  -c GENE_COVERAGE, --gene_coverage GENE_COVERAGE
                        gene coverage (default as 0.2)
  -n GENBLASTA_HIT_NUM, --genblasta_hit_num GENBLASTA_HIT_NUM
                        genblasta hit num (default as 50)
  -sc SKIP_COVERAGE, --skip_coverage SKIP_COVERAGE
                        annotated_coverage (default as 0.8)
  -split SEQ_NUM_IN_SUBDIR, --seq_num_in_subdir SEQ_NUM_IN_SUBDIR
                        split big fasta file to run (default as 1000)
  -r, --force_redo      force redo all job
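
The two tab-separated input files described in the help text can be written with a short script. This is a sketch only: the species IDs, file paths, chromosome names, and coordinates are made up for illustration, and the helper `write_tsv` is hypothetical. The help text does not say whether `start` and `end` are 0-based or 1-based, so check that before relying on specific ranges.

```python
import csv


def write_tsv(path, header, rows):
    """Write rows to a tab-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Query protein table: one row per species (hypothetical IDs and paths).
query_rows = [
    ("species_A", "proteins/species_A.fasta"),
    ("species_B", "proteins/species_B.fasta"),
]
write_tsv("query.tsv", ["sp_id", "pt_file"], query_rows)

# Skip-range file for -s: genome ranges to skip (hypothetical values).
skip_rows = [("chr1", 1000, 5000), ("chr2", 20000, 26000)]
write_tsv("skip_ranges.tsv", ["chr", "start", "end"], skip_rows)
```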

Example

HugeP2G -t 80 query.tsv genome.fasta
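
When HugeP2G is launched from a Python workflow, the same command line can be assembled and run with `subprocess`. The sketch below uses only flags listed in the usage text above; the function name `build_command` and the file names are assumptions for illustration.

```python
import shutil
import subprocess


def build_command(query_table, genome_fasta, threads=None, work_dir=None,
                  skip_range_file=None, force_redo=False):
    """Assemble a HugeP2G argument list from the documented options."""
    cmd = ["HugeP2G"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if work_dir is not None:
        cmd += ["-d", work_dir]
    if skip_range_file is not None:
        cmd += ["-s", skip_range_file]
    if force_redo:
        cmd.append("-r")
    # Positional arguments come last: protein table, then genome FASTA.
    cmd += [query_table, genome_fasta]
    return cmd


if __name__ == "__main__":
    cmd = build_command("query.tsv", "genome.fasta", threads=80)
    print(" ".join(cmd))
    # Only run the alignment if the HugeP2G executable is on PATH.
    if shutil.which("HugeP2G") is not None:
        subprocess.run(cmd, check=True)
```

Setting the thread count explicitly, as in the example, avoids relying on the default of 56 threads on machines with fewer cores.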
