
Aligning a large number of protein sequences to a genome (using genBlastA and GeneWise)

Project description

HugeP2G

An integration tool for aligning large numbers of protein sequences to a genome (using genBlastA and GeneWise2)

GeneWise2 produces highly accurate protein-to-genome alignments with dynamic programming, but it is computationally expensive and therefore slow on large inputs. genBlastA first locates the approximate genomic region that each protein sequence maps to, and GeneWise2 then computes the precise alignment within that region. HugeP2G automates this two-step workflow.
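To illustrate the coarse-then-fine idea, here is a minimal sketch of the hand-off between the two stages. It is not HugeP2G's code: it assumes a coarse hit is reported as a chromosome plus 1-based inclusive start/end, and it widens that hit by a flanking margin so the slow aligner only sees a small genomic window. The `Hit` class, the function names, and the 1000 bp `flank` default are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """A coarse protein-to-genome hit (hypothetical format), 1-based inclusive."""
    chrom: str
    start: int
    end: int


def padded_window(hit: Hit, chrom_len: int, flank: int = 1000) -> tuple[int, int]:
    """Extend the hit by `flank` bp on each side, clamped to the chromosome bounds."""
    start = max(1, hit.start - flank)
    end = min(chrom_len, hit.end + flank)
    return start, end


def extract_region(seq: str, window: tuple[int, int]) -> str:
    """Slice a chromosome sequence using a 1-based inclusive window."""
    start, end = window
    return seq[start - 1:end]


if __name__ == "__main__":
    chrom_seq = "ACGT" * 1000  # toy 4 kb "chromosome"
    window = padded_window(Hit("chr1", 3500, 3900), len(chrom_seq))
    print(window, len(extract_region(chrom_seq, window)))
```

The fine aligner would then be run only on the extracted window instead of the whole chromosome, which is where the speed-up of a two-stage approach comes from.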

Installation

HugeP2G requires genBlastA and GeneWise2. genBlastA is included in the HugeP2G package; GeneWise2 must be installed separately and can be downloaded from here.

Install HugeP2G from PyPI:

pip install HugeP2G
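Because GeneWise2 is not bundled, it can help to confirm it is reachable before starting a long run. The sketch below assumes HugeP2G looks the tool up on your PATH under the executable name `genewise` (the name used by the Wise2 distribution); both the lookup mechanism and the `missing_tools` helper are assumptions, not documented HugeP2G behaviour.

```python
import shutil


def missing_tools(names=("genewise",)) -> list:
    """Return the executables from `names` that cannot be found on PATH.

    The default name "genewise" is an assumption about how GeneWise2 is installed.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables found.")
```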

Usage

usage: HugeP2G [-h] [-s SKIP_RANGE_FILE] [-d WORK_DIR] [-t NUM_THREADS] [-c GENE_COVERAGE] [-n GENBLASTA_HIT_NUM] [-sc SKIP_COVERAGE] [-split SEQ_NUM_IN_SUBDIR] [-r] query_protein_table target_genome_fasta

Aligning a large number of protein sequences to a genome (use genblasta and genewise)

positional arguments:
  query_protein_table   Path of query genome table in tsv format, must have column "sp_id" and "pt_file", "sp_id" is the species id, "pt_file" is the path of protein fasta file
  target_genome_fasta   Path of target genome fasta file

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP_RANGE_FILE, --skip_range_file SKIP_RANGE_FILE
                        Path of skip_range_file, tsv file, should have column name "chr", "start", and "end", the range of the genome that need to be skipped
  -d WORK_DIR, --work_dir WORK_DIR
                        Path of work dir (default as ./hugep2g_out)
  -t NUM_THREADS, --num_threads NUM_THREADS
                        threads number (default as 56)
  -c GENE_COVERAGE, --gene_coverage GENE_COVERAGE
                        gene coverage (default as 0.2)
  -n GENBLASTA_HIT_NUM, --genblasta_hit_num GENBLASTA_HIT_NUM
                        genblasta hit num (default as 50)
  -sc SKIP_COVERAGE, --skip_coverage SKIP_COVERAGE
                        annotated_coverage (default as 0.8)
  -split SEQ_NUM_IN_SUBDIR, --seq_num_in_subdir SEQ_NUM_IN_SUBDIR
                        split big fasta file to run (default as 1000)
  -r, --force_redo      force redo all job
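The two tabular inputs are plain TSV files whose required column names are given in the help text above: `sp_id` and `pt_file` for `query_protein_table`, and `chr`, `start`, `end` for `--skip_range_file`. A minimal sketch for writing and checking them with Python's `csv` module follows; the species IDs, file paths, and coordinates are hypothetical, and the help text does not say whether skip ranges are 0- or 1-based.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the HugeP2G --help text.
QUERY_COLUMNS = ("sp_id", "pt_file")
SKIP_COLUMNS = ("chr", "start", "end")


def write_tsv(path, columns, rows):
    """Write a list of dicts as a tab-separated file with a header line."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(columns),
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


def missing_columns(path, required):
    """Return the required column names that are absent from the TSV header."""
    with open(path, newline="") as fh:
        header = next(csv.reader(fh, delimiter="\t"), [])
    return [col for col in required if col not in header]


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    query = out_dir / "query.tsv"
    # Hypothetical species IDs and protein FASTA paths.
    write_tsv(query, QUERY_COLUMNS, [
        {"sp_id": "sp1", "pt_file": "/data/sp1.pep.fa"},
        {"sp_id": "sp2", "pt_file": "/data/sp2.pep.fa"},
    ])
    skip = out_dir / "skip_range.tsv"
    # Hypothetical region to exclude from the search.
    write_tsv(skip, SKIP_COLUMNS, [{"chr": "chr1", "start": 1, "end": 50000}])
    print(missing_columns(query, QUERY_COLUMNS), missing_columns(skip, SKIP_COLUMNS))
```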

Example

HugeP2G -t 80 query.tsv genome.fasta
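When HugeP2G is driven from a larger Python pipeline, the command line can be assembled programmatically. This sketch uses only flags listed in the help output above (`-t`, `-d`, `-s`, `-r`); the `build_hugep2g_cmd` helper is hypothetical, and actually running the command requires HugeP2G and GeneWise2 to be installed.

```python
import shlex


def build_hugep2g_cmd(query_table, genome_fasta, threads=None, work_dir=None,
                      skip_range_file=None, force_redo=False):
    """Assemble a HugeP2G argument list from options shown in its --help text."""
    cmd = ["HugeP2G"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if work_dir is not None:
        cmd += ["-d", str(work_dir)]
    if skip_range_file is not None:
        cmd += ["-s", str(skip_range_file)]
    if force_redo:
        cmd.append("-r")
    # Positional arguments come last: query table, then target genome.
    cmd += [str(query_table), str(genome_fasta)]
    return cmd


if __name__ == "__main__":
    cmd = build_hugep2g_cmd("query.tsv", "genome.fasta", threads=80)
    print(shlex.join(cmd))  # same command as the example above
    # To run it:  import subprocess; subprocess.run(cmd, check=True)
```

Passing a list (rather than a single string) to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.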


Download files


Source Distribution

hugep2g-1.0.2.tar.gz (8.2 MB)

Uploaded Source

Built Distribution

HugeP2G-1.0.2-py3-none-any.whl (8.3 MB)

Uploaded Python 3

File details

Details for the file hugep2g-1.0.2.tar.gz.

File metadata

  • Download URL: hugep2g-1.0.2.tar.gz
  • Upload date:
  • Size: 8.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for hugep2g-1.0.2.tar.gz
Algorithm     Hash digest
SHA256        065b7e8c8d60323ed4f7b9809dcad7e66991c523499ac728942915cfed98231e
MD5           6b2295eadc745d70fbc16dc1ef577925
BLAKE2b-256   2651e9db061229dd4bb71910ebe525c36a7f168f2624d6e7893034da1877924e


File details

Details for the file HugeP2G-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: HugeP2G-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 8.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for HugeP2G-1.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        79d3c3f7e272cbb0275e48fc4f76af39dc7f318f142eb5a44e1899aa5ac6edbc
MD5           f067d27edbf0b9dd46c1cca3f550c4d1
BLAKE2b-256   15983a7b201f13b431a09b91921c1e3e37980db7d6efa0d163445be172927f8b
