Aligning a large number of protein sequences to a genome (use genblasta and genewise)

HugeP2G

An integration tool for aligning large numbers of protein sequences to a genome, using genBlastA and GeneWise2.

GeneWise2 uses dynamic programming to produce very precise protein-to-genome alignments, but it is computationally expensive. genBlastA first maps each protein sequence to an approximate region of the genome, and GeneWise2 then aligns the protein precisely within that region. HugeP2G automates this two-step process.

Installation

HugeP2G requires genBlastA and GeneWise2. genBlastA is bundled with the HugeP2G package; GeneWise2 must be downloaded and installed separately.

Install HugeP2G from PyPI:

pip install HugeP2G
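
Because GeneWise2 is installed separately, it can help to confirm that the required executables are visible before starting a long run. The following is a minimal sketch, not part of HugeP2G: the helper `missing_tools` is hypothetical, and the executable names `HugeP2G` and `genewise` are assumptions about what ends up on your PATH (how HugeP2G itself locates GeneWise2 is not stated here).

```python
import shutil


def missing_tools(names):
    """Return the executable names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(names=("HugeP2G", "genewise")):
    """Print a short status line per tool; the default names are assumptions."""
    missing = set(missing_tools(names))
    for name in names:
        status = "missing" if name in missing else "found"
        print(f"{name}: {status}")
```

Calling `report()` prints one line per tool; adjust the names if your installation uses different ones.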

Usage

usage: HugeP2G [-h] [-s SKIP_RANGE_FILE] [-d WORK_DIR] [-t NUM_THREADS] [-c GENE_COVERAGE] [-n GENBLASTA_HIT_NUM] [-sc SKIP_COVERAGE] [-split SEQ_NUM_IN_SUBDIR] [-r] query_protein_table target_genome_fasta

Aligning a large number of protein sequences to a genome (use genblasta and genewise)

positional arguments:
  query_protein_table   Path of query genome table in tsv format, must have column "sp_id" and "pt_file", "sp_id" is the species id, "pt_file" is the path of protein fasta file
  target_genome_fasta   Path of target genome fasta file

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP_RANGE_FILE, --skip_range_file SKIP_RANGE_FILE
                        Path of skip_range_file, tsv file, should have column name "chr", "start", and "end", the range of the genome that need to be skipped
  -d WORK_DIR, --work_dir WORK_DIR
                        Path of work dir (default as ./hugep2g_out)
  -t NUM_THREADS, --num_threads NUM_THREADS
                        threads number (default as 56)
  -c GENE_COVERAGE, --gene_coverage GENE_COVERAGE
                        gene coverage (default as 0.2)
  -n GENBLASTA_HIT_NUM, --genblasta_hit_num GENBLASTA_HIT_NUM
                        genblasta hit num (default as 50)
  -sc SKIP_COVERAGE, --skip_coverage SKIP_COVERAGE
                        annotated_coverage (default as 0.8)
  -split SEQ_NUM_IN_SUBDIR, --seq_num_in_subdir SEQ_NUM_IN_SUBDIR
                        split big fasta file to run (default as 1000)
  -r, --force_redo      force redo all job
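
The help text above describes two tab-separated inputs: the query table with columns "sp_id" and "pt_file", and the optional skip-range file with columns "chr", "start", and "end". As a rough sketch, they could be generated like this. The helper names, species ids, and paths are illustrative assumptions; the coordinate convention of "start" and "end" (0-based or 1-based) is not stated in the help text, so check it before relying on the skip ranges.

```python
import csv


def write_tsv(path, header, rows):
    """Write a header row plus data rows as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def write_query_table(path, species_to_fasta):
    """Write the query table: one row per species id and protein FASTA path."""
    write_tsv(path, ["sp_id", "pt_file"], sorted(species_to_fasta.items()))


def write_skip_ranges(path, ranges):
    """Write genome ranges to skip as (chr, start, end) rows."""
    write_tsv(path, ["chr", "start", "end"], ranges)
```

For example, `write_query_table("query.tsv", {"sp1": "/data/sp1.faa"})` produces a two-line file with the header `sp_id<TAB>pt_file` followed by one data row.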

Example

HugeP2G -t 80 query.tsv genome.fasta
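
When HugeP2G is launched from a pipeline script, the same command can be assembled programmatically. This is a sketch under the assumption that the `HugeP2G` console script is on PATH; `build_command` and `run` are hypothetical helpers, and only flags listed in the usage text above are used.

```python
import subprocess


def build_command(query_table, genome_fasta, threads=None, work_dir=None,
                  skip_range_file=None, force_redo=False):
    """Build the HugeP2G argument list; options left as None use the tool's defaults."""
    cmd = ["HugeP2G"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if work_dir is not None:
        cmd += ["-d", work_dir]
    if skip_range_file is not None:
        cmd += ["-s", skip_range_file]
    if force_redo:
        cmd.append("-r")
    # Positional arguments come last: query table, then target genome.
    cmd += [query_table, genome_fasta]
    return cmd


def run(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

`build_command("query.tsv", "genome.fasta", threads=80)` reproduces the example above. Since the default thread count is 56, passing `threads` explicitly (for instance a value based on `os.cpu_count()`) may be sensible on smaller machines.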
