
Aligning a large number of protein sequences to a genome (using genBlastA and GeneWise)


HugeP2G

An integrated tool for aligning large numbers of protein sequences to a genome (using genBlastA and GeneWise2)

GeneWise2 produces highly accurate protein-to-genome alignments using dynamic programming, but it is computationally expensive. genBlastA can first locate the approximate genomic region for each protein sequence, after which GeneWise2 only needs to align the protein against that region to produce a precise alignment. HugeP2G automates this two-step workflow.
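To make the two-step idea concrete, here is how a coarse hit interval (as a genBlastA-style search might report it) could be padded with flanking sequence before being handed to GeneWise2. This is a hypothetical illustration, not HugeP2G's actual code: the function name `candidate_region`, the in-memory `genome` dictionary, the 1-based inclusive coordinates and the 2,000 bp default flank are all assumptions.

```python
from typing import Dict, Tuple

# Hypothetical default flank size in bp; HugeP2G's real padding (if any) is not documented here.
DEFAULT_FLANK = 2000


def candidate_region(
    genome: Dict[str, str],
    chrom: str,
    hit_start: int,
    hit_end: int,
    flank: int = DEFAULT_FLANK,
) -> Tuple[int, int, str]:
    """Pad a 1-based, inclusive hit interval with flanking sequence.

    Returns (start, end, subsequence), with start/end clamped to the
    chromosome bounds so the region never runs off either end.
    """
    seq = genome[chrom]
    # Minus-strand hits may be reported with start > end, so normalise first.
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    start = max(1, lo - flank)
    end = min(len(seq), hi + flank)
    # Convert 1-based inclusive coordinates to a 0-based Python slice.
    return start, end, seq[start - 1:end]
```

For example, with a 40 bp toy chromosome, `candidate_region({"chr1": "ACGT" * 10}, "chr1", 11, 20, flank=5)` returns the interval 6 to 25 together with its 20 bp subsequence. Padding gives GeneWise2 some room in case the coarse hit stops short of the true gene boundaries.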

Installation

HugeP2G requires genBlastA and GeneWise2. genBlastA is bundled with the HugeP2G package; GeneWise2 must be downloaded and installed separately.
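Because GeneWise2 is installed separately, it can help to confirm that it is reachable before starting a long run. The sketch below assumes the GeneWise2 executable is named `genewise` and is found via `PATH`; how HugeP2G actually locates it is not documented here. genBlastA is not checked because it ships with the package.

```python
import shutil
from typing import Iterable, List

# Assumption: GeneWise2 is invoked as `genewise`; adjust if your build differs.
EXTERNAL_TOOLS = ("genewise",)


def missing_tools(names: Iterable[str] = EXTERNAL_TOOLS) -> List[str]:
    """Return every executable name that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools found on PATH.")
```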

Install HugeP2G from PyPI:

pip install HugeP2G

Usage

usage: HugeP2G [-h] [-s SKIP_RANGE_FILE] [-d WORK_DIR] [-t NUM_THREADS] [-c GENE_COVERAGE] [-n GENBLASTA_HIT_NUM] [-sc SKIP_COVERAGE] [-split SEQ_NUM_IN_SUBDIR] [-r] query_protein_table target_genome_fasta

Aligning a large number of protein sequences to a genome (use genblasta and genewise)

positional arguments:
  query_protein_table   Path of query genome table in tsv format, must have column "sp_id" and "pt_file", "sp_id" is the species id, "pt_file" is the path of protein fasta file
  target_genome_fasta   Path of target genome fasta file

optional arguments:
  -h, --help            show this help message and exit
  -s SKIP_RANGE_FILE, --skip_range_file SKIP_RANGE_FILE
                        Path of skip_range_file, tsv file, should have column name "chr", "start", and "end", the range of the genome that need to be skipped
  -d WORK_DIR, --work_dir WORK_DIR
                        Path of work dir (default as ./hugep2g_out)
  -t NUM_THREADS, --num_threads NUM_THREADS
                        threads number (default as 56)
  -c GENE_COVERAGE, --gene_coverage GENE_COVERAGE
                        gene coverage (default as 0.2)
  -n GENBLASTA_HIT_NUM, --genblasta_hit_num GENBLASTA_HIT_NUM
                        genblasta hit num (default as 50)
  -sc SKIP_COVERAGE, --skip_coverage SKIP_COVERAGE
                        annotated_coverage (default as 0.8)
  -split SEQ_NUM_IN_SUBDIR, --seq_num_in_subdir SEQ_NUM_IN_SUBDIR
                        split big fasta file to run (default as 1000)
  -r, --force_redo      force redo all job
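Both input tables are plain tab-separated files whose required column names are given in the help text above: `sp_id` and `pt_file` for `query_protein_table`, and `chr`, `start` and `end` for the optional `--skip_range_file`. The helpers below are a hypothetical sketch for generating them from Python. Whether `start`/`end` are 0- or 1-based, and whether relative `pt_file` paths are resolved against the current directory, is not stated here, so treat both as assumptions to verify.

```python
import csv
import io
from typing import Dict, Iterable, Sequence, Tuple


def render_tsv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render a header row plus data rows as tab-separated text with '\\n' line endings."""
    buf = io.StringIO()
    # csv.writer defaults to '\r\n'; force Unix line endings for TSV consumers.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def query_table_text(species_to_fasta: Dict[str, str]) -> str:
    """Build query_protein_table content: one row per species ID and protein FASTA path."""
    return render_tsv(["sp_id", "pt_file"], species_to_fasta.items())


def skip_range_text(ranges: Iterable[Tuple[str, int, int]]) -> str:
    """Build skip_range_file content: one row per (chr, start, end) range to skip."""
    return render_tsv(["chr", "start", "end"], ranges)
```

To write a file, something like `Path("query.tsv").write_text(query_table_text({"sp1": "/data/sp1.pep.fa"}))` would work (with `from pathlib import Path`); the species ID and path are placeholders.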

Example

HugeP2G -t 80 query.tsv genome.fasta
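When HugeP2G is run from a larger Python pipeline, the command line can be assembled programmatically. The sketch below uses only flags listed in the help text; the function names are hypothetical, and `run_hugep2g` assumes HugeP2G signals failure with a non-zero exit status.

```python
import subprocess
from typing import List, Optional


def build_hugep2g_command(
    query_table: str,
    genome_fasta: str,
    threads: Optional[int] = None,
    work_dir: Optional[str] = None,
    skip_range_file: Optional[str] = None,
    force_redo: bool = False,
) -> List[str]:
    """Assemble a HugeP2G argument list from flags shown in the help text."""
    cmd = ["HugeP2G"]
    if threads is not None:
        cmd += ["-t", str(threads)]  # omitted -> HugeP2G's own default (56)
    if work_dir is not None:
        cmd += ["-d", work_dir]
    if skip_range_file is not None:
        cmd += ["-s", skip_range_file]
    if force_redo:
        cmd.append("-r")
    # Positional arguments go last, matching the usage line.
    cmd += [query_table, genome_fasta]
    return cmd


def run_hugep2g(cmd: List[str]) -> None:
    """Run HugeP2G; check=True raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

`build_hugep2g_command("query.tsv", "genome.fasta", threads=80)` reproduces the example command above. Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.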



Download files


Source Distribution

HugeP2G-1.0.0.tar.gz (22.0 kB)


Built Distribution

HugeP2G-1.0.0-py3-none-any.whl (23.9 kB)


File details

Details for the file HugeP2G-1.0.0.tar.gz.

File metadata

  • Download URL: HugeP2G-1.0.0.tar.gz
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for HugeP2G-1.0.0.tar.gz
Algorithm Hash digest
SHA256 8f3a2483ea0d2538491a8b1503d8e2c15c612c5f21a75d2e4a76fea7c3f95dfa
MD5 57c3954f28f76ebe6e7b047403487724
BLAKE2b-256 8ee83cd8dc70ba474356e225959d7d602e575bec0ec7e5c84bebe9334b78181d


File details

Details for the file HugeP2G-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: HugeP2G-1.0.0-py3-none-any.whl
  • Size: 23.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for HugeP2G-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 edf37dedbfe8b26f2b42ce29ea7a3123a8f4e346c42bbeef22c64142d7611c9c
MD5 3a40692ad53ca527de9112ba662fbd29
BLAKE2b-256 b183a14c2c02414dd13975f6ed0914cef0da0a0686a80c345eede905907cc9dd

