viralVerify rewrite/refactor for PyPI packaging and distribution

viral_verify

viralVerify rewrite/refactor for PyPI packaging and distribution, maintainability and clarity.

NOTE: The BLAST+ search option has been removed. The results output table differs from that of the original viralVerify. The Naive Bayes classifier training script has not been ported yet.

Features

  • Gene prediction with Prodigal in metagenomic mode

  • HMMer3 hmmsearch for protein domains in predicted genes

  • Naive Bayes classification of contigs as viral/not viral based on HMMer3 results

  • Output of detailed contig classification results table in CSV format

  • Output of contigs based on classification into separate FASTA files
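
The classification step can be pictured with a minimal sketch, under stated assumptions. It sums natural-log likelihood ratios for the protein domains found on a contig and reports "Uncertain" when the total stays within a threshold of zero. The table contents, the function name classify_contig, the labels and the pseudocount are all hypothetical and are not taken from the viral_verify source; only the idea of a natural-log uncertainty threshold with a default of 3.0 comes from the help text.

```python
import math

# Hypothetical domain table: domain -> (count in viral, count in non-viral).
# The layout of the real classifier_table.txt may differ.
DOMAIN_COUNTS = {
    "Phage_capsid": (90, 2),
    "Terminase_6": (80, 5),
    "Ribosomal_L2": (1, 120),
}


def classify_contig(domains, counts=DOMAIN_COUNTS, threshold=3.0, pseudocount=1.0):
    """Return (label, score); score is the summed natural-log likelihood ratio."""
    # Smoothed totals so that a zero count never produces log(0).
    total_viral = sum(v for v, _ in counts.values()) + pseudocount * len(counts)
    total_other = sum(o for _, o in counts.values()) + pseudocount * len(counts)
    score = 0.0
    for domain in domains:
        if domain not in counts:
            continue  # domains absent from the table carry no evidence
        viral, other = counts[domain]
        p_viral = (viral + pseudocount) / total_viral
        p_other = (other + pseudocount) / total_other
        score += math.log(p_viral) - math.log(p_other)
    # Scores close to zero favour neither class strongly enough.
    if abs(score) < threshold:
        return "Uncertain", score
    return ("Virus" if score > 0 else "Not virus"), score
```

A contig with no recognised domains scores 0.0 and is therefore labelled "Uncertain" in this sketch.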

Requirements

An HMMer3 HMM database is required, for example the latest version of the Pfam-A HMM database.

NOTE: Please extract any compressed HMM DB ($ gunzip Pfam-A.hmm.gz)
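
Where gunzip is not available, the same extraction could be done from Python. This is a small sketch using only the standard library; the helper name decompress_hmm_db is hypothetical, and unlike gunzip it leaves the compressed file in place.

```python
import gzip
import shutil
from pathlib import Path


def decompress_hmm_db(gz_path):
    """Write an uncompressed copy next to the .gz file and return its path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # Pfam-A.hmm.gz -> Pfam-A.hmm
    # Stream the data so a large database is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```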

Software dependencies:

  • Prodigal

  • HMMer3 (hmmsearch)

Python dependencies:

Installation

Conda

It’s recommended that you use Conda to install the required software (Prodigal and HMMer3) and Python dependencies.

$ conda env create -f environment.yml

Pip

If you have Prodigal and HMMer3 installed in your $PATH, and Python 3.6 or greater, you can use pip to install viral_verify:

$ pip install viral_verify
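
Before a pip-based install, it may help to confirm the prerequisites listed above. The sketch below checks $PATH and the interpreter version with the standard library. The executable names prodigal and hmmsearch are assumed to be the ones installed on your system, and the function names are hypothetical.

```python
import shutil
import sys


def missing_tools(tools=("prodigal", "hmmsearch")):
    """Return the names of executables that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(minimum=(3, 6)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum
```

An empty list from missing_tools() together with True from python_ok() suggests the environment meets the stated prerequisites.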

Usage

$ viral_verify --help
Usage: viral_verify [OPTIONS]

  HMM and Naive Bayes classification of contig sequences as either viral,
  plasmid or chromosomal.

  Requires Prodigal for gene prediction and hmmsearch from HMMer3 for
  searching for Pfam HMM profiles.

Options:
  -i, --input-fasta PATH          Input fasta file  [required]
  -o, --outdir PATH               Output directory  [required]
  -H, --hmm-db PATH               Path to Pfam-A HMM database  [required]
  -t, --threads INTEGER           Number of threads (default=16)
  -p, --output-plasmids-separately
                                  Output predicted plasmids separately?
  --prefix TEXT                   Output file prefix (default: None)
  --uncertainty-threshold FLOAT   Uncertainty threshold (Natural log
                                  probability) (default=3.0)

  --naive-bayes-classifier-table PATH
                                  Table of protein domain frequencies to use
                                  for Naive Bayes classification (default="/ho
                                  me/pkruczkiewicz/repos/viral_verify/viral_ve
                                  rify/data/classifier_table.txt")

  -v, --verbose                   Logging verbosity
  --version                       Show the version and exit.
  --help                          Show this message and exit.
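
For scripted runs, the command line can be assembled from Python. The following is a minimal sketch that uses only flags shown in the help text above; the function names and any file names passed to them are hypothetical.

```python
import subprocess


def build_command(input_fasta, outdir, hmm_db, threads=None, plasmids_separately=False):
    """Assemble a viral_verify argument list from options in the help text."""
    cmd = ["viral_verify", "-i", str(input_fasta), "-o", str(outdir), "-H", str(hmm_db)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if plasmids_separately:
        cmd.append("-p")  # write predicted plasmids to their own output
    return cmd


def run_viral_verify(**kwargs):
    """Run viral_verify; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(**kwargs), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.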

Credits

The original source code, design and conception can be found at viralVerify. This package is merely a rewrite that eases packaging via PyPI, adds some CI with Travis-CI and organizes the code for maintainability and clarity.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.1 (2020-06-04)

  • Fix PyPI release (include classifier_table.txt in package)

0.1.0 (2020-06-03)

  • First release on PyPI.

