Pyrodigal CLI optimized for metagenomic data

Introduction

This library is a simple wrapper around pyrodigal, which is a cythonized implementation of prodigal that is orders of magnitude faster.

Pyrodigal is mostly written for single genomes or single FASTA files, so this tool was created to batch process metagenomic-scale datasets. Metagenomic data usually consists of a large number of genome files, one per MAG (metagenome-assembled genome). Additionally, viral metagenomic datasets tend to store all single-scaffold viruses in one file, which is often much larger than a typical single-genome FASTA file.

This tool uses different load balancing strategies to parallelize pyrodigal over either a large number of files (MAGs) or single FASTA files that contain a large number of scaffolds (viruses).
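To make the two cases concrete, here is a minimal sketch of what "one task per file" versus "one task per scaffold" could look like. It is not this tool's actual implementation: fake_gene_finder is a hypothetical stand-in for ORF prediction (it only counts ATG occurrences), and the function names, the in-memory FASTA text, and the use of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, str]  # (scaffold name, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fake_gene_finder(record: Record) -> Tuple[str, int]:
    """Hypothetical stand-in for ORF prediction: counts ATG occurrences."""
    name, seq = record
    return name, seq.upper().count("ATG")


def process_file(text: str) -> List[Tuple[str, int]]:
    """Worker for the many-files case: handles every scaffold in one file."""
    return [fake_gene_finder(rec) for rec in parse_fasta(text)]


def run_many_files(files: Dict[str, str], max_cpus: int) -> Dict[str, List[Tuple[str, int]]]:
    """Many small files (MAGs): submit one task per file."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        results = pool.map(process_file, files.values())
        return dict(zip(files.keys(), results))


def run_single_file(text: str, max_cpus: int) -> List[Tuple[str, int]]:
    """One large file (viruses): submit one task per scaffold."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        return list(pool.map(fake_gene_finder, parse_fasta(text)))
```

With many small files, per-file tasks keep the scheduling overhead low. With one very large file, per-scaffold tasks are the only way to keep more than one worker busy.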

Installation

Install versioned releases

pip install metapyrodigal

Install from source

git clone https://github.com/cody-mar10/metapyrodigal.git
cd metapyrodigal
pip install .

Usage

Installing this tool overwrites the pyrodigal executable, so the pyrodigal command runs the metagenome-focused version that I created.

The help page from pyrodigal -h looks like this:

usage: pyrodigal [-h] (-i FILE [FILE ...] | -d DIR) [-o DIR] [-c INT] [--genes] [--virus-mode] [-x STR]
                 [--allow-unordered]

Find ORFs from query genomes using pyrodigal v3.5.2, the cythonized prodigal API

options:
  -h, --help            show this help message and exit
  -i FILE [FILE ...], --input FILE [FILE ...]
                        fasta file(s) of query genomes (can use unix wildcards)
  -d DIR, --input-dir DIR
                        directory of fasta files to process
  -o DIR, --outdir DIR  output directory (default: /storage2/scratch/ccmartin6/software/metapyrodigal)
  -c INT, --max-cpus INT
                        maximum number of threads to use (default: 1)
  --genes               use to also output the nucleotide genes .ffn file
  --virus-mode          use pyrodigal-gv to activate the virus models (default: False)
  -x STR, --extension STR
                        genome FASTA file extension if using -d/--input-dir (default: fna)
  --allow-unordered     for a single file input, this allows the protein ORFs to be written per scaffold as
                        available. All protein ORFs for each scaffold will be in order, but the scaffolds will not
                        necessarily be in the same order as in the input nucleotide file. **This is useful if you
                        are extremely memory limited,** since the default strategy can lead to the ORFs being
                        stored in memory for awhile before writing to file as the original scaffold order is
                        maintained. NOTE: This is about 20 percent faster, so it is recommended to use this if the
                        order of scaffolds does not matter.
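The trade-off described for --allow-unordered can be illustrated with Python's standard concurrent.futures module. This is only a sketch of the general idea, not the tool's code: translate_stub is a hypothetical placeholder for per-scaffold ORF prediction, and the use of a thread pool is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterator, List


def translate_stub(scaffold: str) -> str:
    """Hypothetical stand-in for per-scaffold ORF prediction."""
    return f"{scaffold}_proteins"


def ordered(scaffolds: List[str], max_cpus: int) -> Iterator[str]:
    """Yield results in input order; finished results wait for earlier ones."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        yield from pool.map(translate_stub, scaffolds)


def unordered(scaffolds: List[str], max_cpus: int) -> Iterator[str]:
    """Yield each result as soon as it finishes, in no guaranteed order."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        futures = [pool.submit(translate_stub, s) for s in scaffolds]
        for future in as_completed(futures):
            yield future.result()
```

In the ordered version, a result that finishes early has to be held until every scaffold before it has been emitted, which is where the extra memory goes. The unordered version can hand each result to the writer immediately.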

-i and -d are mutually exclusive, but one of them must be provided.
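The "(-i FILE [FILE ...] | -d DIR)" notation in the usage line is what Python's argparse prints for a required mutually exclusive group. The sketch below reproduces only those two options to show the behavior. It is an illustration rather than this tool's actual parser, and build_parser and accepts are hypothetical helper names.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the two input options from the help page."""
    parser = argparse.ArgumentParser(prog="pyrodigal")
    # required=True: exactly one option from the group must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-i", "--input", nargs="+", metavar="FILE")
    group.add_argument("-d", "--input-dir", metavar="DIR")
    return parser


def accepts(argv: List[str]) -> bool:
    """Return True if argv parses; argparse exits via SystemExit on error."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        return False
```

Passing both options, or neither, makes argparse print a usage error and exit, so accepts returns False in those cases.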

The output files have the same basename as the input file. Protein FASTA files will have the extension .faa, and nucleotide gene FASTA files will have the extension .ffn. For example:

pyrodigal -i GENOME.fna

will output GENOME.faa
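The naming rule can be sketched with pathlib. This is a guess at the behavior based on the description above, not the tool's code: output_paths is a hypothetical helper, and it assumes that only the final extension is replaced and that files are written directly into the output directory.

```python
from pathlib import Path
from typing import List


def output_paths(input_file: str, outdir: str = ".", genes: bool = False) -> List[Path]:
    """Return the expected output paths for one input FASTA file."""
    stem = Path(input_file).stem  # "GENOME.fna" -> "GENOME"
    paths = [Path(outdir) / f"{stem}.faa"]  # protein FASTA, always written
    if genes:
        # Nucleotide gene FASTA, only when --genes is given.
        paths.append(Path(outdir) / f"{stem}.ffn")
    return paths
```

Under these assumptions, an input of data/GENOME.fna with -o out and --genes would map to out/GENOME.faa and out/GENOME.ffn.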
