
Pyrodigal CLI optimized for metagenomic data

Introduction

This library is a simple wrapper around pyrodigal, a Cythonized implementation of Prodigal that is orders of magnitude faster.

Pyrodigal is mostly written for single genomes or FASTA files, so this tool was created to batch-process metagenomic-scale datasets. Metagenomic data usually consist of a large number of genome files for MAGs (metagenome-assembled genomes). Additionally, viral metagenomic datasets tend to store all single-scaffold viruses in one file, which is usually much larger than a typical single-genome FASTA file.

This tool parallelizes pyrodigal over large numbers of files (MAGs) or over FASTA files that contain a large number of scaffolds (viruses).
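
As a rough illustration of the per-file case, the sketch below maps a worker function over many FASTA files with a thread pool. It is not this tool's actual implementation: run_in_parallel and the worker callable are hypothetical names, and the sketch assumes the worker does the gene prediction for one file and returns the path it wrote. Threads only help if the worker releases the GIL or is I/O-bound; otherwise a process pool would be the closer fit.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def run_in_parallel(
    fasta_files: Iterable[Path],
    worker: Callable[[Path], Path],
    max_cpus: int = 1,
) -> List[Path]:
    """Apply `worker` to every FASTA file using at most `max_cpus` threads.

    `worker` is a hypothetical callable (an assumption of this sketch) that
    predicts genes for one file and returns the path of the output it wrote.
    """
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        # Executor.map() yields results in the same order as the inputs.
        return list(pool.map(worker, fasta_files))
```

A stand-in worker such as `lambda p: p.with_suffix(".faa")` is enough to try the function without running any gene prediction.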

Installation

Install versioned releases

pip install metapyrodigal

Install from source

git clone https://github.com/cody-mar10/metapyrodigal.git
cd metapyrodigal
pip install .

Usage

Installing this tool overwrites the pyrodigal binary, so the pyrodigal command then runs the metagenome-focused binary provided by this package.

The help page from pyrodigal -h looks like this:

usage: pyrodigal [-h] (-i FILE [FILE ...] | -d DIR) [-o DIR] [-c INT] [--genes]
                 [--virus-mode]

Find ORFs from query genomes using pyrodigal v3.5.2, the cythonized prodigal API

options:
  -h, --help            show this help message and exit
  -i FILE [FILE ...], --input FILE [FILE ...]
                        fasta file(s) of query genomes (can use unix wildcards)
  -d DIR, --input-dir DIR
                        directory of fasta files to process
  -o DIR, --outdir DIR  output directory (default: CWD)
  -c INT, --max-cpus INT
                        maximum number of threads to use (default: 1)
  --genes               use to also output the nucleotide genes .ffn file (default: False)
  --virus-mode          use pyrodigal-gv to activate the virus models (default: False)
  -x STR, --extension STR
                        genome FASTA file extension if using -d/--input-dir (default: fna)

-i and -d are mutually exclusive, but one of them must be provided.
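
The option layout in the help page can be approximated with argparse. This is a reconstruction from the help text, not the package's source code: build_parser is a hypothetical name, and "." is assumed as a stand-in for the CWD default of -o.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented options (a sketch based on the help text only)."""
    parser = argparse.ArgumentParser(prog="pyrodigal")

    # Exactly one of -i / -d must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-i", "--input", nargs="+", metavar="FILE")
    source.add_argument("-d", "--input-dir", metavar="DIR")

    # "." is an assumption standing in for "default: CWD".
    parser.add_argument("-o", "--outdir", metavar="DIR", default=".")
    parser.add_argument("-c", "--max-cpus", type=int, metavar="INT", default=1)
    parser.add_argument("--genes", action="store_true")
    parser.add_argument("--virus-mode", action="store_true")
    parser.add_argument("-x", "--extension", metavar="STR", default="fna")
    return parser
```

With this parser, passing both -i and -d, or neither, makes argparse print an error and exit, which matches the rule stated above.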

The output files have the same basename as the input file. Protein FASTA files will have the extension .faa, and nucleotide gene FASTA files will have the extension .ffn. For example:

pyrodigal -i GENOME.fna

will output GENOME.faa in the output directory (the current working directory by default).
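
The naming rule can be sketched in a few lines. output_paths is a hypothetical helper rather than part of this package, and two details are assumptions: that only the final extension is replaced (for example, sample.v1.fna becomes sample.v1.faa), and that outputs are placed directly in the output directory.

```python
from pathlib import Path
from typing import List


def output_paths(fasta: str, outdir: str = ".", genes: bool = False) -> List[Path]:
    """Return the output file paths expected for one input FASTA file."""
    stem = Path(fasta).stem  # "GENOME.fna" -> "GENOME"
    outputs = [Path(outdir) / f"{stem}.faa"]  # protein FASTA, always written
    if genes:
        # nucleotide gene FASTA, only when --genes is given
        outputs.append(Path(outdir) / f"{stem}.ffn")
    return outputs
```

For example, output_paths("GENOME.fna", "results", genes=True) gives results/GENOME.faa and results/GENOME.ffn.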
