Pyrodigal CLI optimized for metagenomic data
Introduction
This library is a simple wrapper around pyrodigal, which is a cythonized implementation of prodigal that is orders of magnitude faster.
Pyrodigal is mostly written for single genomes or FASTA files, so this tool was created to batch process metagenomic-scale datasets. Metagenomic data usually consist of a large number of genome files for MAGs (metagenome-assembled genomes). Additionally, viral metagenomic datasets tend to store all single-scaffold viruses in a single file, which tends to be much larger than a typical single-genome FASTA file.
This tool uses different load balancing strategies to parallelize pyrodigal over large numbers of files (MAGs) or over FASTA files that have a large number of scaffolds (viruses).
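The two situations described above can be sketched roughly as follows. This is only an illustration of the general idea, not the tool's actual implementation: the README does not say how the load balancing is done, so the function names (`process_many_files`, `process_single_file`), the use of a thread pool, and the placeholder worker `count_start_codons` are all assumptions. A real worker would call into pyrodigal instead of counting start codons.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (scaffold name, nucleotide sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_start_codons(record: Record) -> Tuple[str, int]:
    """Hypothetical placeholder for gene prediction on one scaffold."""
    name, seq = record
    return name, seq.upper().count("ATG")


def process_file(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Handle one whole (small) genome file inside a single task."""
    return [count_start_codons(rec) for rec in read_fasta(lines)]


def process_many_files(files: List[List[str]], max_cpus: int = 1):
    """Many small files (e.g. MAGs): distribute whole files across workers."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        return list(pool.map(process_file, files))


def process_single_file(lines: Iterable[str], max_cpus: int = 1):
    """One large file (e.g. viruses): distribute its scaffolds across workers."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        return list(pool.map(count_start_codons, read_fasta(lines)))
```

In this sketch the unit of work is a whole file in the first case and a single scaffold in the second, which is the distinction the paragraph above draws. Threads only give a speedup when the worker releases the GIL; the pure-Python placeholder here does not, so the example shows the structure rather than the performance.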
Installation
Install versioned releases
pip install metapyrodigal
Install from source
git clone https://github.com/cody-mar10/metapyrodigal.git
cd metapyrodigal
pip install .
Usage
Installing this tool overwrites the pyrodigal command-line binary that comes with pyrodigal, so running pyrodigal afterwards uses the metagenome-focused binary that I created.
The help page from pyrodigal -h looks like this:
usage: pyrodigal [-h] (-i FILE [FILE ...] | -d DIR) [-o DIR] [-c INT] [--genes] [--virus-mode] [-x STR]
[--allow-unordered]
Find ORFs from query genomes using pyrodigal v3.5.2, the cythonized prodigal API
options:
-h, --help show this help message and exit
-i FILE [FILE ...], --input FILE [FILE ...]
fasta file(s) of query genomes (can use unix wildcards)
-d DIR, --input-dir DIR
directory of fasta files to process
-o DIR, --outdir DIR output directory (default: /storage2/scratch/ccmartin6/software/metapyrodigal)
-c INT, --max-cpus INT
maximum number of threads to use (default: 1)
--genes use to also output the nucleotide genes .ffn file
--virus-mode use pyrodigal-gv to activate the virus models (default: False)
-x STR, --extension STR
genome FASTA file extension if using -d/--input-dir (default: fna)
--allow-unordered for a single file input, this allows the protein ORFs to be written per scaffold as
available. All protein ORFs for each scaffold will be in order, but the scaffolds will not
necessarily be in the same order as in the input nucleotide file. **This is useful if you
are extremely memory limited,** since the default strategy can lead to the ORFs being
stored in memory for awhile before writing to file as the original scaffold order is
maintained. NOTE: This is about 20 percent faster, so it is recommended to use this if the
order of scaffolds does not matter.
-i and -d are mutually exclusive but one of them must be provided.
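The difference between the default ordered output and --allow-unordered can be sketched with Python's standard library. This is a hedged illustration of the ordering behaviour only: `iter_results` and `fake_predict` are hypothetical names, the tool's real internals are not described in the README, and the sketch does not try to reproduce the memory savings or the roughly 20 percent speedup mentioned in the help text.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (scaffold name, nucleotide sequence)
Result = Tuple[str, int]


def fake_predict(record: Record) -> Result:
    """Hypothetical stand-in for gene prediction; longer scaffolds take longer."""
    name, seq = record
    time.sleep(len(seq) / 1000)
    return name, len(seq) // 3


def iter_results(
    records: Iterable[Record],
    worker: Callable[[Record], Result],
    max_cpus: int = 1,
    ordered: bool = True,
) -> Iterator[Result]:
    """Yield per-scaffold results in input order, or as they finish."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        if ordered:
            # map() yields in input order, so a finished result for a later
            # scaffold waits until every earlier scaffold has been yielded.
            yield from pool.map(worker, records)
        else:
            # as_completed() yields each result as soon as it is ready,
            # regardless of its position in the input.
            futures = [pool.submit(worker, rec) for rec in records]
            for future in as_completed(futures):
                yield future.result()
```

With `ordered=True`, a slow first scaffold holds back every result behind it, which is why ordered output can keep ORFs in memory longer before they are written. With `ordered=False`, the same set of results is produced but the scaffold order may differ from the input.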
The output files have the same basename as the input file. Protein FASTA files will have the extension .faa, and nucleotide gene FASTA files will have the extension .ffn. For example:
pyrodigal -i GENOME.fna
will output GENOME.faa
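The naming rule can be expressed as a small helper. This is a minimal sketch assuming that the basename is the file name with its final extension removed and that the files land directly in the output directory; `output_paths` is a hypothetical function, not part of the tool, and how the tool treats names containing several dots is not stated in the README.

```python
from pathlib import Path
from typing import Optional, Tuple


def output_paths(
    input_file: str, outdir: str = ".", genes: bool = False
) -> Tuple[Path, Optional[Path]]:
    """Return the protein (.faa) path and, if requested, the gene (.ffn) path."""
    stem = Path(input_file).stem  # "GENOME.fna" -> "GENOME"
    out = Path(outdir)
    protein = out / f"{stem}.faa"
    # The nucleotide gene file is only produced when --genes is given.
    nucleotide = (out / f"{stem}.ffn") if genes else None
    return protein, nucleotide
```

For instance, `output_paths("data/GENOME.fna", "results", genes=True)` gives the paths results/GENOME.faa and results/GENOME.ffn.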