Python interface to Prodigal, an ORF finder for genomes, progenomes and metagenomes.
🔥 Pyrodigal
🗺️ Overview
Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It directly interacts with the Prodigal internals, which has the following advantages:
- single dependency: Pyrodigal is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the Prodigal binary being present on the end-user machine.
- no intermediate files: everything happens in memory, in a Python object you fully control, so you don't have to manually import and export sequences to pass to the Prodigal CLI.
- no input formatting: sequences are manipulated directly as strings, which removes the need to format your input as FASTA for Prodigal.
- lower memory usage: Pyrodigal is slightly more conservative when it comes to using memory, which can help process very large sequences.
📋 Features
The library features everything needed to run Prodigal in metagenomic mode. Single mode, which requires more configuration from the user but offers more flexibility, is also available once the instance has been trained with Pyrodigal.train (see the Example section).
Roadmap:
- Metagenomic mode
- Single mode
- External training file support (-t flag)
- Region masking (-m flag)
📋 Memory
Unlike the Prodigal command-line tool, Pyrodigal attempts to be more conservative about memory usage. Most allocations are lazy, and some functions reallocate their results into exact-sized arrays when possible. As a result, Pyrodigal uses about 30% less memory, at the cost of some extra overhead.
🧶 Thread-safety
pyrodigal.Pyrodigal instances are not thread-safe: concurrent find_genes calls
will overwrite the internal memory used for dynamic programming and could lead
to unexpected crashes. A solution to process sequences in parallel is to use a
consumer/worker pattern, and have one Pyrodigal instance in each worker. Using
a pool spawning Pyrodigal instances on the fly is also fine, but prevents
recycling internal buffers:
import multiprocessing.pool
from pyrodigal import Pyrodigal
with multiprocessing.pool.ThreadPool() as pool:
    pool.map(lambda s: Pyrodigal(meta=True).find_genes(s), sequences)
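The consumer/worker pattern mentioned above could be sketched roughly as follows. This is only an illustration, not part of the Pyrodigal API: find_genes_parallel, make_pyrodigal, and the workers parameter are hypothetical names, and the finder factory is injectable so the sketch can be tried without Pyrodigal installed. Whether the threads actually run concurrently depends on find_genes releasing the GIL, which is assumed here rather than confirmed.

```python
import queue
import threading


def make_pyrodigal():
    """Default factory: a metagenomic-mode finder (requires pyrodigal)."""
    import pyrodigal  # imported lazily so the sketch itself has no hard dependency
    return pyrodigal.Pyrodigal(meta=True)


def find_genes_parallel(sequences, make_finder=make_pyrodigal, workers=4):
    """Call find_genes on every sequence, using one private finder per worker."""
    sequences = list(sequences)
    results = [None] * len(sequences)
    errors = []
    tasks = queue.Queue()

    def worker(finder):
        # Each worker owns `finder`, so its internal buffers are never shared.
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                return
            index, sequence = item
            try:
                results[index] = finder.find_genes(sequence)
            except Exception as exc:  # keep draining so other workers can finish
                errors.append(exc)

    # Build the finders up front so a failing factory raises in the caller.
    threads = [
        threading.Thread(target=worker, args=(make_finder(),))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for item in enumerate(sequences):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
    return results
```

Because each worker keeps the same instance for every task it consumes, the internal buffers can be recycled between sequences, which the on-the-fly pool variant cannot do. Results come back in the same order as the input sequences.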
💡 Example
Using Biopython, load a sequence from a GenBank file, use Prodigal to find all genes it contains, and print the proteins in FASTA format:
import textwrap
import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")
p = pyrodigal.Pyrodigal(meta=True)
for i, gene in enumerate(p.find_genes(str(record.seq))):
    print(f">{record.id}_{i+1}")
    print(textwrap.fill(gene.translate()))
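If the FASTA records need to be collected or written to a file rather than printed, the formatting step can be factored into a small helper. The sketch below uses only the standard library; format_fasta and its width parameter are hypothetical names chosen for illustration, and it assumes the protein is available as a plain string without spaces.

```python
import textwrap


def format_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    # textwrap.fill breaks long words by default, so a sequence without
    # spaces is cut into lines of at most `width` characters.
    return f">{name}\n{textwrap.fill(sequence, width=width)}"
```

Inside the loop, this could be called as format_fasta(f"{record.id}_{i+1}", gene.translate()), assuming each gene object exposes a translate method returning a string.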
To use Pyrodigal in single mode, you must explicitly call Pyrodigal.train with the sequence you want to use for training before trying to find genes:
p = pyrodigal.Pyrodigal()
p.train(str(record.seq))
genes = p.find_genes(str(record.seq))
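When a pipeline handles both complete genomes and short contigs, one possible approach is to pick the mode from the sequence length. The sketch below is an assumption-laden illustration: make_finder, finder_cls, and MIN_TRAINING_LENGTH are hypothetical names, and the 20000 nucleotide threshold mirrors the minimum length the Prodigal command line accepts for training, which may not match what Pyrodigal itself enforces.

```python
MIN_TRAINING_LENGTH = 20000  # assumed threshold, taken from the Prodigal CLI


def make_finder(sequence, finder_cls=None, min_training_length=MIN_TRAINING_LENGTH):
    """Return a trained single-mode finder, or a metagenomic one for short input."""
    if finder_cls is None:
        # Imported lazily so the sketch can be exercised with a stand-in class.
        from pyrodigal import Pyrodigal as finder_cls
    if len(sequence) >= min_training_length:
        finder = finder_cls()        # single mode: must be trained before use
        finder.train(sequence)
        return finder
    return finder_cls(meta=True)     # too short to train on: metagenomic mode
```

The returned object can then be used as in the examples above, for instance make_finder(str(record.seq)).find_genes(str(record.seq)).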
📜 License
This library, like the original Prodigal software, is provided under the GNU General Public License v3.0.