Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes.
🔥 Pyrodigal
Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!
🗺️ Overview
Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It directly interacts with the Prodigal internals, which has the following advantages:
- single dependency: Pyrodigal is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the Prodigal binary being present on the end-user machine.
- no intermediate files: Everything happens in memory, in a Python object you fully control, so you don't have to invoke the Prodigal CLI using a sub-process and temporary files. Sequences can be passed directly as strings or bytes, which avoids the overhead of formatting your input to FASTA for Prodigal.
- lower memory usage: Pyrodigal is slightly more conservative when it comes to using memory, which can help process very large sequences. It also lets you save some more memory when running several meta-mode analyses.
- better performance: Pyrodigal uses SIMD instructions to compute which dynamic programming nodes can be ignored when scoring connections. This can save from a third to half the runtime depending on the sequence.
📋 Features
The library now features everything needed to run Prodigal in single or metagenomic mode. It is still missing some features of the CLI:
Roadmap:
- ✔️ Metagenomic mode
- ✔️ Single mode
- ❌ External training file support (-t flag)
- ❌ Region masking (-m flag)
🐏 Memory
Pyrodigal makes two changes compared to the original Prodigal command line:
- Sequences are stored as raw bytes instead of compressed bitmaps. This means that the sequence itself takes 3/8th more space, but since the memory used for storing the sequence is often negligible compared to the memory used to store dynamic programming nodes, this is an acceptable trade-off for better performance when finding the start and stop nodes.
- Node arrays are dynamically allocated and grow exponentially instead of being pre-allocated with a large size. On small sequences, this leads to Pyrodigal using about 30% less memory.
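To illustrate the second point, here is a minimal Python sketch of an array that grows exponentially instead of being pre-allocated. It is not Pyrodigal's actual Cython code: the class name `GrowableArray`, the initial capacity of 8 and the growth factor of 2 are assumptions made for the example.

```python
class GrowableArray:
    """Toy dynamic array that doubles its capacity when it runs out of room."""

    def __init__(self, initial_capacity=8):
        self.capacity = initial_capacity
        self.length = 0
        self.reallocations = 0
        self._slots = [None] * initial_capacity

    def append(self, item):
        if self.length == self.capacity:
            # Grow geometrically so that appends stay amortized O(1).
            self.capacity *= 2
            self._slots.extend([None] * (self.capacity - len(self._slots)))
            self.reallocations += 1
        self._slots[self.length] = item
        self.length += 1


nodes = GrowableArray(initial_capacity=8)
for i in range(1000):
    nodes.append(i)

# Only as much capacity as the input required was ever allocated.
print(nodes.length, nodes.capacity, nodes.reallocations)
```

With this strategy a short sequence only pays for the nodes it actually produces, at the cost of a handful of reallocations on longer inputs.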
🧶 Thread-safety
pyrodigal.Pyrodigal instances are thread-safe. In addition, the find_genes method is re-entrant. This means you can train a Pyrodigal instance once, and then use a pool to process sequences in parallel:
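import multiprocessing.pool
from pyrodigal import Pyrodigal

# train once, then share the instance between worker threads
p = Pyrodigal()
p.train(training_sequence)
with multiprocessing.pool.ThreadPool() as pool:
    predictions = pool.map(p.find_genes, sequences)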
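The same pattern can be wrapped in a small helper. This is only a sketch: `predict_all` and its `workers` parameter are hypothetical names introduced here, not part of Pyrodigal, and the finder is passed in as an argument so the helper works with any object exposing a thread-safe `find_genes` method.

```python
import multiprocessing.pool


def predict_all(finder, sequences, workers=None):
    """Run finder.find_genes on every sequence using a thread pool.

    `finder` is assumed to be a trained (or meta-mode) Pyrodigal instance,
    or any other object with a thread-safe `find_genes` method.
    """
    with multiprocessing.pool.ThreadPool(workers) as pool:
        # pool.map preserves input order, so results line up with `sequences`.
        return pool.map(finder.find_genes, sequences)
```

It could then be called as `predictions = predict_all(p, sequences, workers=4)`, where `p` and `sequences` are the objects from the snippet above.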
🔧 Installing
Pyrodigal can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/OSX/Windows) and the Aarch64 architecture (Linux only), as well as the code required to compile from source with Cython:
$ pip install pyrodigal
Otherwise, Pyrodigal is also available as a Bioconda package:
$ conda install -c bioconda pyrodigal
💡 Example
Let's load a sequence from a GenBank file, use Pyrodigal to find all the genes it contains, and print the proteins in two-line FASTA format.
🔬 Biopython
To use Pyrodigal in single mode, you must explicitly call Pyrodigal.train with the sequence you want to use for training before trying to find genes, or you will get a RuntimeError:
import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")
p = pyrodigal.Pyrodigal()
p.train(bytes(record.seq))
genes = p.find_genes(bytes(record.seq))
However, in meta mode, you can find genes directly:
import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")
p = pyrodigal.Pyrodigal(meta=True)
for i, gene in enumerate(p.find_genes(bytes(record.seq))):
    print(f">{record.id}_{i+1}")
    # translate each predicted gene, not the whole record
    print(gene.translate())
On older versions of Biopython (before 1.79) you will need to use record.seq.encode() instead of bytes(record.seq).
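If the code has to run against both old and new Biopython versions, the two spellings can be hidden behind one function. The helper below is a sketch, and `sequence_to_bytes` is a hypothetical name that exists in neither Pyrodigal nor Biopython. It assumes that `bytes(seq)` raises a `TypeError` on sequence objects that do not support the conversion.

```python
def sequence_to_bytes(seq):
    """Return the raw bytes of a sequence-like object.

    Tries bytes(seq) first and falls back to seq.encode() when the
    object cannot be converted directly.
    """
    try:
        return bytes(seq)
    except TypeError:
        # Assumed failure mode for objects without a bytes conversion.
        return seq.encode()
```

A call such as `p.find_genes(sequence_to_bytes(record.seq))` would then be written the same way regardless of the installed version.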
🧪 Scikit-bio
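import skbio.io
import pyrodigal

seq = next(skbio.io.read("sequence.gbk", "genbank"))
# skbio stores the parsed GenBank LOCUS line as a dict in the metadata
seq_id = seq.metadata["LOCUS"]["locus_name"]
p = pyrodigal.Pyrodigal(meta=True)
for i, gene in enumerate(p.find_genes(seq.values.view('B'))):
    print(f">{seq_id}_{i+1}")
    print(gene.translate())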
We need to use the view method to get the sequence viewable by Cython as an array of unsigned char.
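Both examples end with the same two-line FASTA loop, which can be factored out. The function below is a sketch under stated assumptions: `write_proteins` is a hypothetical helper, not a Pyrodigal function, and it only relies on each gene exposing a `translate()` method that returns the protein sequence.

```python
import sys


def write_proteins(genes, seq_id, file=None):
    """Write one two-line FASTA record per gene to `file` (stdout by default)."""
    out = sys.stdout if file is None else file
    for i, gene in enumerate(genes, start=1):
        # Header line: source sequence identifier plus 1-based gene index.
        out.write(f">{seq_id}_{i}\n")
        # Sequence line: the whole protein on a single line.
        out.write(f"{gene.translate()}\n")
```

With Biopython this could be called as `write_proteins(p.find_genes(bytes(record.seq)), record.id)`.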
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See CONTRIBUTING.md for more details.
⚖️ License
This library is provided under the GNU General Public License v3.0.
The Prodigal code was written by Doug Hyatt and is distributed under the terms of the GPLv3 as well. See vendor/Prodigal/LICENSE for more information. The cpu_features library is licensed under the terms of the Apache license. See vendor/cpu_features/LICENSE for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original Prodigal authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.