A Python interface to gb-io, a fast GenBank parser written in Rust.
🧬🏦 gb-io.py
A Python interface to gb-io, a fast GenBank parser written in Rust.
🗺️ Overview
gb-io.py is a Python package that provides an interface to gb-io, a very fast GenBank format parser implemented in Rust. It can reach much higher speed than the Biopython or the scikit-bio parsers.
This library has no external dependencies and is available for all modern Python versions (3.7+).
🔧 Installing
Install the gb-io package directly from PyPI, which hosts pre-compiled wheels that can be installed with pip:
$ pip install gb-io
Wheels are provided for the following platforms:
- Linux, CPython, x86-64
- Linux, CPython, Aarch64
- MacOS, CPython, x86-64
- MacOS, PyPy, x86-64
- Windows, CPython, x86-64
- Windows, PyPy, x86-64
On any other platform, pip downloads the source distribution and builds the package locally. A local copy of the Rust compiler is downloaded for the build, unless one is already installed on the host machine.
💡 Usage
Use the gb_io.load function to obtain a list of all GenBank records in a file:
import gb_io
records = gb_io.load("tests/data/AY048670.1.gb")
Reading from a file-like object is supported as well, both in text and binary mode:
with open("tests/data/AY048670.1.gb") as file:
    records = gb_io.load(file)
It is also possible to iterate over the records of a file without loading the entire file contents into memory, using the gb_io.iter function, which returns an iterator instead of a list:
for record in gb_io.iter("tests/data/AY048670.1.gb"):
    print(record.name, record.sequence[:10])
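Since gb_io.iter returns an iterator, it can be combined with the standard itertools helpers to look at only the first few records of a large file. The sketch below is a minimal illustration, not part of gb-io: the `preview` helper and the `fake_records` stand-in generator are hypothetical names, and the helper relies only on the `name` and `sequence` attributes used in the loop above.

```python
from itertools import islice
from types import SimpleNamespace


def preview(records, limit=5, width=10):
    """Return (name, sequence prefix) pairs for the first `limit` records.

    `records` can be any iterable of objects exposing `name` and
    `sequence`, such as the iterator returned by gb_io.iter.
    """
    return [(rec.name, rec.sequence[:width]) for rec in islice(records, limit)]


def fake_records(consumed):
    """Stand-in record source (an assumption, used only for this demo).

    Every record that is actually produced is logged in `consumed`,
    which makes the lazy behaviour visible.
    """
    for index in range(1000):
        consumed.append(index)
        yield SimpleNamespace(name="REC{}".format(index), sequence="ACGTACGTACGTACGT")


consumed = []
first_two = preview(fake_records(consumed), limit=2, width=4)
print(first_two)
print("records read:", len(consumed))
```

With a real file, the same helper would be called as `preview(gb_io.iter("tests/data/AY048670.1.gb"))`, and only the requested records would be pulled from the iterator.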
You can use the gb_io.dump function to write one or more records to a file (given either as a path or as a file-like handle):
with open("tests/data/AY048670.1.gb", "wb") as file:
    gb_io.dump(records, file)
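Opening the destination in "wb" mode truncates it immediately, so an error in the middle of a dump would leave a damaged file behind. One way to guard against this is to write to a temporary file and move it into place only on success. The sketch below uses only the standard library; `write_atomically` is a hypothetical helper, not a gb-io function, and its `write` argument is any callable that receives a binary file handle.

```python
import os
import tempfile


def write_atomically(path, write):
    """Call `write(handle)` on a temporary binary file, then move it to `path`.

    The temporary file is created next to `path` so that os.replace
    stays on the same filesystem. If `write` raises, the temporary
    file is removed and any existing file at `path` is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            write(handle)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Under these assumptions, the dump above could be written as `write_atomically("tests/data/AY048670.1.gb", lambda handle: gb_io.dump(records, handle))`.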
📝 Example
The following small script will extract all the CDS features from a GenBank file, and write them in FASTA format to an output file:
import gb_io

with open("tests/data/AY048670.1.faa", "w") as dst:
    for record in gb_io.iter("tests/data/AY048670.1.gb"):
        for feature in filter(lambda feat: feat.type == "CDS", record.features):
            qualifiers = feature.qualifiers.to_dict()
            dst.write(">{}\n".format(qualifiers["locus_tag"][0]))
            dst.write("{}\n".format(qualifiers["translation"][0]))
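The script above assumes that every CDS feature carries both a locus_tag and a translation qualifier; a feature lacking either one would raise a KeyError. A more defensive variant could skip such features and wrap the protein sequence to a fixed line width. This is only a sketch: `write_cds_fasta`, its `width` parameter and the stand-in objects in the demo are hypothetical, and the function relies only on the attributes used in the script above (`features`, `type`, `qualifiers.to_dict()`).

```python
import io
from types import SimpleNamespace


def write_cds_fasta(records, dst, width=60):
    """Write the translation of each CDS feature in `records` to `dst`.

    CDS features without a `locus_tag` or `translation` qualifier are
    skipped. Returns the number of sequences written.
    """
    written = 0
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            qualifiers = feature.qualifiers.to_dict()
            tags = qualifiers.get("locus_tag")
            translations = qualifiers.get("translation")
            if not tags or not translations:
                continue  # incomplete annotation, nothing to write
            protein = translations[0]
            dst.write(">{}\n".format(tags[0]))
            # Wrap the sequence so that no line exceeds `width` residues.
            for start in range(0, len(protein), width):
                dst.write(protein[start:start + width] + "\n")
            written += 1
    return written


def make_feature(kind, **qualifiers):
    """Build a stand-in feature object (an assumption, for this demo only)."""
    return SimpleNamespace(
        type=kind,
        qualifiers=SimpleNamespace(to_dict=lambda: dict(qualifiers)),
    )


demo_record = SimpleNamespace(features=[
    make_feature("CDS", locus_tag=["geneA"], translation=["MKTAYIAK"]),
    make_feature("CDS", locus_tag=["geneB"]),   # no translation: skipped
    make_feature("gene", locus_tag=["geneA"]),  # not a CDS: skipped
])
buffer = io.StringIO()
count = write_cds_fasta([demo_record], buffer, width=5)
print(count)
print(buffer.getvalue(), end="")
```

With real data, the first argument would be the iterator returned by `gb_io.iter("tests/data/AY048670.1.gb")` and `dst` the output file opened in text mode.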
Compared to similar implementations using Bio.SeqIO.parse, Bio.GenBank.parse and Bio.GenBank.Scanner.GenBankScanner.parse_cds_features, the performance is the following:
Metric | gb_io.iter | GenBankScanner | GenBank.parse | SeqIO.parse |
---|---|---|---|---|
Time (s) | 2.264 | 7.982 | 15.259 | 19.351 |
Speed (MiB/s) | 136.5 | 37.1 | 20.5 | 16.2 |
Speedup | x8.55 | x2.42 | x1.27 | - |
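The speedup row is each parser's time measured against SeqIO.parse, the slowest of the four. The short sketch below recomputes it from the times reported in the table; the `speedups` function is a hypothetical helper written for this illustration.

```python
# Wall-clock times in seconds, copied from the table above.
TIMES = {
    "gb_io.iter": 2.264,
    "GenBankScanner": 7.982,
    "GenBank.parse": 15.259,
    "SeqIO.parse": 19.351,
}


def speedups(times, baseline="SeqIO.parse"):
    """Return the speedup of each entry relative to `baseline`, to 2 decimals."""
    reference = times[baseline]
    return {name: round(reference / seconds, 2) for name, seconds in times.items()}


for name, factor in speedups(TIMES).items():
    print("{:<15} x{:.2f}".format(name, factor))
```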
💭 Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
Contributions are more than welcome! See CONTRIBUTING.md for more details.
⚖️ License
This library is provided under the MIT License.
The gb-io Rust crate was written by David Leslie and is licensed under the terms of the MIT License.
This package vendors the source of several additional packages that are licensed under the Apache-2.0, MIT or BSD-3-Clause licenses; see the license file distributed with the source copy of each vendored dependency for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original gb-io authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.