🧬🏦 gb-io.py

A Python interface to gb-io, a fast GenBank parser written in Rust.

🗺️ Overview

gb-io.py is a Python package that provides an interface to gb-io, a very fast GenBank format parser implemented in Rust. It can parse GenBank files much faster than the Biopython or scikit-bio parsers.

This library has no external dependencies and is available for all modern Python versions (3.7+).

🔧 Installing

Install the gb-io package directly from PyPI, which hosts pre-compiled wheels that can be installed with pip:

$ pip install gb-io

Wheels are provided for the following platforms:

  • Linux, CPython, x86-64
  • Linux, CPython, Aarch64
  • macOS, CPython, x86-64
  • macOS, PyPy, x86-64
  • Windows, CPython, x86-64
  • Windows, PyPy, x86-64

On any other platform, pip downloads the source distribution and builds the package with the Rust compiler; a local copy of the compiler is downloaded unless one is already installed on the host machine.

💡 Usage

Use the gb_io.load function to obtain a list of all GenBank records in a file:

import gb_io
records = gb_io.load("tests/data/AY048670.1.gb")

Reading from a file-like object is supported as well, both in text and binary mode:

with open("tests/data/AY048670.1.gb") as file:
    records = gb_io.load(file)

To iterate over the records without loading the entire file into memory, use the gb_io.iter function, which returns an iterator instead of a list:

for record in gb_io.iter("tests/data/AY048670.1.gb"):
    print(record.name, record.sequence[:10])
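Because the records arrive one at a time, filters can be chained as generators without ever holding the whole file in memory. The following is a minimal sketch that relies only on the record.name and record.sequence attributes used above, and assumes the sequence supports len(). The longer_than helper, its min_length threshold, and the stand-in records are hypothetical and are not part of gb-io.

```python
from types import SimpleNamespace


def longer_than(records, min_length):
    """Lazily yield the records whose sequence has at least `min_length` residues."""
    for record in records:
        if len(record.sequence) >= min_length:
            yield record


# Stand-in records for demonstration only; with the real library one would
# pass gb_io.iter("tests/data/AY048670.1.gb") as the first argument instead.
demo = [
    SimpleNamespace(name="short", sequence=b"ATGC"),
    SimpleNamespace(name="long", sequence=b"ATGCATGCATGC"),
]

# Only records that pass the filter are ever collected.
kept = [record.name for record in longer_than(demo, 10)]
print(kept)
```

Since longer_than is itself a generator, nothing is read from the underlying iterator until the result is consumed.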

Use the gb_io.dump function to write one or more records to a file, given either as a path or as a file-like handle:

with open("tests/data/AY048670.1.gb", "wb") as file:
    gb_io.dump(records, file)
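Opening a path in "wb" mode truncates the file before anything is written, so a failed write can leave the destination empty or partial. One cautious pattern, sketched here under assumptions, is to write to a temporary file and swap it into place only on success. The dump_atomically helper and its dump parameter are hypothetical names introduced for illustration; the demonstration uses a stand-in writer so that it runs without the library.

```python
import os
import tempfile


def dump_atomically(records, path, dump):
    """Write `records` to `path` through a temporary file in the same folder.

    `dump` is any callable taking (records, binary_handle), such as gb_io.dump.
    The destination is replaced only after the write has succeeded.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            dump(records, handle)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temporary file and leave the destination untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Stand-in writer for demonstration only; in real use pass gb_io.dump instead.
def fake_dump(records, handle):
    for record in records:
        handle.write(record + b"\n")


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "out.gb")
    dump_atomically([b"LOCUS demo"], target, fake_dump)
    with open(target, "rb") as handle:
        written = handle.read()
print(written)
```

The temporary file is created next to the destination so that os.replace stays on a single filesystem.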

📝 Example

The following small script will extract all the CDS features from a GenBank file, and write them in FASTA format to an output file:

import gb_io

with open("tests/data/AY048670.1.faa", "w") as dst:
    for record in gb_io.iter("tests/data/AY048670.1.gb"):
        for feature in filter(lambda feat: feat.type == "CDS", record.features):
            qualifiers = feature.qualifiers.to_dict()
            dst.write(">{}\n".format(qualifiers["locus_tag"][0]))
            dst.write("{}\n".format(qualifiers["translation"][0]))
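The script above indexes the "locus_tag" and "translation" qualifiers directly, so it raises a KeyError on any CDS feature that lacks one of them (pseudogenes, for instance, usually carry no translation). A minimal sketch of a more defensive formatter follows. It assumes only that the qualifiers are available as a dict mapping names to lists of values, as in the script; the cds_to_fasta helper and the sample dicts are hypothetical.

```python
def cds_to_fasta(qualifiers):
    """Format one CDS as a FASTA entry, or return None if a qualifier is missing.

    `qualifiers` is a dict mapping qualifier names to lists of values, in the
    shape the script above obtains from feature.qualifiers.to_dict().
    """
    locus_tags = qualifiers.get("locus_tag")
    translations = qualifiers.get("translation")
    if not locus_tags or not translations:
        return None
    return ">{}\n{}\n".format(locus_tags[0], translations[0])


# Illustrative inputs: one complete CDS and one without a translation.
complete = {"locus_tag": ["gene_1"], "translation": ["MKV"]}
pseudo = {"locus_tag": ["gene_2"], "pseudo": [""]}

print(cds_to_fasta(complete), end="")
print(cds_to_fasta(pseudo))
```

In the extraction loop, a caller would write the returned entry only when it is not None, skipping incomplete features instead of aborting the whole run.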

Compared with equivalent scripts built on Bio.SeqIO.parse, Bio.GenBank.parse and Bio.GenBank.Scanner.GenBankScanner.parse_cds_features, the gb_io.iter version performs as follows:

                 gb_io.iter   GenBankScanner   GenBank.parse   SeqIO.parse
Time (s)         2.264        7.982            15.259          19.351
Speed (MiB/s)    136.5        37.1             20.5            16.2
Speedup          x8.55        x2.42            x1.27           -
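The speedup row appears to be computed relative to SeqIO.parse, the slowest parser and the column marked "-". That reading is inferred from the figures, not stated in the text. The short check below recomputes the ratios from the reported times.

```python
# Wall-clock times in seconds, copied from the table above.
times = {
    "gb_io.iter": 2.264,
    "GenBankScanner": 7.982,
    "GenBank.parse": 15.259,
    "SeqIO.parse": 19.351,
}

# Assumption: SeqIO.parse is the baseline against which speedups are measured.
baseline = times["SeqIO.parse"]
speedups = {name: round(baseline / seconds, 2) for name, seconds in times.items()}

for name, factor in speedups.items():
    print(f"{name}: x{factor:.2f}")
```

The recomputed ratios match the published row when rounded to two decimals, which supports the baseline assumption.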

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker to report it or ask a question. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the MIT License. The gb-io Rust crate was written by David Leslie and is licensed under the terms of the MIT License. This package vendors the source of several additional packages that are licensed under the Apache-2.0, MIT or BSD-3-Clause licenses; see the license file distributed with the source copy of each vendored dependency for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original gb-io authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
