A Python interface to gb-io, a fast GenBank parser and serializer written in Rust.

🧬🏦 gb-io.py

🗺️ Overview

gb-io.py is a Python package that provides an interface to gb-io, a very fast GenBank format parser implemented in Rust by David Leslie. It can be much faster than the Biopython or scikit-bio parsers.

This library has no external dependencies and is available for all modern Python versions (3.7+).

To improve performance, the library implements a copy-on-access pattern: data is copied to the Python heap only when it is actually accessed, rather than when the object is created. For instance, if the consumer of the parser only requires the GenBank features and not the record sequence, the sequence will not be copied to a Python bytes object.
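The general idea behind copy-on-access can be illustrated with a small pure-Python sketch. This is only a toy model and not how gb-io.py is implemented internally: the LazyRecord class, its copies_made counter and the bytearray standing in for natively owned data are all hypothetical names chosen for illustration.

```python
class LazyRecord:
    """Toy record that copies its sequence only when it is read."""

    def __init__(self, raw_sequence, features):
        # Hypothetical stand-in for data owned by native code:
        # no Python bytes object is created at construction time.
        self._raw_sequence = raw_sequence
        self.features = features
        self.copies_made = 0

    @property
    def sequence(self):
        # The copy to an immutable bytes object happens on access.
        self.copies_made += 1
        return bytes(self._raw_sequence)


record = LazyRecord(bytearray(b"ATGCATGC"), ["CDS", "gene"])

kinds = list(record.features)        # touches the features only
copies_before = record.copies_made   # no sequence copy has happened yet

prefix = record.sequence[:4]         # reading the sequence triggers the copy
```

In this sketch a consumer that never reads the sequence attribute never pays for the copy, which is the behaviour described above for feature-only workloads.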

🔧 Installing

Install the gb-io package directly from PyPI, which hosts pre-compiled wheels that can be installed with pip:

$ pip install gb-io

Wheels are provided for common platforms, such as x86-64 Linux, Windows and macOS, as well as Aarch64 Linux and macOS. If no wheel is available, the source distribution will be downloaded and built locally; a copy of the Rust compiler will be downloaded for the build unless one is already installed on the host machine.

📖 Documentation

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

$ pydoc gb_io

💡 Usage

Use the gb_io.load function to obtain a list of all GenBank records in a file:

records = gb_io.load("tests/data/AY048670.1.gb")

Reading from a file-like object is supported as well, both in text and binary mode:

with open("tests/data/AY048670.1.gb") as file:
    records = gb_io.load(file)

It is also possible to iterate over the records in a file without loading the entire file contents into memory, using the gb_io.iter function, which returns an iterator instead of a list:

for record in gb_io.iter("tests/data/AY048670.1.gb"):
    print(record.name, record.sequence[:10])
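Because records arrive one at a time, summary statistics can be computed in a single pass while holding only one record at once. The sketch below assumes only the record.features and feature.kind attributes used elsewhere in this document; count_feature_kinds is a hypothetical helper, and the SimpleNamespace objects are stand-ins so that the snippet runs without a GenBank file.

```python
from collections import Counter
from types import SimpleNamespace


def count_feature_kinds(records):
    """Count features by kind over any iterable of record-like objects."""
    counts = Counter()
    for record in records:
        # Only the features are read here; the sequence is never touched.
        counts.update(feature.kind for feature in record.features)
    return counts


# Stand-in records, used here instead of a real parser iterator.
demo_records = [
    SimpleNamespace(features=[SimpleNamespace(kind="CDS"),
                              SimpleNamespace(kind="gene")]),
    SimpleNamespace(features=[SimpleNamespace(kind="CDS")]),
]

demo_counts = count_feature_kinds(iter(demo_records))
print(demo_counts.most_common())
```

With the library installed, the same helper should accept the iterator returned by gb_io.iter for a given path in place of the stand-in list.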

You can use the gb_io.dump function to write one or more records to a file (given either as a path or as a file-like handle):

with open("tests/data/AY048670.1.gb", "wb") as file:
    gb_io.dump(records, file)
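Opening a path in "wb" mode truncates it immediately, so a failure halfway through writing can leave an incomplete file behind. One possible safeguard, sketched here with the standard library only, is to write to a temporary file in the same directory and rename it once writing has succeeded. The write_via_temp_file helper and its write callback are hypothetical and not part of gb-io.py.

```python
import os
import tempfile


def write_via_temp_file(path, write):
    """Call write(handle) on a temporary binary file, then move it to path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            write(handle)
        # Replace the destination only after the write has completed.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file and keep the old destination.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demonstration with plain bytes in a scratch directory.
scratch_dir = tempfile.mkdtemp()
target = os.path.join(scratch_dir, "out.gb")
write_via_temp_file(target, lambda handle: handle.write(b"LOCUS\n"))
```

Since gb_io.dump accepts a file-like handle, a callback such as lambda handle: gb_io.dump(records, handle) could be passed as the write argument.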

📝 Example

The following small script will extract all the CDS features from a GenBank file, and write them in FASTA format to an output file:

import gb_io

with open("tests/data/AY048670.1.faa", "w") as dst:
    for record in gb_io.iter("tests/data/AY048670.1.gb"):
        for feature in filter(lambda feat: feat.kind == "CDS", record.features):
            # Group qualifier values by key (a key may occur several times),
            # so that [0] picks the first value for that key.
            qualifiers = {}
            for q in feature.qualifiers:
                qualifiers.setdefault(q.key, []).append(q.value)
            dst.write(">{}\n".format(qualifiers["protein_id"][0]))
            dst.write("{}\n".format(qualifiers["translation"][0]))

Compared to similar implementations using Bio.SeqIO.parse, Bio.GenBank.parse and Bio.GenBank.Scanner.GenBankScanner.parse_cds_features, the performance is the following:

                gb_io.iter   GenBankScanner   GenBank.parse   SeqIO.parse
Time (s)        2.264        7.982            15.259          19.351
Speed (MiB/s)   136.5        37.1             20.5            16.2
Speedup         x8.55        x2.42            x1.27           -
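The speedup row is consistent with dividing the SeqIO.parse time by the time of each other method. The short check below recomputes it from the timings in the table; the variable names are only illustrative.

```python
# Wall-clock times in seconds, copied from the table above.
times = {
    "gb_io.iter": 2.264,
    "GenBankScanner": 7.982,
    "GenBank.parse": 15.259,
    "SeqIO.parse": 19.351,
}

# SeqIO.parse is the slowest method and serves as the baseline.
baseline = times["SeqIO.parse"]
speedups = {name: round(baseline / t, 2) for name, t in times.items()}

for name, factor in speedups.items():
    print("{:<15} x{:.2f}".format(name, factor))
```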

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the MIT License. The gb-io Rust crate was written by David Leslie and is licensed under the terms of the MIT License. This package vendors the source of several additional packages that are licensed under the Apache-2.0, MIT or BSD-3-Clause licenses; see the license file distributed with the source copy of each vendored dependency for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original gb-io authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
