nbitk: Naturalis BioInformatics ToolKit

This project is intended as a foundational toolkit for bioinformatics research at Naturalis Biodiversity Center. The toolkit is written in Python and is designed to be easy to use and easy to extend.

The Big Idea

BioPython has good support for many bioinformatics use cases. For example, Bio.SeqIO is the go-to solution for reading and writing standard DNA sequence file formats. However, we at Naturalis also work with formats that BioPython does not support, such as BCDM, spreadsheets, and JSON files, and we need more attributes on certain objects than BioPython provides. Rather than reinventing the wheel for those formats and passing around non-standard sequence objects (e.g. a plain dict), the sensible approach is to build a common toolkit that extends BioPython for those use cases.

This toolkit is intended to provide such functionality. It is not intended to replace BioPython, but to extend it. It is meant to be stable and relatively lightweight, so we are hesitant to add features that are unlikely to be needed across many use cases. Examples of use cases that we do want to support:

  • Reading and writing taxonomic trees from formats not yet supported by BioPython. For example: DarwinCore representations such as the Dutch species register; taxonomic trees implied by custom FASTA headers; taxonomic lineages and trees produced by various (web) services for taxonomic name resolution.
  • Reading and writing sequences from formats not yet supported by BioPython. For example: JSON files produced by barcoding and metabarcoding pipelines.
  • Interactions with services within the Naturalis architecture, such as S3 buckets, Databricks, LIMS, Galaxy, etc.
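
As an illustration of the second kind of tree mentioned above (taxonomic trees implied by custom FASTA headers), here is a minimal sketch. It does not use the nbitk API: the `TaxonNode` class, both functions, and the `>id|rank1;rank2;...` header layout are assumptions made for this example, and it relies only on the standard library so that it runs without BioPython installed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class TaxonNode:
    """One node in a taxonomic tree (hypothetical helper, not part of nbitk)."""
    name: str
    children: Dict[str, "TaxonNode"] = field(default_factory=dict)
    sequence_ids: List[str] = field(default_factory=list)


def tree_from_fasta_headers(headers: Iterable[str]) -> TaxonNode:
    """Build a tree from headers shaped like '>id|rank1;rank2;...' (assumed format)."""
    root = TaxonNode("root")
    for header in headers:
        seq_id, _, lineage = header.lstrip(">").strip().partition("|")
        node = root
        for taxon in filter(None, (t.strip() for t in lineage.split(";"))):
            # Reuse the child if this taxon was already seen under the same parent.
            node = node.children.setdefault(taxon, TaxonNode(taxon))
        # Attach the sequence to the deepest taxon in its lineage.
        node.sequence_ids.append(seq_id)
    return root


def to_newick(node: TaxonNode) -> str:
    """Serialize the tree as a Newick string without branch lengths or quoting."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(child) for child in node.children.values())
    return f"({inner}){node.name}"


headers = [
    ">s1|Animalia;Arthropoda;Insecta",
    ">s2|Animalia;Arthropoda;Arachnida",
    ">s3|Animalia;Chordata",
]
tree = tree_from_fasta_headers(headers)
print(to_newick(tree) + ";")
```

In keeping with the goal of extending BioPython rather than passing around ad hoc objects, a real implementation would more likely populate a BioPython tree object; the plain dataclass here only keeps the sketch dependency-free.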

However, anything that is likely to require a lot of maintenance to stay up to date (such as wrappers around tools that are frequently updated) is probably not a good fit for this toolkit.

Who is this for?

The principal users and developers of this toolkit are the bioinformaticians at Naturalis Biodiversity Center. Design changes and feature requests should therefore be discussed with the bioinformatics team.

Installation

The toolkit is released on PyPI and can be installed using pip:

pip install nbitk

To minimize dependency hell, we do not plan to release the toolkit on conda with a large set of bundled dependencies. Instead, we release it on PyPI and let users install the dependencies they need themselves. This keeps the toolkit lightweight and easy to install.
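
One common way to work with dependencies that users install themselves is to import them only when a feature needs them, and to fail with a clear installation hint otherwise. The sketch below shows that pattern using the standard library. The `require` helper and the module and package names in it are hypothetical; this is not how nbitk is known to handle optional dependencies.

```python
import importlib
import importlib.util


def require(module_name: str, pip_name: str = ""):
    """Import an optional top-level module or fail with an actionable message.

    Hypothetical helper for illustration; not part of nbitk.
    """
    if importlib.util.find_spec(module_name) is None:
        package = pip_name or module_name
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install {package}"
        )
    return importlib.import_module(module_name)


# A dependency that is present is simply returned as a module.
json_module = require("json")
print(json_module.dumps({"ok": True}))

# A missing one yields an installation hint instead of a bare traceback.
message = ""
try:
    require("surely_not_installed_module_xyz", pip_name="some-package")
except ImportError as error:
    message = str(error)
    print(message)
```

The check is deliberately limited to top-level module names, since `importlib.util.find_spec` raises rather than returning `None` when the parent package of a dotted name is missing.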

Usage

The toolkit is meant for programmatic use. It is not intended to be used as a command line tool. Consult the various modules and classes for documentation on how to use the toolkit. In addition, the scripts in the tests directory provide examples of how to use the toolkit.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
