Yet Another Bioinformatics Utilities Library

yabul

This is a small collection of Python functions for working with protein, DNA, and RNA sequences. We use pandas data frames wherever possible.

Yabul currently supports:

  • Reading and writing FASTAs
  • Pairwise local and global sequence alignment (uses parasail)

Requires Python 3.6+.

Installation

Install using pip:

$ pip install yabul

You can run the unit tests from a checkout of the repo as follows:

$ pip install pytest
$ pytest

Example

Reading and writing FASTAs

The read_fasta function returns a pandas.DataFrame:

>>> import yabul
>>> df = yabul.read_fasta("test/data/cov2.fasta")
>>> df.head(3)
                                                             description                                           sequence
id
sp|P0DTC2|SPIKE_SARS2  sp|P0DTC2|SPIKE_SARS2 Spike glycoprotein OS=Se...  MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSS...
sp|P0DTD1|R1AB_SARS2   sp|P0DTD1|R1AB_SARS2 Replicase polyprotein 1ab...  MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHL...
sp|P0DTC1|R1A_SARS2    sp|P0DTC1|R1A_SARS2 Replicase polyprotein 1a O...  MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHL...
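
To make the table layout above concrete, here is a minimal standard-library sketch of how FASTA text maps onto id, description, and sequence fields. It is not yabul's implementation: `parse_fasta_records` and `make_record` are hypothetical names, and the choice to keep the full header line as the description is an assumption based on the output shown above.

```python
import io


def make_record(header, chunks):
    """Build one record from a header line (without '>') and its sequence lines."""
    parts = header.split()
    return {
        "id": parts[0] if parts else "",   # first whitespace-delimited token
        "description": header,             # full header line (assumed layout)
        "sequence": "".join(chunks),       # sequence lines joined without breaks
    }


def parse_fasta_records(handle):
    """Yield one dict per FASTA entry read from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:                 # emit the final entry
        yield make_record(header, chunks)


example = ">seq1 first test protein\nMFVF\nLVLL\n>seq2\nTEST\n"
records = list(parse_fasta_records(io.StringIO(example)))
```

A list of records like this can be passed to pandas.DataFrame and indexed by the id field to get a table of the same shape as the one returned by read_fasta.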

The write_fasta function takes (name, sequence) pairs:

>>> yabul.write_fasta("out.fasta", [("protein1", "TEST"), ("protein2", "HIHI")])
>>> yabul.write_fasta("out2.fasta", df.sequence.items())
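
The second call works because Series.items() yields (index, value) tuples, which are already (name, sequence) pairs. The sketch below shows the general idea of writing such pairs as FASTA text using only the standard library. It is an illustration, not yabul's code: `write_fasta_pairs` is a hypothetical helper, and wrapping sequences at 60 characters per line is an assumption.

```python
import io


def write_fasta_pairs(handle, pairs, width=60):
    """Write (name, sequence) pairs to an open text handle in FASTA format."""
    for name, sequence in pairs:
        handle.write(f">{name}\n")
        # Break long sequences into fixed-width lines (assumed width).
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


buffer = io.StringIO()
write_fasta_pairs(buffer, [("protein1", "TEST"), ("protein2", "HIHI")])
fasta_text = buffer.getvalue()

# Any iterable of pairs works, for example the items of a dict.
wrapped = io.StringIO()
write_fasta_pairs(wrapped, {"long": "A" * 70}.items())
wrapped_text = wrapped.getvalue()
```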

Sequence alignment

The align_pair function computes either a local (Smith-Waterman) or a global (Needleman-Wunsch) alignment of two sequences. It returns a pandas.Series containing the aligned sequences, a correspondence string marking matched positions, and the alignment score.

By default, the alignment is global:

>>> yabul.align_pair("AATESTDD", "TEST")
query             AATESTDD
reference         --TEST--
correspondence      ||||
score                   -5
dtype: object

To do a local alignment, pass local=True.

>>> yabul.align_pair("AATESTDD", "TEST", local=True)
query             TEST
reference         TEST
correspondence    ||||
score               19
dtype: object
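
The difference between the two modes can be illustrated with a small dynamic-programming sketch. This is a toy version, not the parasail routine that yabul wraps: `alignment_score` is a hypothetical function, and the scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption, so the scores differ from those shown above.

```python
def alignment_score(query, reference, local=False, match=1, mismatch=-1, gap=-1):
    """Return the best global (default) or local alignment score."""
    rows, cols = len(query) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment charges for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == reference[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[rows - 1][cols - 1]


global_score = alignment_score("AATESTDD", "TEST")
local_score = alignment_score("AATESTDD", "TEST", local=True)
```

Under this toy scheme the global score is lower than the local one, because the global alignment must pay for the four unmatched query positions while the local alignment scores only the matching TEST region.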

Dependencies

The alignment routine is a thin wrapper around the Smith-Waterman and Needleman-Wunsch implementations from parasail.

Contributing

We welcome contributions of well-documented code to read and write common bioinformatics file formats using pandas objects. Please include unit tests in your PR. Additional functionality like multiple sequence alignment would also be nice to add.

Releasing

To push a new release to PyPI:

  • Make sure the package version specified in __init__.py is greater than the latest version on PyPI.
  • Tag a new release on GitHub matching this version.

Travis should deploy the release to PyPI automatically.

Documentation at https://yabul.readthedocs.io/en/latest/ should update automatically on commit.

To build the documentation locally, run:

$ cd docs
$ pip install -r requirements.txt
$ sphinx-build -b html . _build
