Yet Another Bioinformatics Utilities Library

yabul

This is a small collection of Python functions for working with protein, DNA, and RNA sequences. We use pandas data frames wherever possible.

Yabul currently supports:

  • Reading and writing FASTAs
  • Pairwise local and global sequence alignment (uses parasail)

Requires Python 3.6+.

Installation

Install using pip:

$ pip install yabul

You can run the unit tests from a checkout of the repo as follows:

$ pip install pytest
$ pytest

Example

Reading and writing FASTAs

The read_fasta function returns a pandas.DataFrame:

>>> import yabul
>>> df = yabul.read_fasta("test/data/cov2.fasta")
>>> df.head(3)
                                                             description                                           sequence
id
sp|P0DTC2|SPIKE_SARS2  sp|P0DTC2|SPIKE_SARS2 Spike glycoprotein OS=Se...  MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSS...
sp|P0DTD1|R1AB_SARS2   sp|P0DTD1|R1AB_SARS2 Replicase polyprotein 1ab...  MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHL...
sp|P0DTC1|R1A_SARS2    sp|P0DTC1|R1A_SARS2 Replicase polyprotein 1a O...  MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHL...
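
The identifiers in the index above follow the UniProt `db|accession|entry_name` convention. As a minimal sketch of how they might be split into fields, here is a standard-library helper; `UniProtId` and `split_uniprot_id` are hypothetical names used for illustration and are not part of yabul.

```python
from typing import NamedTuple


class UniProtId(NamedTuple):
    """Fields of a UniProt-style FASTA identifier (hypothetical helper type)."""
    database: str    # "sp" (Swiss-Prot) or "tr" (TrEMBL)
    accession: str   # e.g. "P0DTC2"
    entry_name: str  # e.g. "SPIKE_SARS2"


def split_uniprot_id(identifier: str) -> UniProtId:
    """Split 'db|accession|entry_name' into its three fields."""
    parts = identifier.split("|")
    if len(parts) != 3:
        raise ValueError(f"not a UniProt-style identifier: {identifier!r}")
    return UniProtId(*parts)


# The three identifiers shown in the read_fasta output above.
ids = ["sp|P0DTC2|SPIKE_SARS2", "sp|P0DTD1|R1AB_SARS2", "sp|P0DTC1|R1A_SARS2"]
accessions = [split_uniprot_id(i).accession for i in ids]
print(accessions)  # ['P0DTC2', 'P0DTD1', 'P0DTC1']
```

With the data frame above, something like `df.index.map(lambda i: split_uniprot_id(i).accession)` should give the same accessions, assuming every identifier in the file uses this convention.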

The write_fasta function takes (name, sequence) pairs:

>>> yabul.write_fasta("out.fasta", [("protein1", "TEST"), ("protein2", "HIHI")])
>>> yabul.write_fasta("out2.fasta", df.sequence.items())
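
To make the (name, sequence) pair convention concrete, the sketch below writes and re-reads FASTA text using only the standard library. It illustrates the file format and is not yabul's implementation; `write_fasta_text`, `read_fasta_text`, and the 60-character line width are assumptions chosen for the example.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_fasta_text(handle: TextIO, records: Iterable[Tuple[str, str]],
                     width: int = 60) -> None:
    """Write (name, sequence) pairs as FASTA: a '>' header, then wrapped lines."""
    for name, sequence in records:
        handle.write(f">{name}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


def read_fasta_text(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


buffer = io.StringIO()
write_fasta_text(buffer, [("protein1", "TEST"), ("protein2", "HIHI")])
fasta_text = buffer.getvalue()
print(fasta_text, end="")
# >protein1
# TEST
# >protein2
# HIHI

round_trip = list(read_fasta_text(io.StringIO(fasta_text)))
print(round_trip)  # [('protein1', 'TEST'), ('protein2', 'HIHI')]
```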

Sequence alignment

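The align_pair function computes a local (Smith-Waterman) or global (Needleman-Wunsch) alignment of two sequences. It returns a pandas.Series containing the aligned query and reference sequences, a correspondence string marking the matched positions, and the alignment score.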

By default, the alignment is global:

>>> yabul.align_pair("AATESTDD", "TEST")
query             AATESTDD
reference         --TEST--
correspondence      ||||
score                   -5
dtype: object

To do a local alignment, pass local=True.

>>> yabul.align_pair("AATESTDD", "TEST", local=True)
query             TEST
reference         TEST
correspondence    ||||
score               19
dtype: object
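
The difference between the two modes can be seen in a toy dynamic-programming scorer. This is a sketch for intuition only, not the parasail or yabul implementation: `alignment_score` is a hypothetical function, and it assumes a simple scheme (match +1, mismatch -1, linear gap penalty -1) rather than a substitution matrix with affine gaps. Its scores therefore differ from the ones shown above, which are consistent with BLOSUM62 (T=5, E=5, S=4, T=5 sums to 19).

```python
def alignment_score(query, reference, local=False, match=1, mismatch=-1, gap=-1):
    """Return the best alignment score under a simple linear-gap scheme.

    local=False: Needleman-Wunsch (both sequences aligned end to end).
    local=True:  Smith-Waterman (best-scoring pair of substrings).
    """
    rows, cols = len(query) + 1, len(reference) + 1
    # score[i][j] is the best score for query[:i] versus reference[:j].
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global mode: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == reference[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local mode: an alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


global_score = alignment_score("AATESTDD", "TEST")
local_score = alignment_score("AATESTDD", "TEST", local=True)
print(global_score)  # 0: four matches minus four gap positions
print(local_score)   # 4: only the matching TEST substring is scored
```

The global score is lower because the unmatched flanking residues (AA and DD) must be paired with gaps, whereas the local alignment simply drops them.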

Dependencies

The alignment routine is a thin wrapper around the Smith-Waterman and Needleman-Wunsch implementations from parasail.

Contributing

We welcome contributions of well-documented code to read and write common bioinformatics file formats using pandas objects. Please include unit tests in your PR. Additional functionality, such as multiple sequence alignment, is also welcome.

Releasing

To push a new release to PyPI:

  • Make sure the package version specified in __init__.py is a new version greater than what's on PyPI.
  • Tag a new release on GitHub matching this version.

Travis should deploy the release to PyPI automatically.

Documentation at https://yabul.readthedocs.io/en/latest/ should update automatically on commit.

To build the documentation locally, run:

$ cd docs
$ pip install -r requirements.txt
$ sphinx-build -b html . _build
