Yet Another Bioinformatics Utilities Library
yabul
This is a small collection of Python functions for working with protein, DNA, and RNA sequences. We use pandas data frames wherever possible.
Yabul currently supports:
- Reading and writing FASTAs
- Pairwise local and global sequence alignment (uses parasail)
Requires Python 3.6+.
Installation
Install using pip:
$ pip install yabul
You can run the unit tests from a checkout of the repo as follows:
$ pip install pytest
$ pytest
Example
Reading and writing FASTAs
The read_fasta function returns a pandas.DataFrame:
>>> import yabul
>>> df = yabul.read_fasta("test/data/cov2.fasta")
>>> df.head(3)
description sequence
id
sp|P0DTC2|SPIKE_SARS2 sp|P0DTC2|SPIKE_SARS2 Spike glycoprotein OS=Se... MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSS...
sp|P0DTD1|R1AB_SARS2 sp|P0DTD1|R1AB_SARS2 Replicase polyprotein 1ab... MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHL...
sp|P0DTC1|R1A_SARS2 sp|P0DTC1|R1A_SARS2 Replicase polyprotein 1a O... MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHL...
The write_fasta function takes (name, sequence) pairs:
>>> yabul.write_fasta("out.fasta", [("protein1", "TEST"), ("protein2", "HIHI")])
>>> yabul.write_fasta("out2.fasta", df.sequence.items())
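For readers unfamiliar with the file format, here is a minimal pure-Python sketch of what a FASTA round trip of (name, sequence) pairs looks like. It is not yabul's implementation: write_fasta_sketch and read_fasta_sketch are hypothetical helpers written only for illustration, they return plain lists rather than a pandas.DataFrame, and they treat the whole header line as the name, whereas yabul's output above keeps a separate id and description.

```python
import os
import tempfile


def write_fasta_sketch(path, pairs):
    """Write (name, sequence) pairs as FASTA records (hypothetical helper)."""
    with open(path, "w") as handle:
        for name, sequence in pairs:
            # Each record is a '>' header line followed by the sequence.
            handle.write(f">{name}\n{sequence}\n")


def read_fasta_sketch(path):
    """Parse a FASTA file into a list of (name, sequence) pairs (hypothetical helper)."""
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            else:
                # Sequences may be wrapped over several lines; join them later.
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


pairs = [("protein1", "TEST"), ("protein2", "HIHI")]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.fasta")
    write_fasta_sketch(path, pairs)
    round_tripped = read_fasta_sketch(path)
```

After this runs, round_tripped holds the same pairs that were written.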
Sequence alignment
The align_pair function computes either a local (Smith-Waterman) or a global (Needleman-Wunsch) alignment of two sequences. It returns a pandas.Series containing the aligned query and reference sequences, their correspondence, and the alignment score.
By default, the alignment is global:
>>> yabul.align_pair("AATESTDD", "TEST")
query AATESTDD
reference --TEST--
correspondence ||||
score -5
dtype: object
To do a local alignment, pass local=True.
>>> yabul.align_pair("AATESTDD", "TEST", local=True)
query TEST
reference TEST
correspondence ||||
score 19
dtype: object
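To see why the global score is so much lower than the local one for these two sequences, here is a small score-only sketch of both algorithms in pure Python. It is an illustration under simplified assumptions, not what yabul or parasail actually run: alignment_score is a hypothetical function, and its scoring scheme (+1 match, -1 mismatch, -1 per gap position) is chosen for readability, so its numbers will not match the -5 and 19 shown above.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Score-only dynamic programming with a linear gap penalty.

    local=False follows Needleman-Wunsch (global),
    local=True follows Smith-Waterman (local).
    The scoring values are illustrative assumptions.
    """
    # prev[j] is the best score for the previous row of the DP table.
    # Global alignment pays for leading gaps; local alignment starts at 0.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: score of aligning both sequences end to end.
    # Local: best score of any pair of substrings.
    return best if local else prev[-1]


global_score = alignment_score("AATESTDD", "TEST")
local_score = alignment_score("AATESTDD", "TEST", local=True)
print(global_score, local_score)  # prints "0 4"
```

Under this toy scoring, the global alignment has to pay for the four unmatched query positions (AA and DD) as gaps, which cancels out the four matches, while the local alignment scores only the shared TEST substring.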
Dependencies
The alignment routine is a thin wrapper around the Smith-Waterman and Needleman-Wunsch implementations from parasail.
Contributing
We welcome contributions of well-documented code to read and write common bioinformatics file formats using pandas objects. Please include unit tests in your PR. Additional functionality like multiple sequence alignment would also be nice to add.
Releasing
To push a new release to PyPI:
- Make sure the package version specified in __init__.py is a new version greater than what's on PyPI.
- Tag a new release on GitHub matching this version.
Travis should deploy the release to PyPI automatically.
Documentation at https://yabul.readthedocs.io/en/latest/ should update automatically on commit.
To build the documentation locally, run:
$ cd docs
$ pip install -r requirements.txt
$ sphinx-build -b html . _build