
bio: making bioinformatics fun again

The software is currently under development. It is operational but not fully vetted.

bio - command-line utilities to make bioinformatics explorations more enjoyable.

Full documentation:

Why do we need this software?

If you've ever done bioinformatics, you know how even seemingly straightforward tasks require multiple steps, arcane incantations, reading documentation, and numerous other preparations that slow down your progress.

Time and again I found myself not pursuing an idea because getting to the fun part was too tedious. The bio package is meant to solve that tedium. With bio you can write things like this:

# Fetch the data from NCBI.
bio NC_045512 --fetch --rename ncov
bio MN996532  --fetch --rename ratg13

# Align the DNA for the S protein.
bio ncov:S ratg13:S --end 90 --align

to align the first 90 base pairs of the DNA sequence of the S protein of the novel coronavirus SARS-CoV-2 to its closest known relative, the bat coronavirus RaTG13. The command above will print:

### 1: YP_009724390 vs QHR63300.2 ###

Length: 90 (semiglobal)
Query:  90 [1, 90]
Target: 90 [1, 90]
Score:  387
Ident:  83/90 (92.2%)
Simil:  83/90 (92.2%)
Gaps:   0/90 (0.0%)
Matrix: nuc44(-11, -1)

YP_009724390 ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAAT
           1 ||||||||||||||||||||||||||||||||.||||||||||||||||||||.|||||.||||||||.|||||.|||||||||||.||. 90
QHR63300.2   ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACTAGAACTCAGTTACCTCCTGCATACACCAAC
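The summary lines above can be recomputed directly from the two aligned rows. The sketch below is not bio's code. It assumes NUC.4.4-style scores for A, C, G and T (match +5, mismatch -4). It also reads `nuc44(-11, -1)` as a gap-open penalty of 11 and a gap-extend penalty of 1. The helper name `summarize` is hypothetical. With no gaps, 83 matches and 7 mismatches give 83 * 5 - 7 * 4 = 387, which agrees with the reported score.

```python
# Hypothetical helper, not part of bio: recompute alignment statistics
# from two aligned rows of equal length, where "-" marks a gap.
# Assumed scoring: match +5, mismatch -4 (NUC.4.4 values for A/C/G/T),
# gap open -11 for the first gap column, gap extend -1 for each further one.
MATCH, MISMATCH = 5, -4
GAP_OPEN, GAP_EXTEND = -11, -1


def summarize(query: str, target: str) -> dict:
    if len(query) != len(target):
        raise ValueError("aligned rows must have the same length")
    ident = gaps = score = 0
    in_gap = False
    pattern = []
    for q, t in zip(query.upper(), target.upper()):
        if q == "-" or t == "-":
            # Affine gap: pay the open penalty once, then extend while it continues.
            gaps += 1
            score += GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
            pattern.append(" ")
        else:
            in_gap = False
            if q == t:
                ident += 1
                score += MATCH
                pattern.append("|")
            else:
                score += MISMATCH
                pattern.append(".")
    length = len(query)
    return {
        "length": length,
        "ident": ident,
        "ident_pct": round(100 * ident / length, 1),
        "gaps": gaps,
        "score": score,
        "pattern": "".join(pattern),
    }


# First 90 bp of the S gene, copied from the alignment above.
NCOV = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAAT"
RATG13 = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACTAGAACTCAGTTACCTCCTGCATACACCAAC"

stats = summarize(NCOV, RATG13)
print(f"Score: {stats['score']}")
print(f"Ident: {stats['ident']}/{stats['length']} ({stats['ident_pct']}%)")
print(f"Gaps:  {stats['gaps']}/{stats['length']}")
```

The sketch makes two simplifications. It ignores the free end gaps of a semiglobal alignment. It also treats consecutive gap columns as one gap even if they switch rows. Neither matters here because this alignment has no gaps.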

If you wanted to align the same sequences as translated proteins, bio lets you write:

bio ncov:S ratg13:S --end 90 --translate --align

to generate:

### 1: YP_009724390 vs QHR63300.2 ###

Length: 30 (semiglobal)
Query:  30 [1, 30]
Target: 30 [1, 30]
Score:  153
Ident:  30/30 (100.0%)
Simil:  30/30 (100.0%)
Gaps:   0/30 (0.0%)
Matrix: blosum62(-11, -1)

YP_009724390 MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN
           1 |||||||||||||||||||||||||||||| 30
QHR63300.2   MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN
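The protein alignment reports 100% identity even though the DNA alignment has 7 mismatches, because every one of those changes is synonymous. The minimal sketch below checks this; it is not bio's implementation. It translates both 90-bp rows with the standard genetic code (NCBI translation table 1). It then scores the identical, ungapped protein pair using only the diagonal of BLOSUM62, which is enough because no two different residues are aligned. The names `translate` and `self_score` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# in TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching the amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# BLOSUM62 scores for a residue aligned with itself (the matrix diagonal).
# This only covers identical pairs, not arbitrary substitutions.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def translate(dna: str) -> str:
    """Translate complete codons in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def self_score(protein: str) -> int:
    """Score a protein aligned without gaps against an identical copy."""
    return sum(BLOSUM62_DIAGONAL[aa] for aa in protein)


# The same 90-bp rows as in the DNA alignment.
NCOV = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAAT"
RATG13 = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACTAGAACTCAGTTACCTCCTGCATACACCAAC"

ncov_protein = translate(NCOV)
ratg13_protein = translate(RATG13)
print(ncov_protein)
print(ncov_protein == ratg13_protein, self_score(ncov_protein))
```

This prints `MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN`, then `True 153`, which matches the score reported above.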

Beyond alignments, there is a lot more to bio, and we recommend looking at the documentation.

Who is bio designed for?

The software was written to teach bioinformatics and is the companion software to the Biostar Handbook textbook. The target audience includes:

  • Students learning about bioinformatics.
  • Bioinformatics educators who need a platform to demonstrate bioinformatics concepts.
  • Scientists working with large numbers of similar genomes (bacterial/viral strains).
  • Scientists who need to closely investigate and understand particular details of a genomic region.

The ideas and motivations behind bio came to us while teaching the many cohorts of students who used the handbook in the classroom.

You see, in bioinformatics, many tasks that should be straightforward are, instead, needlessly complicated. bio is an opinionated take on how bioinformatics, particularly data presentation and access, should be simplified.

Documentation

The documentation is maintained at

Quick install

bio works on Linux and macOS, and on Windows via the Windows Subsystem for Linux (WSL). Install the package with:

# We recommend installing prerequisites with conda.
conda install -c bioconda biopython parasail-python

# Install the bio package.
pip install bio --upgrade

See more details in the documentation.
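After installing, you can quickly confirm that Python can find the prerequisites by looking up their modules without importing them. This is a sketch and not part of bio. It assumes the usual import names: `Bio` for biopython and `parasail` for parasail-python. The function name `missing_prerequisites` is hypothetical.

```python
import importlib.util

# Distribution name -> import name (assumed mapping).
PREREQUISITES = {"biopython": "Bio", "parasail-python": "parasail"}


def missing_prerequisites(prereqs: dict = PREREQUISITES) -> list:
    """Return the distribution names whose top-level module cannot be found."""
    return [
        dist for dist, module in prereqs.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_prerequisites()
print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module. A package that is found can still fail later, for example because of a broken compiled extension.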

Development

If you clone the repository, we recommend installing it as a development package with:

python setup.py develop

Testing

Testing uses the pytest framework:

pip install pytest

To run all tests use:

make test

Tests are automatically built from a test script that mimics real-life usage scenarios.

New tests

To add a new test, first run the command you wish to test from the test/data directory, for example:

bio foo --gff > output.gff

Then add the same command to the master script:

followed by:

make build_tests

The latter command will automatically generate a Python test for each line in the master script.

The automatically generated test will verify that the command is operational and that the output matches the expectations.



Download files


Files for bio, version 0.1.4:

Filename                      Size     File type  Python version
bio-0.1.4-py3-none-any.whl    44.1 kB  Wheel      py3
bio-0.1.4.tar.gz              35.5 kB  Source     None
