BanzaiDB

Database for Banzai NGS pipeline tool

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved
Natural Language
- English
Operating System
- POSIX :: Linux
Programming Language
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

(Using landscape.io and drone.io)

News

API changes (v1 -> v2 -> v3 -> v?). Populating BanzaiDB with a mapping run depends on a nesoni run. Originally (API v1) we parsed the reports.txt for each strain. In API v2 we parse the nway.any file (this assumes you have run nesoni nway). BanzaiDB API v3 assumes that you still have access to the consensus.fa (called a consensus). We need this data to store information about coverage in BanzaiDB.

What is BanzaiDB?

Please use the releases (https://github.com/mscook/BanzaiDB/releases). All versions, including the most recent, make some significant assumptions that are currently being addressed.

BanzaiDB is a tool for pairing Microbial Genomics Next Generation Sequencing (NGS) analysis with a NoSQL database. We use the RethinkDB NoSQL database.

BanzaiDB:

initialises the NoSQL database and associated tables,
populates the database with results of NGS experiments/analysis and,
provides a set of query functions to wrangle the data stored within the database.

Why BanzaiDB?

The analysis (primary/secondary/tertiary) of large collections of draft microbial genomes from NGS typically generates many separate flat files.

The bioinformatician will:

write scripts to parse and extract the important information from the results files (often trying to standardise the output from similar programs),
store these results in further flat files,
write scripts to link the results of one analysis step to another,
store these results in further flat files,
modify scripts as the hypothesis is refined in light of the knowledge gained from the previous steps,
…
…
…
end up with thousands of flat files, many scripts and generally get confused as to how and where everything came from.

The idea behind BanzaiDB is to run once, store once, analyse many times.

About BanzaiDB

BanzaiDB is geared towards outputs of Bioinformatics software employed by the Banzai NGS pipeline.

BanzaiDB is thus geared towards handling data generated from:

Velvet and SPAdes (assembly),
BWA and Nesoni (mapping/variant calling),
Mugsy (whole genome alignment),
BRATNextGen (recombination detection) and,
Prokka (annotation).

The present focus is on storing and manipulating the results of SNP and recombination analysis.

BanzaiDB does not yet provide a stable API.

See the ReadTheDocs site for BanzaiDB documentation (User & API).

About RethinkDB

We chose RethinkDB as our underlying database for a few reasons:

RethinkDB is both developer and operations friendly. This sits well with the typical bioinformatician.
NoSQL databases allow for a flexible schema. We can store/collect now, think later. This is much like how science is performed.
Not every bioinformatician or lab has a system administrator. RethinkDB is easy to set up and administer.
We don’t know how big or complex our datasets could get in the future. It is easy to scale RethinkDB into a cluster.
ReQL, the underlying query language, is simple to learn and understand. We’re also very comfortable with Python, and the availability of official Python drivers (also JavaScript and Ruby, plus many user-contributed drivers for other languages) is a big bonus.

BanzaiDB Requirements

You will need:

(probably) administrator access to your machine(s),
a RethinkDB server/instance (this can be running locally or on a VPS),
git (to clone this repository),
pip and,
bedtools, samtools and tabix (for pybedtools).

You will also need a few Python modules:

rethinkdb
biopython
reportlab
fabric
tablib
argparse (if Python 2.6)
pybedtools (you will probably also need to install cython)

The Python modules should/will be pulled down automatically when installing BanzaiDB.

We recommend that you increase the performance of the rethinkdb Python driver by using its C++ backend. We have found that in some cases the installation of the C++ backend fails, so we provide a simple protocol that we have found works.

BanzaiDB Installation

Something like this:

$ git clone https://github.com/mscook/BanzaiDB.git
$ cd BanzaiDB
$ python setup.py install

Getting BanzaiDB talking to RethinkDB

You provide information about your RethinkDB instance and database using the file ~/.BanzaiDB.cfg (~/ is shorthand for $HOME).

The configuration file supports:

db_host  =  [def = localhost]
port     =  [def = 28015]
db_name  =  [def = Banzai]
auth_key =  [def = '']
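
The way these defaults fill in for missing settings can be sketched as follows. This is a minimal illustration, not BanzaiDB's own loader: it assumes the file holds plain `key = value` lines with no section headers, and the names `DEFAULTS` and `parse_config` are hypothetical. It is written for Python 3.

```python
# Documented defaults for ~/.BanzaiDB.cfg (values taken from the list above).
DEFAULTS = {
    "db_host": "localhost",
    "port": 28015,
    "db_name": "Banzai",
    "auth_key": "",
}


def parse_config(text):
    """Parse 'key = value' lines, falling back to the documented defaults.

    Assumption: the file is a flat list of 'key = value' lines. Unknown
    keys, blank lines, comment lines and empty values are ignored.
    """
    settings = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in DEFAULTS or value == "":
            continue
        # The port is the only numeric setting.
        settings[key] = int(value) if key == "port" else value
    return settings


# Hypothetical file contents: only two settings are overridden.
example = """
db_host = 192.168.0.10
db_name = my_project
"""
config = parse_config(example)
print(config)
```

Settings that are left out of the file (here `port` and `auth_key`) keep their default values.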

BanzaiDB usage

Note: Please refer to the BanzaiDB documentation (via ReadTheDocs) for more detailed information (under active development).

Once both RethinkDB and BanzaiDB are installed and the configuration is set:

$ python BanzaiDB.py -h
usage: BanzaiDB.py [-h] [-v] {init,populate,update,query} ...

BanzaiDB v 0.3.0 - Database for Banzai NGS pipeline tool
(http://github.com/mscook/BanzaiDB)

positional arguments:
  {init,populate,update,query}
                        Available commands:
    init                Initialise a DB
    populate            Populates a database with results of an experiment
    update              Updates a database with results from a new experiment
    query               List available or provide database query functions

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output

Licence: ECL 2.0 by Mitchell Stanton-Cook <m.stantoncook@gmail.com>
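
The command layout shown in the help text above (one global flag plus the init, populate, update and query sub-commands) can be reproduced with argparse sub-parsers. The following is a minimal sketch of that layout only, assuming nothing about how BanzaiDB.py is actually implemented; `build_parser` is a hypothetical name and the sub-commands take no options here. It is written for Python 3.

```python
import argparse


def build_parser():
    """Build a parser with the same top-level layout as the help text."""
    parser = argparse.ArgumentParser(
        prog="BanzaiDB.py",
        description="Database for Banzai NGS pipeline tool",
    )
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="verbose output")
    # Each sub-command gets its own sub-parser; the chosen one is stored
    # in args.command.
    subparsers = parser.add_subparsers(dest="command",
                                       help="Available commands:")
    subparsers.add_parser("init", help="Initialise a DB")
    subparsers.add_parser(
        "populate",
        help="Populates a database with results of an experiment")
    subparsers.add_parser(
        "update",
        help="Updates a database with results from a new experiment")
    subparsers.add_parser(
        "query",
        help="List available or provide database query functions")
    return parser


# Equivalent to running: python BanzaiDB.py -v init
args = build_parser().parse_args(["-v", "init"])
print(args.command, args.verbose)
```

A real tool would normally attach a handler function and its own options to each sub-parser.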

Release history

0.3.0 (Aug 11, 2014)
0.2.0 (Aug 8, 2014)
0.1.2 (Jul 9, 2014)

Download files

Source Distribution

BanzaiDB-0.3.0.tar.gz (158.2 kB)

Uploaded Aug 11, 2014 Source

Built Distribution

BanzaiDB-0.3.0-py2-none-any.whl (177.6 kB)

Uploaded Aug 11, 2014 Python 2

File details

Details for the file BanzaiDB-0.3.0.tar.gz.

File metadata

Download URL: BanzaiDB-0.3.0.tar.gz
Upload date: Aug 11, 2014
Size: 158.2 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for BanzaiDB-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`9a4fdfe3a8bd785ce3545dfcbe97ed735b79011e30c6dc6e774db3abb3b8088a`
MD5	`2a11e16864331d769ad1c06dec0a358b`
BLAKE2b-256	`c0812f1603c2f5d8f9a5a50ea2be8c6d99fb014eed3b5d202c98ce1b49f33949`
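
A downloaded file can be checked against the SHA256 digest listed above using Python's standard hashlib module. This is a minimal sketch; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the source distribution is in the current directory):
# matches_published_hash(
#     "BanzaiDB-0.3.0.tar.gz",
#     "9a4fdfe3a8bd785ce3545dfcbe97ed735b79011e30c6dc6e774db3abb3b8088a",
# )
```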

File details

Details for the file BanzaiDB-0.3.0-py2-none-any.whl.

File metadata

Download URL: BanzaiDB-0.3.0-py2-none-any.whl
Upload date: Aug 11, 2014
Size: 177.6 kB
Tags: Python 2
Uploaded using Trusted Publishing? No

File hashes

Hashes for BanzaiDB-0.3.0-py2-none-any.whl
Algorithm	Hash digest
SHA256	`c6f03b2f70c56a0884903df08e639d821f03ddc410fa70cb7fadb84abdd31b19`
MD5	`77443af7ee77d1df81e21e1713656bf4`
BLAKE2b-256	`20434836eed6d19a90bfb87fcd0b25e49aebe34073e2f608e91acc7d53d9f2fb`
