Analysis of big biological data sets for distributed HPC clusters.
Project description
A command-line tool for the analysis of big biological data sets on distributed HPC clusters.
About
PyBDA is a Python library and command-line tool for big data analytics and machine learning that scales to terabyte-sized data sets.
To scale to big data sets, PyBDA builds on Apache Spark’s DataFrame API, which distributes data across the nodes of a high-performance cluster and runs expensive machine learning tasks in parallel. For scheduling, PyBDA uses Snakemake to execute pipelines of jobs automatically. PyBDA first builds a directed acyclic graph (DAG) of the methods/jobs you want to run in succession (e.g. dimensionality reduction followed by clustering) and then computes each method by traversing the DAG. When a job completes successfully, PyBDA writes results and plots and creates statistics. If a job fails, PyBDA reports which method failed and where (owing to Snakemake’s scheduling), so that the same pipeline can be resumed from the point of failure.
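The scheduling idea described above (build a DAG of jobs, run them in dependency order, and resume after a failure) can be illustrated with a minimal sketch. This is not PyBDA's or Snakemake's actual implementation; the function `run_pipeline`, the job names, and the simulated failure are all hypothetical and chosen only for illustration. It uses only the Python standard library (`graphlib`, Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(dag, jobs, done=None):
    """Run jobs in dependency order.

    dag  : maps each job name to the set of jobs it depends on
    jobs : maps each job name to a zero-argument callable
    done : names of jobs that finished in an earlier run (skipped)

    Returns (completed_jobs, failed_job); failed_job is None on success.
    """
    done = set() if done is None else set(done)
    for name in TopologicalSorter(dag).static_order():
        if name in done:
            continue  # already computed in a previous run
        try:
            jobs[name]()
        except Exception as err:
            # Report which job failed and stop; later jobs depend on it.
            print(f"job {name!r} failed: {err}")
            return done, name
        done.add(name)
    return done, None


# Hypothetical two-step pipeline: dimensionality reduction, then clustering.
dag = {"dimension_reduction": set(), "clustering": {"dimension_reduction"}}
log = []
attempts = {"clustering": 0}


def reduce_dimensions():
    log.append("dimension_reduction")


def cluster():
    attempts["clustering"] += 1
    if attempts["clustering"] == 1:
        raise RuntimeError("simulated failure")  # fails on the first try only
    log.append("clustering")


jobs = {"dimension_reduction": reduce_dimensions, "clustering": cluster}

done, failed = run_pipeline(dag, jobs)        # stops at "clustering"
done, failed = run_pipeline(dag, jobs, done)  # resumes; skips the finished job
```

In this sketch the second call does not recompute the dimensionality reduction, because its name is already in `done`. Snakemake decides what to rerun differently in practice (it tracks output files), so treat the `done` set as a stand-in for that bookkeeping.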
Documentation
Check out the documentation here. It will walk you through:
- the installation process,
- setting up Apache Spark,
- using PyBDA.
Download files
File details
Details for the file pybda-0.1.0.tar.gz.
File metadata
- Download URL: pybda-0.1.0.tar.gz
- Upload date:
- Size: 55.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5549d969b88f31201f4e03e8cb5864f0f8c80442369b23241e72973956f09880
MD5 | df5d0d81ba850b0d962e6e19c9a89113
BLAKE2b-256 | 80189ca71b566948e42e938a0d5ddf5d5cefc27c262ee7e94f161794db2d1eaa
File details
Details for the file pybda-0.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: pybda-0.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 127.2 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | de58bf9bbdbf452af77079a37385de7410a4b66f077c4df5f36e0190907419b9
MD5 | b830f5e3b7968626d772429bd9c0b331
BLAKE2b-256 | 9b28c1d9647212d0cc0cc5272077d5456a6d95b52c8b159c18b5c633902cc285