Analysis of big biological data sets for distributed HPC clusters.

Project description

A command line tool for the analysis of big biological data sets on distributed HPC clusters.

About

PyBDA is a Python library and command line tool for big data analytics and machine learning that scales to terabyte-sized data sets.

To scale to big data sets, PyBDA uses Apache Spark’s DataFrame API. Code written against this API automatically distributes data across the nodes of a high-performance cluster and runs expensive machine learning tasks in parallel. For scheduling, PyBDA uses Snakemake to execute pipelines of jobs automatically. PyBDA first builds a directed acyclic graph (DAG) of the methods/jobs you want to run in succession (e.g. dimensionality reduction followed by clustering) and then computes each method by traversing the DAG. When a job succeeds, PyBDA writes its results and plots and creates statistics. If a job fails, PyBDA reports which method failed and where (owing to Snakemake’s scheduling), so that the same pipeline can be resumed from the point where it last failed.
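The traverse-and-resume behaviour described above can be illustrated with a minimal sketch in plain Python. This is not PyBDA's or Snakemake's actual implementation: the function `run_pipeline`, the job names and the simulated failure are all hypothetical, and the ordering relies only on `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(dag, jobs, done=None):
    """Run jobs in dependency order, skipping jobs that already finished.

    dag  -- mapping: job name -> set of job names it depends on
    jobs -- mapping: job name -> callable that performs the job
    done -- names of jobs completed in an earlier run (optional)

    Returns (done, failed): the set of completed jobs and the name of
    the first job that failed, or None if every job succeeded.
    """
    done = set(done or ())
    for name in TopologicalSorter(dag).static_order():
        if name in done:
            continue  # finished in a previous run, nothing to redo
        try:
            jobs[name]()
        except Exception as err:
            print(f"job {name!r} failed: {err}")
            return done, name  # stop here so the run can be resumed later
        done.add(name)
    return done, None


# Hypothetical two-step pipeline: dimension reduction feeding into clustering.
dag = {"dimension_reduction": set(), "clustering": {"dimension_reduction"}}
calls = []
state = {"cluster_attempts": 0}


def reduce_dimensions():
    calls.append("dimension_reduction")


def cluster():
    state["cluster_attempts"] += 1
    if state["cluster_attempts"] == 1:
        raise RuntimeError("simulated failure")  # fail on the first attempt only
    calls.append("clustering")


jobs = {"dimension_reduction": reduce_dimensions, "clustering": cluster}

done, failed = run_pipeline(dag, jobs)          # first run stops at clustering
done2, failed2 = run_pipeline(dag, jobs, done)  # second run resumes from there
```

In this sketch the second call skips the dimension reduction step because it is already recorded as done, which mirrors the idea of continuing a pipeline from where it failed.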

Documentation

Check out the documentation here. The documentation will walk you through

  • the installation process,

  • setting up Apache Spark,

  • using pybda.

Author

Simon Dirmeier simon.dirmeier at bsse.ethz.ch.

Download files

Source Distribution

pybda-0.1.0.tar.gz (55.0 kB)

Uploaded Source

Built Distribution

pybda-0.1.0-py2.py3-none-any.whl (127.2 kB)

Uploaded Python 2 Python 3

File details

Details for the file pybda-0.1.0.tar.gz.

File metadata

  • Download URL: pybda-0.1.0.tar.gz
  • Upload date:
  • Size: 55.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

File hashes

Hashes for pybda-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       5549d969b88f31201f4e03e8cb5864f0f8c80442369b23241e72973956f09880
MD5          df5d0d81ba850b0d962e6e19c9a89113
BLAKE2b-256  80189ca71b566948e42e938a0d5ddf5d5cefc27c262ee7e94f161794db2d1eaa

See more details on using hashes here.

File details

Details for the file pybda-0.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: pybda-0.1.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 127.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

File hashes

Hashes for pybda-0.1.0-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       de58bf9bbdbf452af77079a37385de7410a4b66f077c4df5f36e0190907419b9
MD5          b830f5e3b7968626d772429bd9c0b331
BLAKE2b-256  9b28c1d9647212d0cc0cc5272077d5456a6d95b52c8b159c18b5c633902cc285
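
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published` are illustrative and not part of PyBDA, and the local path passed in is whatever location you saved the file to.

```python
import hashlib

# SHA256 digests copied from the file listings above.
PUBLISHED_SHA256 = {
    "pybda-0.1.0.tar.gz":
        "5549d969b88f31201f4e03e8cb5864f0f8c80442369b23241e72973956f09880",
    "pybda-0.1.0-py2.py3-none-any.whl":
        "de58bf9bbdbf452af77079a37385de7410a4b66f077c4df5f36e0190907419b9",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename):
    """Compare the digest of a local file with the published one for `filename`."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]
```

For example, `matches_published("downloads/pybda-0.1.0.tar.gz", "pybda-0.1.0.tar.gz")` should return `True` only if the local copy is byte-for-byte identical to the published source distribution (the `downloads/` path is an assumption).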
