A tool that recommends the best way to run your genomics pipelines on the cloud

Hummingbird: Efficient Performance Prediction for Executing Genomic Applications in the Cloud

Overview

Hummingbird is a Python framework that recommends a range of optimal instance configurations for running your favorite genomics pipeline on cloud platforms.

The framework takes as input the information needed to run a cloud job and generates several instance configurations that the user can use to run the pipeline on the cloud. The user can choose among these configurations, such as the fastest, the cheapest, and the most efficient. These configurations are explained in detail in a later section of this README.
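As a rough illustration of what "fastest" and "cheapest" mean, the following Python sketch picks both from a list of candidate configurations with predicted runtimes. The `InstanceConfig` type, the instance labels, and the prices are hypothetical and made up for illustration; cost is assumed to be runtime multiplied by an hourly price, and Hummingbird's own "most efficient" criterion is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceConfig:
    """Hypothetical record for one candidate instance configuration."""
    name: str              # illustrative label, not a real cloud instance type
    runtime_hours: float   # predicted wall-clock runtime of the pipeline
    price_per_hour: float  # assumed hourly price in USD

    @property
    def total_cost(self) -> float:
        # Assumed cost model: one pipeline run = runtime x hourly price.
        return self.runtime_hours * self.price_per_hour


def fastest(configs: list[InstanceConfig]) -> InstanceConfig:
    """Configuration with the shortest predicted runtime."""
    return min(configs, key=lambda c: c.runtime_hours)


def cheapest(configs: list[InstanceConfig]) -> InstanceConfig:
    """Configuration with the lowest predicted cost per run."""
    return min(configs, key=lambda c: c.total_cost)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    candidates = [
        InstanceConfig("4-vCPU", runtime_hours=6.0, price_per_hour=0.20),
        InstanceConfig("16-vCPU", runtime_hours=1.8, price_per_hour=0.80),
        InstanceConfig("32-vCPU", runtime_hours=1.2, price_per_hour=1.60),
    ]
    print("fastest:", fastest(candidates).name)
    print("cheapest:", cheapest(candidates).name)
```

Note that the fastest and the cheapest choice usually differ: a larger instance finishes sooner but can cost more per run, which is why Hummingbird reports several configurations rather than a single answer.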

What sets Hummingbird apart is that it downsamples the input files, runs the whole computational pipeline on these downsampled files, and then provides the user with the optimal instance configurations. As a result, users obtain the recommended configurations in much less time than it would take to run the entire pipeline on the full input file(s) for each candidate instance configuration.
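The downsampling idea can be sketched for FASTQ input as a reproducible random subsample of reads. This is a minimal Python sketch, assuming plain or gzip-compressed FASTQ with four lines per record; `downsample_fastq` is a hypothetical helper and not necessarily how Hummingbird downsamples its inputs (dedicated tools such as `seqtk sample` implement the same idea).

```python
import gzip
import random
from typing import IO


def _open_text(path: str, mode: str) -> IO[str]:
    # Handle .gz files (e.g. .fastq.gz) as well as plain .fastq files.
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def downsample_fastq(src: str, dst: str, fraction: float, seed: int = 42) -> int:
    """Keep each 4-line FASTQ record with probability `fraction`.

    Returns the number of records written. The fixed seed makes the
    subsample reproducible from run to run.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    kept = 0
    with _open_text(src, "r") as fin, _open_text(dst, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            # One random draw per record, in file order.
            if rng.random() < fraction:
                fout.writelines(record)
                kept += 1
    return kept
```

Because exactly one random draw is made per record, calling it with the same `seed` on the R1 and R2 files of a paired-end library keeps mates together, provided both files list reads in the same order.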

Currently, Hummingbird supports Google Cloud (GCP), Amazon Web Services (AWS), and Microsoft Azure, and we hope to add other cloud providers in the future.

Installation Instructions

Hummingbird can be installed with pip:

pip install CloudHummingbird

When installing Hummingbird with pip, it is recommended to add --install-option="--prefix=$PREFIX_PATH". This gives users easy access to the sample configuration files located in conf/examples, which they may want to refer to while writing configuration file(s) for their own computational pipeline. Note that recent pip releases (23.1 and later) no longer accept --install-option; pip's own --prefix option serves a similar purpose there. Alternatively, the configuration files can be found here: <virtualenv_name>/lib/<python_ver>/site-packages/Hummingbird/conf/examples
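Instead of browsing site-packages by hand, the example configuration directory can be located from Python. This is a minimal sketch that assumes the import name is `Hummingbird` (taken from the site-packages path above) and that conf/examples ships inside the package directory; `example_config_dir` is a hypothetical helper, not part of Hummingbird's API.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def example_config_dir(package: str = "Hummingbird") -> Optional[Path]:
    """Return <site-packages>/<package>/conf/examples if it exists, else None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package.
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "conf" / "examples"
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    print(example_config_dir())
```

Run it inside the activated virtual environment; it prints `None` if the package or the examples directory cannot be found.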

Hummingbird requires pip and Python 3 as prerequisites for installation.

It is highly recommended to use a virtual environment to isolate the execution environment. Create one (for example, with Python's built-in venv module) and then activate it:

source <virtual-environment-name>/bin/activate

Section 1: Getting Started on Google Cloud, AWS Batch, Azure Batch

This section explains how to get started on Google Cloud, AWS and Azure.

Section 2: Sample Run on Google Cloud

This section provides instructions for executing a sample run of BWA on Google Cloud using Hummingbird.

Section 3: Editing the Configuration File

This section describes the configuration file and how to edit it.

Section 4: Executing Hummingbird

This section explains how to execute Hummingbird.

Section 5: Hummingbird Result

This section is a guide to interpreting the results provided by Hummingbird.

Section 6: Using Different Input File Formats and Tools for Format Conversions

This section is a guide for users who want to leverage Hummingbird's downsampling step but whose input files are in formats other than BAM or fastq/fastq.gz.

Section 7: Alternative Downsampling Methods

This section guides users through downsampling techniques other than the ones supported by Hummingbird.

Section 8: Workflow Parser

This section explains how Hummingbird parses workflows provided by the user.

Section 9: Container Technology

This section explains how Hummingbird takes advantage of container technology for execution.

Section 10: I/O Profiling

This section explains how future versions of Hummingbird will also profile I/O throughput.

Section 11: Fault Tolerance

This section describes the fault-tolerance capabilities of Hummingbird.

Section 12: Requirements for Running Hummingbird on a Cloud Platform

This section lists all components required to run Hummingbird on a cloud platform.

  • Logo Credit: Camille Berry

Download files

Source Distribution

CloudHummingbird-1.1.0.tar.gz (46.8 kB)

Built Distribution

CloudHummingbird-1.1.0-py3-none-any.whl (64.4 kB, Python 3)

File details

Details for the file CloudHummingbird-1.1.0.tar.gz.

File metadata

  • Download URL: CloudHummingbird-1.1.0.tar.gz
  • Size: 46.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for CloudHummingbird-1.1.0.tar.gz
SHA256: ee58730de897469f7f3ee42a39e6fa36e43a406d90fcaeaf547d8b7b88ad254d
MD5: 40bf6229a28c44ca42679f16046dfa7b
BLAKE2b-256: 1180e34b38b1e64af57417bf1dddfe839bfa929f0986c1830e036836e0ee00a2

File details

Details for the file CloudHummingbird-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: CloudHummingbird-1.1.0-py3-none-any.whl
  • Size: 64.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for CloudHummingbird-1.1.0-py3-none-any.whl
SHA256: 4ad20e69a7d6d4f4845385bca0b5273c0893cf7e78e593b7be6c5ff4aa9cfc93
MD5: 81ee9121d466a43ddc9b7f86994402a9
BLAKE2b-256: 785c9c30856691372100b81d74adf9b406202afc4c4fefbe368ed924ebc02e51
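To check a downloaded release file against the SHA256 digests listed above, the short Python sketch below streams the file through `hashlib`. `verify_download` and `EXPECTED_SHA256` are illustrative names, not part of Hummingbird; pip's built-in `pip hash <file>` command prints an equivalent SHA256 digest.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the CloudHummingbird 1.1.0 release files.
EXPECTED_SHA256 = {
    "CloudHummingbird-1.1.0.tar.gz":
        "ee58730de897469f7f3ee42a39e6fa36e43a406d90fcaeaf547d8b7b88ad254d",
    "CloudHummingbird-1.1.0-py3-none-any.whl":
        "4ad20e69a7d6d4f4845385bca0b5273c0893cf7e78e593b7be6c5ff4aa9cfc93",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks to keep memory use small."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    if expected is None:
        raise KeyError(f"no published digest for {p.name}")
    return sha256_of(p) == expected
```

The lookup is keyed on the file name, so the downloaded file must keep its original name for the check to apply.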
