Cloud utilities for running Hail systematically.

sparkhub

Overview

sparkhub is a Python package that provides utilities for running Spark pipelines on Google Cloud Platform (GCP) and on on-premises clusters. It includes functions for generating Hail headers, running Hail pipelines on Dataproc clusters, and managing GCP resources.

Main Features

  • Generate Hail headers for use in Hail pipelines
  • Run Hail pipelines on Dataproc clusters
  • Manage GCP resources, such as Dataproc clusters and Google Cloud Storage buckets

Installation

To install sparkhub, use pip. The package is published on PyPI as sparkhub2, while the import name is sparkhub:

pip install sparkhub2
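
Because the distribution name (sparkhub2) differs from the import name (sparkhub), it can be useful to confirm that the install succeeded before importing anything. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of sparkhub.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "sparkhub2"):
    """Return the installed version string of a distribution, or None if absent.

    `installed_version` is an illustrative helper, not a sparkhub function.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"sparkhub2 version: {found}" if found else "sparkhub2 is not installed")
```

The lookup uses the distribution name that pip installs, not the module name used in import statements.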

VS Code settings

Before running sparkhub from VS Code, you must change your user settings. Make sure the Jupyter > Interactive Window > Text Editor: Magic Commands As Comments option is checked. This allows you to use magic commands in the VS Code interactive window.

To change your user settings, open the command palette in VS Code (Ctrl+Shift+P, or Cmd+Shift+P on macOS) and search for "Preferences: Open User Settings".
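
As a rough illustration of what this option does, the sketch below shows a Python cell in which a magic command is written as a comment. It assumes the Jupyter extension's documented behaviour of uncommenting lines that start with `#!%` when it parses a cell; the `%pwd` magic is a standard IPython magic chosen only for illustration and has nothing to do with sparkhub.

```python
# %%
# Assumption: with "Magic Commands As Comments" enabled, the interactive
# window uncomments the "#!%" line below and runs it as the IPython line
# magic %pwd. A plain Python interpreter only sees an ordinary comment.
#!%pwd

greeting = "cell ran"
print(greeting)
```

Written this way, the file stays valid Python when run as a script, and the magic only takes effect in the interactive window.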

Usage

To use sparkhub, import the relevant functions into your Python code:

from sparkhub.hailrunner import get_hail_header, HailRunnerGC, RunnerMagics
from sparkhub.submit import *

Then call these functions with the arguments your pipeline requires to generate headers, run pipelines, and manage GCP resources.
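
This description does not list the arguments each function accepts, so one way to find them is to inspect the installed package directly. The sketch below does that with the standard library; describe_callable is a hypothetical helper (not part of sparkhub), and it prints a short note rather than failing if sparkhub is not importable in the current environment.

```python
import importlib
import inspect


def describe_callable(module_name: str, attr_name: str) -> str:
    """Return 'name(signature)' for a callable, or a short note if unavailable.

    `describe_callable` is an illustrative helper, not a sparkhub function.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return f"{module_name} is not importable in this environment"
    obj = getattr(module, attr_name, None)
    if obj is None:
        return f"{module_name} has no attribute {attr_name}"
    try:
        return f"{attr_name}{inspect.signature(obj)}"
    except (TypeError, ValueError):
        return f"{attr_name}: signature not introspectable"


# The names below are the ones imported in the usage snippet above.
print(describe_callable("sparkhub.hailrunner", "get_hail_header"))
print(describe_callable("sparkhub.hailrunner", "HailRunnerGC"))
```

Calling help() on the same objects in an interactive session shows their docstrings as well, where the package provides them.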

Maintainer

sparkhub is maintained by TJ Singh. If you have any questions or issues, please contact him at ts3475@cumc.columbia.edu.
