Cloud utilities for running Hail systematically.

Project description

pycloud / sparkhub

Overview

pycloud / sparkhub is a Python package of utilities for running Spark pipelines on Google Cloud Platform (GCP) and on on-premises clusters. It includes functions for generating Hail headers, running Hail pipelines on Dataproc clusters, and managing GCP resources.

Main Features

  • Generate Hail headers for use in Hail pipelines
  • Run Hail pipelines on Dataproc clusters
  • Manage GCP resources, such as Dataproc clusters and Google Cloud Storage buckets

Installation

To install pycloud, use pip. The package is published under the name sparkhub:

pip install sparkhub
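
Because the distribution is installed as sparkhub but imported as pycloud, it can help to confirm which version pip actually installed. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    installed_version is an illustrative helper, not part of pycloud/sparkhub.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Look up the distribution by its published name, not its import name.
    found = installed_version("sparkhub")
    print(f"sparkhub: {found or 'not installed'}")
```

The lookup uses the distribution name (sparkhub), since that is the name pip records, rather than the import name (pycloud).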

VS Code settings

Before running sparkhub from VS Code, you must change your user settings. Make sure the Jupyter > Interactive Window > Text Editor: Magic Commands As Comments option is checked. This allows you to use magic commands in the VS Code interactive window.

To change your user settings, open the command palette in VS Code (Ctrl+Shift+P, or Cmd+Shift+P on macOS) and search for "Preferences: Open User Settings".

Usage

To use pycloud, import the relevant functions and classes into your Python code:

from pycloud.hailrunner import get_hail_header, HailRunnerGC, RunnerMagics
from pycloud.submit import *

Then call them with the appropriate arguments to generate headers, run pipelines, and manage GCP resources.
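
This page does not list the arguments each function takes, so one way to find them is to inspect the installed package directly. The sketch below assumes nothing about the pycloud API beyond the three names imported above; describe_callables is a hypothetical helper, and it reports a note instead of failing if the module or a name is unavailable.

```python
import importlib
import inspect


def describe_callables(module_name, names):
    """Map each requested name to its signature string, or to a short note.

    describe_callables is an illustrative helper, not part of pycloud/sparkhub.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The package (or one of its dependencies) is not installed.
        return {name: f"<module not importable: {exc}>" for name in names}

    described = {}
    for name in names:
        obj = getattr(module, name, None)
        if obj is None:
            described[name] = "<not found>"
            continue
        try:
            described[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Not callable, or the signature cannot be introspected.
            described[name] = "<signature unavailable>"
    return described


if __name__ == "__main__":
    wanted = ["get_hail_header", "HailRunnerGC", "RunnerMagics"]
    for name, signature in describe_callables("pycloud.hailrunner", wanted).items():
        print(f"{name}: {signature}")
```

Note that importing pycloud.hailrunner runs the module's top-level code, so this is best done in the same environment where the pipelines will run.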

Maintainer

pycloud/sparkhub is maintained by TJ Singh. If you have any questions or issues, please contact him at ts3475@cumc.columbia.edu.

Download files

Source Distribution

sparkhub-0.2.7.tar.gz (36.8 kB)

Uploaded Source

Built Distribution

sparkhub-0.2.7-py3-none-any.whl (19.2 kB)

Uploaded Python 3

File details

Details for the file sparkhub-0.2.7.tar.gz.

File metadata

  • Download URL: sparkhub-0.2.7.tar.gz
  • Upload date:
  • Size: 36.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for sparkhub-0.2.7.tar.gz
Algorithm    Hash digest
SHA256       2766c0a5e700fab111a83d61e0a5ebb3f2fa9df8ae8c462199e3fd5ee18a1720
MD5          395bf4961c8a8c1316c0ca8c1abfbfe9
BLAKE2b-256  0ab2ed8dc105e4f17a46749bcb2b4ea413f614ca54fb690741cdcf8a0c03d13e
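
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch with the standard library; the helper names sha256_of and matches are hypothetical, and the script assumes the archive was saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for sparkhub-0.2.7.tar.gz.
EXPECTED_SHA256 = "2766c0a5e700fab111a83d61e0a5ebb3f2fa9df8ae8c462199e3fd5ee18a1720"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("sparkhub-0.2.7.tar.gz")
    if archive.exists():
        print("digest OK" if matches(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check at install time when hashes are pinned in a requirements file and the --require-hashes option is used.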

File details

Details for the file sparkhub-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: sparkhub-0.2.7-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for sparkhub-0.2.7-py3-none-any.whl
Algorithm    Hash digest
SHA256       3403420d49e1987047268d2c7e871c66a7819c7109b3aa8eda0dcaa476f37789
MD5          512cc9f86a3e4358503ef823f97abd36
BLAKE2b-256  e9fe0cf55835c6ff0a5cf7b81b9fb0859b1c7a5b4bbd3782b7a5fdf28fe2999b
