Cloud utilities for running Hail systematically.
sparkhub
Overview
sparkhub is a Python package that provides utilities for running Spark pipelines on Google Cloud Platform (GCP) and on on-premises clusters. It includes functions for generating Hail headers, running Hail pipelines on Dataproc clusters, and managing GCP resources.
Main Features
- Generate Hail headers for use in Hail pipelines
- Run Hail pipelines on Dataproc clusters
- Manage GCP resources, such as Dataproc clusters and Google Cloud Storage buckets
Installation
To install sparkhub, you can use pip:
pip install sparkhub
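To check that the install succeeded, the minimal sketch below reports the installed version using only the standard library's `importlib.metadata` (Python 3.8+). It does not rely on any sparkhub API, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="sparkhub"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"sparkhub {v}" if v else "sparkhub is not installed")
```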
VS Code settings
Before running sparkhub from VS Code, you must change your user settings: make sure the Jupyter > Interactive Window > Text Editor: Magic Commands As Comments option is checked. This allows magic commands to be used in the VS Code Interactive Window.
To change your user settings, open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on macOS) and search for "Preferences: Open User Settings".
Usage
To use sparkhub, you can import the relevant functions into your Python code:
from sparkhub.hailrunner import get_hail_header, HailRunnerGC, RunnerMagics
from sparkhub.submit import *
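Because `from sparkhub.submit import *` binds names implicitly, it can help to list what it brings in before relying on it in a notebook. The helper below is hypothetical (not part of sparkhub) and follows Python's star-import rule: if the module defines `__all__`, exactly those names are imported; otherwise every top-level name that does not start with an underscore is.

```python
import importlib


def star_import_names(module_name):
    """Return the sorted names that `from <module_name> import *` would bind."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__, a star import binds every name not starting with "_".
        exported = [name for name in vars(module) if not name.startswith("_")]
    return sorted(exported)
```

For example, assuming sparkhub is installed, `star_import_names("sparkhub.submit")` lists the names bound by the second import line above. Importing only the names you need avoids accidentally shadowing your own variables.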
Then, you can call the functions with the appropriate arguments to generate headers, run pipelines, and manage GCP resources.
Maintainer
sparkhub is maintained by TJ Singh. If you have any questions or issues, please contact him at ts3475@cumc.columbia.edu.
Download files
Source Distribution: sparkhub-0.3.0.2.tar.gz
Built Distribution: sparkhub-0.3.0.2-py3-none-any.whl
File details
Details for the file sparkhub-0.3.0.2.tar.gz.
File metadata
- Download URL: sparkhub-0.3.0.2.tar.gz
- Upload date:
- Size: 37.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60f33759e9bc30e3c5d037e9dd234ed78673349b589c515ba4dbe93711e925c6 |
| MD5 | 5568067a31351df8684004cab0a5d7ba |
| BLAKE2b-256 | 3bc7edd3ba77a4c12502da2d8f71c7ef89c626950283b0e863a599a776ec8b6e |
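The published digests can be used to check a downloaded file before installing it. The minimal sketch below uses the standard library's `hashlib`; the helper names `file_digest` and `verify_download` are hypothetical, and "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte digest, which is how PyPI computes it.

```python
import hashlib

# Constructors for the three digests listed in the table; "BLAKE2b-256" is
# BLAKE2b configured for a 32-byte (256-bit) digest.
_HASHERS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hex digest of a file, read in chunks so large archives use little memory."""
    hasher = _HASHERS[algorithm.lower()]()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_download(path, expected_hex, algorithm="sha256"):
    """True if the file's digest matches the published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For an intact download, `verify_download("sparkhub-0.3.0.2.tar.gz", "60f33759e9bc30e3c5d037e9dd234ed78673349b589c515ba4dbe93711e925c6")` should return `True`. On Python 3.11+, `hashlib.file_digest` can replace the manual chunked loop.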
File details
Details for the file sparkhub-0.3.0.2-py3-none-any.whl.
File metadata
- Download URL: sparkhub-0.3.0.2-py3-none-any.whl
- Upload date:
- Size: 20.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
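The tags in a wheel's filename encode where it can be installed. The minimal sketch below splits a filename into the fields defined by the wheel binary format (PEP 427); `parse_wheel_filename` and `WheelName` here are hypothetical helpers, and the sketch assumes the distribution name already has hyphens escaped to underscores, as the format requires.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its components.

    Layout: {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # Optional build tag sits between the version and the Python tag.
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, ver, build, py, abi, plat)
```

Applied to `sparkhub-0.3.0.2-py3-none-any.whl`, it yields the tags `py3`, `none`, and `any`: a pure-Python wheel for any Python 3 interpreter on any platform, consistent with the "Python 3" tag above. For real use, `packaging.utils.parse_wheel_filename` from the `packaging` library also validates and normalizes the components.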
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e05de757c2d614596c967b6905c7bbbbe4251cd54e3444784eae9a8737f6ed81 |
| MD5 | 57e036ff58f944a481cf325417bcbee3 |
| BLAKE2b-256 | 08f412f47b8d2410edf84a4ea2e1dce52fe643fe9f8abeaaea77d456b9d61836 |