Python library for building ETL pipelines involving Synapse and data processing workflows
Sage Prefect Tasks
⚠️ Warning: This repository is a work in progress. ⚠️
Python package of useful Prefect tasks for common use cases at Sage Bionetworks.
Some thoughts are included below the Demo Flow and Usage.
Inspired by Pocket/data-flows.
Demo Flow
Demo Usage
Getting access
To run this demo, you'll need the following access:
- Ask Bruno for edit access to the INCLUDE Sandbox Synapse project.
- Ask Bruno for edit access to the include-sandbox Cavatica project.
Getting set up
# Create a virtual environment with the Python dependencies
pipenv install
# Copy the example `.env` file and update the auth tokens
cp .env.example .env
Run the flow at the command line
You'll need to get set up first.
# Run the demo (pipenv will automatically load the `.env` file)
pipenv run python demo.py
Inspect the flow using the Prefect Server UI
You'll need to get set up first.
# Deploy Prefect Server in the background
prefect server start --detach
# Create a project in Prefect Server
prefect create project "demo"
# Run the demo in "register" mode
pipenv run python demo.py register
# Explore the flow under the demo project in Prefect Server
# Usually hosted at http://127.0.0.1:8080/
# Stop the running containers
prefect server stop
Thoughts
- The `CavaticaBaseTask` demonstrates a use case for classes (i.e. extending `Task`) as opposed to functions (i.e. decorated by `@task`). On the other hand, `SynapseBaseTask` doesn't really benefit from the class structure.
- The SevenBridges Python client embeds the client instance into every resource object, which prevents `cloudpickle` from serializing these objects due to `TypeError: cannot pickle '_thread.lock' object`.

import os

import cloudpickle
import sevenbridges as sbg

api = sbg.Api(
    url="https://cavatica-api.sbgenomics.com/v2",
    token=os.environ["SB_AUTH_TOKEN"],
)
proj = api.projects.query(name="include-sandbox")[0]

# Clear the references to the API client embedded in the project object
proj._API = None
proj._api = None
proj._data.api = None

# Serialize the project object
pickled = cloudpickle.dumps(proj)
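The serialization problem described above can be illustrated without the SevenBridges client. The following is a minimal sketch under stated assumptions, not the approach used in this package: `FakeClient`, `FakeResource`, `DetachableResource`, and `can_pickle` are hypothetical names, and the standard-library `pickle` stands in for `cloudpickle`. It shows that an object embedding a thread lock fails to pickle, and that dropping the embedded client in `__getstate__` is one possible way around it.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an API client that holds a thread lock."""

    def __init__(self):
        self.lock = threading.Lock()


class FakeResource:
    """Hypothetical resource object that embeds its client instance."""

    def __init__(self, name, client):
        self.name = name
        self._api = client


class DetachableResource(FakeResource):
    """Same resource, but the client is dropped when the object is pickled."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_api"] = None  # the lock-holding client is not serialized
        return state


def can_pickle(obj):
    """Return True if obj can be pickled, False if pickling raises TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(FakeResource("include-sandbox", FakeClient())))
    print(can_pickle(DetachableResource("include-sandbox", FakeClient())))
```

With this approach, a restored object comes back without a client, so the client would need to be re-attached after deserialization before making further API calls.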
Note
This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.