Python library for building ETL pipelines involving Synapse and data processing workflows
Sage Prefect Tasks
⚠️ Warning: This repository is a work in progress. ⚠️
Python package of useful Prefect tasks for common use cases at Sage Bionetworks.
Some thoughts are included below the Demo Flow and Usage.
Inspired by Pocket/data-flows.
Demo Flow
Demo Usage
Getting access
To run this demo, you'll need the following access:
- Ask Bruno for edit access to the INCLUDE Sandbox Synapse project.
- Ask Bruno for edit access to the include-sandbox Cavatica project.
Getting set up
# Create a virtual environment with the Python dependencies
pipenv install
# Copy the example `.env` file and update the auth tokens
cp .env.example .env
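Before running the demo, it can help to confirm that the tokens in `.env` were actually filled in. The following is a minimal sketch, not part of this package: `missing_vars` is a hypothetical helper, `SB_AUTH_TOKEN` is the variable used elsewhere in this README, and `SYNAPSE_AUTH_TOKEN` is an assumed name that may differ from what `.env.example` defines.

```python
# SB_AUTH_TOKEN appears in this README; SYNAPSE_AUTH_TOKEN is an assumed name.
REQUIRED_VARS = ("SB_AUTH_TOKEN", "SYNAPSE_AUTH_TOKEN")


def missing_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Example mapping standing in for the real environment (placeholder values).
example_env = {"SB_AUTH_TOKEN": "placeholder", "SYNAPSE_AUTH_TOKEN": ""}
missing = missing_vars(example_env)
print(missing)
```

In practice you would pass `os.environ` instead of `example_env`, for example from a script started with `pipenv run` so that the `.env` file has been loaded.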
Run the flow at the command line
You'll need to get set up first.
# Run the demo (pipenv will automatically load the `.env` file)
pipenv run python demo.py
Inspect the flow using the Prefect Server UI
You'll need to get set up first.
# Deploy Prefect Server in the background
prefect server start --detach
# Create a project in Prefect Server
prefect create project "demo"
# Run the demo in "register" mode
pipenv run python demo.py register
# Explore the flow under the demo project in Prefect Server
# Usually hosted at http://127.0.0.1:8080/
# Stop the running containers
prefect server stop
Thoughts
- The `CavaticaBaseTask` demonstrates a use case for classes (i.e. extending `Task`) as opposed to functions (i.e. decorated by `@task`). On the other hand, `SynapseBaseTask` doesn't really benefit from the class structure.
- The SevenBridges Python client embeds the client instance into every resource object, which prevents `cloudpickle` from serializing these objects: the attempt fails with `TypeError: cannot pickle '_thread.lock' object`. The snippet below clears the embedded client references before pickling:

    import os

    import cloudpickle
    import sevenbridges as sbg

    api = sbg.Api(
        url="https://cavatica-api.sbgenomics.com/v2",
        token=os.environ["SB_AUTH_TOKEN"],
    )
    proj = api.projects.query(name="include-sandbox")[0]

    # Drop every reference to the API client held by the resource object
    proj._API = None
    proj._api = None
    proj._data.api = None

    # Serialize the resource now that the client is detached
    pickled = cloudpickle.dumps(proj)
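The pickling failure described above can be reproduced without Cavatica credentials. This is a small sketch under stated assumptions: it uses the standard library `pickle` instead of `cloudpickle`, and `types.SimpleNamespace` objects as hypothetical stand-ins for the SevenBridges client and resource. The real client's internals are not modeled; the only point is that an embedded lock blocks serialization until the reference is cleared.

```python
import pickle
import threading
import types


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


# Hypothetical stand-in for an API client that holds a thread lock
fake_client = types.SimpleNamespace(lock=threading.Lock())

# Hypothetical stand-in for a resource object that embeds its client
resource = types.SimpleNamespace(name="include-sandbox", _api=fake_client)

before = can_pickle(resource)  # the embedded lock blocks serialization

resource._api = None  # detach the client reference
after = can_pickle(resource)

restored = pickle.loads(pickle.dumps(resource))
print(before, after, restored.name)
```

The detached object no longer has a usable client after it is restored, so any code that unpickles it would need to attach a fresh client before making API calls.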
Note
This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.
Hashes for sagetasks-0.1.0a5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8eb11d9a6028587a59a8b8209ac302d2c6ce6cdc1a610104de997ff58f044d9e
MD5 | 79c5dd6992e7afc8ae6b53c74d452ec8
BLAKE2b-256 | 96c097f2dce5c86c175c3ec5c97239c4bc8bfdca9ebacdbd932855b1c9f0c0d3