Python library for building ETL pipelines involving Synapse and data processing workflows
Sage Prefect Tasks
⚠️ Warning: This repository is a work in progress. ⚠️
Python package of useful Prefect tasks for common use cases at Sage Bionetworks.
Some thoughts are included below the Demo Flow and Usage.
Inspired by Pocket/data-flows.
Demo Flow
Demo Usage
Getting access
To run this demo, you'll need the following access:
- You need to ask Bruno for edit-access on the INCLUDE Sandbox Synapse project.
- You need to ask Bruno for edit-access on the include-sandbox Cavatica project.
Getting set up
# Create a virtual environment with the Python dependencies
pipenv install
# Copy the example `.env` file and update the auth tokens
cp .env.example .env
Run the flow at the command line
You'll need to get set up first.
# Run the demo (pipenv will automatically load the `.env` file)
pipenv run python demo.py
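Because the demo depends on the auth tokens from the `.env` file, a small pre-flight check can catch a missing token before the flow starts. The following is a minimal sketch and not part of this package. `SB_AUTH_TOKEN` appears elsewhere in this README, while `SYNAPSE_AUTH_TOKEN` and the helper name `missing_tokens` are assumptions; check `.env.example` for the actual variable names.

```python
from typing import Iterable, List, Mapping

# Assumed variable names; confirm them against .env.example.
REQUIRED_TOKENS = ("SYNAPSE_AUTH_TOKEN", "SB_AUTH_TOKEN")


def missing_tokens(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_TOKENS
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_tokens(os.environ)` at the top of a script returns an empty list when every token is set, and the list of offending names otherwise.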
Inspect the flow using the Prefect Server UI
You'll need to get set up first.
# Deploy Prefect Server (Orion)
prefect orion start
# Explore the flow runs in Prefect Server
# Usually hosted at http://127.0.0.1:4200/
# Stop the running server with Ctrl-C
Thoughts
- The CavaticaBaseTask demonstrates a use case for classes (i.e. extending Task) as opposed to functions (i.e. decorated by @task). On the other hand, SynapseBaseTask doesn't really benefit from the class structure.
- The SevenBridges Python client embeds the client instance into every resource object, which prevents cloudpickle from serializing these objects due to TypeError: cannot pickle '_thread.lock' object.
import os
import cloudpickle
import sevenbridges as sbg
# Create the API client using the token from the environment
api = sbg.Api(
    url="https://cavatica-api.sbgenomics.com/v2",
    token=os.environ["SB_AUTH_TOKEN"]
)
proj = api.projects.query(name="include-sandbox")[0]
# Clear the client references embedded in the resource object
proj._API = None
proj._api = None
proj._data.api = None
# Attempt to serialize the resource object
pickle = cloudpickle.dumps(proj)
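The pickling error mentioned above can be reproduced with the standard library alone, since any object that holds a thread lock fails in the same way. The sketch below uses hypothetical stand-in classes (FakeClient, PlainResource, DetachableResource) rather than the real SevenBridges objects, and the standard pickle module rather than cloudpickle. It shows one possible approach, assuming the resource class can be modified: leave the client out of the pickled state via `__getstate__`.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an API client that owns a thread lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()


class PlainResource:
    """Resource that keeps a reference to the client, so pickling fails."""

    def __init__(self, name: str, api: FakeClient) -> None:
        self.name = name
        self._api = api


class DetachableResource(PlainResource):
    """Same resource, but the client is left out of the pickled state."""

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        state["_api"] = None  # drop the unpicklable client reference
        return state


def can_pickle(obj: object) -> bool:
    """Report whether the object survives pickle.dumps."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True
```

A restored DetachableResource has no client, so one would need to be re-attached before making API calls. Whether the same idea is sufficient for the real SevenBridges resource objects is not established here.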
Note
This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.
Download files
File details
Details for the file sagetasks-0.4.0.tar.gz.
File metadata
- Download URL: sagetasks-0.4.0.tar.gz
- Upload date:
- Size: 331.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06af12ea24068bde4f2567bd6c1444cf7317dfca2d05d0c703d6745ae039440f |
| MD5 | a2352161570ee88b0faeb6384369492b |
| BLAKE2b-256 | 0e94e0140ceaaf9ccd61d0138a4b646a9ad6cd4a3efffedd27e1066eb325cf92 |
File details
Details for the file sagetasks-0.4.0-py3-none-any.whl.
File metadata
- Download URL: sagetasks-0.4.0-py3-none-any.whl
- Upload date:
- Size: 24.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bf0f9fa7aa158160c43727c4c1f177331d5ce5cfd9081c97bf2b979e8cf83c7 |
| MD5 | 4d36f5b8bba6a643fd813101c4bdf019 |
| BLAKE2b-256 | cc3fcdbd66267433dc8be5b6dbc4c032742df9358eac4fd0941e34cc774630f8 |
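The digests listed for each file can be compared against a local download. The sketch below is one way to compute them with the standard library. The helper name `file_digests` is hypothetical, MD5 is left out for brevity, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute hex digests of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `file_digests(Path("sagetasks-0.4.0.tar.gz"))["SHA256"]` should equal the SHA256 value listed for that file if the download is intact.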