Python library for building ETL pipelines involving Synapse and data processing workflows

Sage Prefect Tasks

⚠️ Warning: This repository is a work in progress. ⚠️

Python package of useful Prefect tasks for common use cases at Sage Bionetworks.

Some thoughts are included below the Demo Flow and Usage.

Inspired by Pocket/data-flows.

Demo Flow

Demo Usage

Getting access

To run this demo, you'll need the following access:

  • Edit access to the INCLUDE Sandbox Synapse project (ask Bruno).
  • Edit access to the include-sandbox Cavatica project (ask Bruno).

Getting set up

# Create a virtual environment with the Python dependencies
pipenv install

# Copy the example `.env` file and update the auth tokens
cp .env.example .env
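
Before running anything, it can help to confirm that the tokens from `.env` are actually visible to Python. The following is a minimal sketch, not part of this package: `missing_env_vars` is a hypothetical helper, and while `SB_AUTH_TOKEN` appears in the snippet under Thoughts, `SYNAPSE_AUTH_TOKEN` is an assumed name that should be replaced with whatever `.env.example` actually defines.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Assumed variable names; adjust to match the keys in your `.env` file.
REQUIRED_TOKENS = ["SB_AUTH_TOKEN", "SYNAPSE_AUTH_TOKEN"]

# Example with an explicit mapping instead of the real environment:
example_env = {"SB_AUTH_TOKEN": "abc123", "SYNAPSE_AUTH_TOKEN": ""}
print(missing_env_vars(REQUIRED_TOKENS, example_env))
```

Calling `missing_env_vars(REQUIRED_TOKENS)` from a script started with `pipenv run python` checks the real environment, since pipenv loads the `.env` file automatically.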

Run the flow at the command line

Complete the steps under Getting set up first.

# Run the demo (pipenv will automatically load the `.env` file)
pipenv run python demo.py

Inspect the flow using the Prefect Server UI

Complete the steps under Getting set up first.

# Deploy Prefect Server (Orion)
prefect orion start

# Explore the flow runs in Prefect Server
# Usually hosted at http://127.0.0.1:4200/

# Stop the running server with Ctrl-C

Thoughts

  • The CavaticaBaseTask demonstrates a use case for classes (i.e. extending Task) as opposed to functions (i.e. decorated by @task). On the other hand, SynapseBaseTask doesn't really benefit from the class structure.

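  • The SevenBridges Python client embeds the client instance into every resource object, which prevents cloudpickle from serializing these objects: the attempt fails with TypeError: cannot pickle '_thread.lock' object. The following snippet clears those embedded client references before calling cloudpickle.dumps.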

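    import os

    import cloudpickle
    import sevenbridges as sbg

    api = sbg.Api(
        url="https://cavatica-api.sbgenomics.com/v2", token=os.environ["SB_AUTH_TOKEN"]
    )
    proj = api.projects.query(name="include-sandbox")[0]

    # Clear every reference to the client instance held by the resource object
    proj._API = None
    proj._api = None
    proj._data.api = None

    # Named `pickled` to avoid shadowing the standard-library `pickle` module
    pickled = cloudpickle.dumps(proj)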
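
The underlying problem can be reproduced without Cavatica credentials. The sketch below is illustrative only: `FakeClient`, `NaiveResource`, and `SafeResource` are hypothetical stand-ins rather than SevenBridges classes, and it uses the standard-library `pickle` instead of cloudpickle so that it runs without extra dependencies. It assumes the lock lives on the client object, and shows one possible way to drop the client reference at serialization time via `__getstate__`.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an API client that holds a thread lock."""

    def __init__(self):
        self._lock = threading.Lock()


class NaiveResource:
    """Hypothetical resource that embeds its client instance."""

    def __init__(self, name, api):
        self.name = name
        self._api = api


class SafeResource(NaiveResource):
    """Same resource, but the client reference is dropped when pickling."""

    def __getstate__(self):
        # Copy the instance dict so the live object keeps its client.
        state = self.__dict__.copy()
        state["_api"] = None
        return state


def can_pickle(obj):
    """Return True if `obj` can be serialized, False if pickling raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


client = FakeClient()
print(can_pickle(NaiveResource("include-sandbox", client)))  # the lock blocks pickling
print(can_pickle(SafeResource("include-sandbox", client)))   # the lock is excluded

# The restored object keeps its data but has no client attached.
restored = pickle.loads(pickle.dumps(SafeResource("include-sandbox", client)))
```

With this approach the restored object would need a client re-attached before making API calls, so it suits passing plain data between tasks more than passing live resources.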
    

Note

This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.
