Python package for connecting services and building data pipelines

Development Status
- 4 - Beta
Framework
- Apache Airflow
- Apache Airflow :: Provider
Programming Language
- Python

Project description

py-orca

This Python package provides the components to connect various third-party services such as Synapse, Nextflow Tower, and SevenBridges to build data pipelines using a workflow management system like Airflow.

Demonstration Script

This repository includes a demonstration script called demo.py, which showcases how you can use py-orca to launch and monitor your workflows on Nextflow Tower. Specifically, it illustrates how to process an RNA-seq dataset using a series of workflow runs, namely nf-synapse/synstage, nf-core/rnaseq, and nf-synapse/synindex. py-orca can be used with any Python-compatible workflow management system to orchestrate each step (e.g. Airflow, Prefect, Dagster). The demonstration script uses Metaflow because it's easy to run locally and has an intuitive syntax.

The script assumes that the following environment variables are set. Before setting them up, ensure that you have an AWS profile configured for a role that has access to the dev/ops tower workspace you plan to launch your workflows from. You can set these environment variables using whatever method you prefer (e.g. using an .env file, sourcing a shell script, etc). Refer to .env.example for the format of their values as well as examples.

NEXTFLOWTOWER_CONNECTION_URI
SYNAPSE_CONNECTION_URI
AWS_PROFILE (or another source of AWS credentials)
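A script can fail fast when one of these variables is missing. The following is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical and not part of py-orca, and the sketch treats AWS_PROFILE as optional because another source of AWS credentials is allowed.

```python
import os

# Variable names taken from the list above. AWS_PROFILE is left out on
# purpose because other sources of AWS credentials are acceptable.
REQUIRED_VARS = ("NEXTFLOWTOWER_CONNECTION_URI", "SYNAPSE_CONNECTION_URI")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of py-orca.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"SYNAPSE_CONNECTION_URI": "placeholder-value"}
print(missing_env_vars(example_env))
```

Calling `missing_env_vars()` with no arguments checks the real process environment, so it could be run at the top of a script before any workflow is launched.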

Once your environment variables are set, you can create a virtual environment, install the Python dependencies, and run the demonstration script (after downloading it) as follows. Note that you will need to update the s3_prefix parameter so that it points to an S3 bucket that is accessible to your Tower workspace.

Creating and setting up your py-orca virtual environment and executing demo.py

Below are the instructions for creating and setting up your virtual environment and executing demo.py. You can also check the tutorial here. If you would like to set up a developer environment with the relevant dependencies, you can execute the shell script dev_setup in a clone of this repository stored on your machine. You can run it either locally or on an EC2 instance.

Setting up a development environment on an EC2 instance can run into a few hurdles. You might need to install Python build dependencies before using pyenv to manage Python versions; you can refer to this doc to resolve the dependency issue. The openssl11-devel package is not available on EC2: Linux Docker v1.3.9, so install openssl-devel instead. You might also run into a missing GCC error, in which case you can install GCC using sudo yum install gcc.

# Create and activate a Python virtual environment (tested with Python 3.10)
python3 -m venv venv/
source venv/bin/activate

# Install Python dependencies
python3 -m pip install 'py-orca[all]' 'metaflow' 'pyyaml' 's3fs'

Before running the example below, ensure that the s3_prefix points to an S3 bucket your Nextflow dev or prod tower workspace has access to. In the example below, we will point to the example-dev-project-tower-scratch S3 bucket because we will be launching our workflows within the example-dev-project workspace in tower-dev. In this case, you can use either of the workflows-nextflow-dev profiles to access the S3 bucket.

# Run the script using an example dataset
python3 demo.py run --dataset_id 'syn51514585' --s3_prefix 's3://example-dev-project-tower-scratch/work'
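Before launching, it can help to confirm that the s3_prefix value is a well-formed S3 URI. The sketch below only checks the form of the string with the standard library; it does not check that your Tower workspace can access the bucket. The helper name `split_s3_prefix` is hypothetical and not part of py-orca or demo.py.

```python
from urllib.parse import urlparse


def split_s3_prefix(s3_prefix):
    """Split an s3:// URI into (bucket, key_prefix).

    Hypothetical helper for illustration. Raises ValueError when the
    scheme is not "s3" or the bucket name is missing.
    """
    parsed = urlparse(s3_prefix)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://bucket/prefix URI, got: {s3_prefix!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The example value used in the command above.
bucket, key_prefix = split_s3_prefix("s3://example-dev-project-tower-scratch/work")
print(bucket, key_prefix)
```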

Once your run starts, you can follow the output logs in your terminal or track your workflow's progress on the web client. Make sure that your synstage workflow run has a unique name and is not an iteration of a previous run (e.g. my_test_dataset_synstage_2, my_test_dataset_synstage_3, and so on). The demo.py script currently cannot locate the staged samplesheet file if it was staged under a non-unique run name.
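One way to catch this early is to flag run names that look like a repeat of an earlier run. This is a rough sketch under a stated assumption: a repeated run is named by appending an underscore and a number to the original synstage run name, as in the examples above. The helper `looks_like_rerun` is hypothetical and is not how demo.py itself names or checks runs.

```python
import re

# Assumption: a repeated run carries a numeric suffix after "synstage",
# e.g. "my_test_dataset_synstage_2".
RERUN_SUFFIX = re.compile(r"_synstage_\d+$")


def looks_like_rerun(run_name):
    """Return True when the run name ends in "_synstage_<number>"."""
    return RERUN_SUFFIX.search(run_name) is not None


for name in ("my_test_dataset_synstage", "my_test_dataset_synstage_2"):
    print(name, looks_like_rerun(name))
```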

The above dataset ID (syn51514585) refers to the following YAML file, which should be accessible to Sage employees. Similarly, the samplesheet ID below (syn51514475) should also be accessible to Sage employees. However, there is no secure way to make the output folder accessible to Sage employees, so the synindex step will fail if you attempt to run this script using the example dataset ID. This should be sufficient to get a feel for using py-orca, but feel free to create your own dataset YAML file on Synapse with an output folder that you own.

id: my_test_dataset
samplesheet: syn51514475
output_folder: syn51514559
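If you create your own dataset file, a small validation step can catch a mistyped Synapse ID before any workflow is launched. The sketch below is illustrative only: the `Dataset` class and `parse_flat_yaml` helper are hypothetical names, not the structures used by demo.py, and the parser handles only flat `key: value` lines like the example above rather than general YAML.

```python
import re
from dataclasses import dataclass

# Synapse IDs consist of "syn" followed by digits.
SYNAPSE_ID = re.compile(r"^syn\d+$")


@dataclass(frozen=True)
class Dataset:
    """Hypothetical container mirroring the three fields shown above."""

    id: str
    samplesheet: str
    output_folder: str

    def __post_init__(self):
        # Both references must look like Synapse IDs.
        for field_name in ("samplesheet", "output_folder"):
            value = getattr(self, field_name)
            if not SYNAPSE_ID.match(value):
                raise ValueError(f"{field_name} is not a Synapse ID: {value!r}")


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines. This is NOT a general YAML parser."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


EXAMPLE = """\
id: my_test_dataset
samplesheet: syn51514475
output_folder: syn51514559
"""

dataset = Dataset(**parse_flat_yaml(EXAMPLE))
print(dataset)
```

In a real script, `yaml.safe_load` from PyYAML (installed in the step above) would be the more robust way to read the file; the hand-written parser is used here only to keep the sketch free of third-party imports.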

PyScaffold

This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.

putup --name orca --markdown --github-actions --pre-commit --license Apache-2.0 py-orca

Release history

- 1.5.3 (this version): May 13, 2026
- 1.5.2: May 8, 2025
- 1.5.1: Apr 7, 2025
- 1.5.0: Mar 31, 2025
- 1.4.0: Oct 31, 2024
- 1.3.5: Apr 19, 2024
- 1.3.4: Mar 4, 2024
- 1.3.3: Oct 26, 2023
- 1.3.2: Oct 18, 2023
- 1.3.1: May 31, 2023
- 1.3.0: May 31, 2023
- 1.2.0: May 4, 2023
- 1.1.0: May 1, 2023
- 1.0.1: Feb 10, 2023
- 1.0.0: Feb 10, 2023

Download files

Source Distribution

py_orca-1.5.3.tar.gz (72.4 kB)

Uploaded May 13, 2026 Source

Built Distribution

py_orca-1.5.3-py3-none-any.whl (41.3 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file py_orca-1.5.3.tar.gz.

File metadata

Download URL: py_orca-1.5.3.tar.gz
Upload date: May 13, 2026
Size: 72.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for py_orca-1.5.3.tar.gz
Algorithm	Hash digest
SHA256	`7784fd7d540255cec83d219534ea77fbbcefdf8f60f1c20aeca57f0018007b99`
MD5	`ac4e34358437c1d4e79e345f6fb9bfae`
BLAKE2b-256	`b13f66f6bb798451fe592242327a76d2f63ffaf1d473218688c2ad8afd424697`

File details

Details for the file py_orca-1.5.3-py3-none-any.whl.

File metadata

Download URL: py_orca-1.5.3-py3-none-any.whl
Upload date: May 13, 2026
Size: 41.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for py_orca-1.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a27d14a7cc4462b84374420c1f6d6b1b94d652b2df7be4c42f6833010e6e8076`
MD5	`089bc8bf64c480c08718d425c4ec19c9`
BLAKE2b-256	`d06b94dc0178307fb5dd27cf51d37d8c0c8141e1fa7056e46b79a2e807b3d47c`
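A downloaded file can be checked against the published SHA256 digests with the standard library. The following is a minimal sketch: the digests are copied from the File hashes tables above, while the helper names `sha256_of` and `matches_published_hash` and the local file paths are assumptions for illustration.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "py_orca-1.5.3.tar.gz": (
        "7784fd7d540255cec83d219534ea77fbbcefdf8f60f1c20aeca57f0018007b99"
    ),
    "py_orca-1.5.3-py3-none-any.whl": (
        "a27d14a7cc4462b84374420c1f6d6b1b94d652b2df7be4c42f6833010e6e8076"
    ),
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """Compare a local file against the published digest for `filename`."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

For example, `matches_published_hash("downloads/py_orca-1.5.3.tar.gz", "py_orca-1.5.3.tar.gz")` returns True only when the local copy (at a hypothetical path) is byte-for-byte identical to the published source distribution.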
