
🧬 Bioinformatics on Flyte


This repo contains the tasks, workflows, image definitions, and datatypes used to standardize the orchestration of common bioinformatics pipelines on Flyte.

🐳 Container Images

Adding custom dependencies alongside Flytekit

ImageSpecs contained in the images module build a standard set of OCI-compliant container images for use throughout the different workflows. They can be built with entrypoints present in pyproject.toml.

📈 Datatypes

Leverage dataclasses to keep things organized

Using dataclasses to define your samples provides a clean, extensible data structure that keeps your workflows tidy. Instead of writing to directories and tracking everything manually on the command line, these dataclasses capture relevant metadata about your samples and tell you where to find them in object storage.

🔍 Quality Control and Pre-processing

FastQC

Run arbitrary shell commands

FastQC is a very common tool, written in Java, for gathering QC metrics on raw reads. It doesn't have any Python bindings, but luckily Flyte lets us run arbitrary ShellTasks with a clean way of passing in inputs and receiving outputs. Just define a script for what you need to do and ShellTask will handle the rest.

Automatic QC checkpointing

Decide whether to continue workflow execution based on QC metrics via conditionals

FastQC generates a summary file with a simple PASS / WARN / FAIL call across a number of different metrics. We can use conditionals in our workflow to check for any FAIL lines in the summary and automatically halt execution. This can surface an early failure without wasting valuable compute or anyone's time doing manual review.
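The check itself is simple string parsing; a sketch is below. The helper name is an assumption, and the commented lines show roughly how the boolean would feed Flyte's `conditional` in the workflow.

```python
# Illustrative QC gate helper; the function name is an assumption.
def qc_passed(summary_text: str) -> bool:
    """Return False if any line of a FastQC summary.txt is a FAIL."""
    for line in summary_text.splitlines():
        status = line.split("\t", 1)[0].strip()
        if status == "FAIL":
            return False
    return True

# In the workflow, this boolean would drive a conditional, roughly:
#   conditional("qc-gate").if_(passed.is_true()).then(fastp_filter(...)) \
#       .else_().then(halt(...))

summary = (
    "PASS\tBasic Statistics\tsampleA.fastq\n"
    "FAIL\tPer base sequence quality\tsampleA.fastq"
)
print(qc_passed(summary))  # False
```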

fastp

Specify resources and parallelize via map task

fastp is another common pre-processing tool for filtering out bad reads, trimming, and adapter removal. It can be more memory-hungry than Flyte's defaults allow; luckily we can use Resources to raise those limits and let it run efficiently. Additionally, we can use a map task in our workflow to parallelize fastp across all our samples.

👩‍🔬 Human-in-the-Loop Approval

Pause processing while waiting for human input

As a final check before moving on to alignment, we can define an explicit approval step right in the workflow. By aggregating the reports from all processing done up to this point and visualizing them via Decks (more on that later), a researcher can quickly get a high-level view of the work so far and approve the analysis for further processing.

📏 Alignment

Generate indices

Leverage caching to save time on successive runs

Index generation can be a very compute-intensive step. Luckily, we can take advantage of Flyte's native caching when building that index for Bowtie2 and HISAT2. We've also defined a cache_version in the config that relies on a hash of the reference location in the object store. This means that changing the reference will invalidate the cache and trigger a rebuild, while allowing you to go back to your old reference with impunity.

Bowtie2 vs. HISAT2

Compare aligners across an arbitrary number of inputs via dynamic workflows

When prototyping a new pipeline, it's usually a good idea to evaluate a few different tools to see how they perform with respect to runtime and resource requirements. This is easy with a dynamic workflow, which allows us to pass in an arbitrary number of inputs to be used with whatever tasks we want. In the main workflow you'll pass a list of filtered samples to each tool and be able to capture run statistics in the Alignment dataclass as well as visualize their runtimes in the Flyte console.

📋 Reporting

Visualize performance via Decks

We use MultiQC, an excellent multi-modal visualization tool, for reporting. After gathering all relevant metrics from a workflow, we can render that report via Decks, giving us rich run statistics without ever leaving the Flyte console!
