Project description

gwf target group

This Python package provides a convenient way to automatically generate systematic output filenames for your gwf jobs, which makes the job definitions considerably terser. Compare this:

from gwf import Workflow

gwf = Workflow()

foo_file = 'first_step/foo.csv' 
bar_file = 'first_step/bar.csv' 
plot_file = 'second_step/plot.png'
summary_file = 'second_step/summary.txt'

gwf.target(
    'target_group.first_step',
    inputs = [],
    outputs = [ foo_file, bar_file ],
) << f"first_step_command -f {foo_file} > {bar_file}"

gwf.target(
    'target_group.second_step',
    inputs = [ foo_file, bar_file ],
    outputs = [ plot_file, summary_file ]
) << f"second_step_command -f {foo_file} -b {bar_file} -p {plot_file} > {summary_file}"

to this:

from gwf import Workflow
from gwf_target_group import TargetGroup

gwf = Workflow()

target_group = TargetGroup( gwf, 'target_group', 'output_prefix/' )

target_group(
    'first_step',
    "first_step_command -f {foo.csv} > {bar.csv}"
) # No input files here. Only 2 output files

target_group(
    'second_step',
    "second_step_command -f {foo_file} -b {bar_file} -p {plot.png} > {summary.txt}",
    foo_file = target_group.first_step.foo,
    bar_file = target_group.first_step.bar
) # Two input files, two output files

With this package you never specify the paths of the output files, only the paths of the input files. You can then refer to the output files of an earlier step through the automatic attributes of the target group, for example target_group.first_step.foo.
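To make the placeholder convention more concrete, here is a minimal sketch of how a command template could be split into inputs and outputs. It is not the package's actual implementation: the helper name expand_step, the regular expression, and the assumed output layout (prefix + step name + filename) are all hypothetical and only illustrate the idea that placeholders matching a keyword argument are inputs and all other placeholders are outputs.

```python
import re
from types import SimpleNamespace

# Matches {name} or {name.ext}; the extension part is optional.
PLACEHOLDER = re.compile(r"\{(\w+)(?:\.(\w+))?\}")


def expand_step(step, template, prefix, **inputs):
    """Hypothetical helper: resolve the placeholders of one step.

    A placeholder whose name matches a keyword argument is an input.
    Any other placeholder is treated as an output file.
    """
    outputs = {}

    def resolve(match):
        name, ext = match.group(1), match.group(2)
        if ext is None and name in inputs:
            return inputs[name]
        filename = name if ext is None else f"{name}.{ext}"
        # Assumed layout (not confirmed by the package): <prefix><step>/<filename>
        outputs[name] = f"{prefix}{step}/{filename}"
        return outputs[name]

    command = PLACEHOLDER.sub(resolve, template)
    return SimpleNamespace(
        command=command,
        inputs=list(inputs.values()),
        outputs=outputs,
    )


first = expand_step(
    "first_step",
    "first_step_command -f {foo.csv} > {bar.csv}",
    "output_prefix/",
)
second = expand_step(
    "second_step",
    "second_step_command -f {foo_file} -b {bar_file} -p {plot.png} > {summary.txt}",
    "output_prefix/",
    foo_file=first.outputs["foo"],
    bar_file=first.outputs["bar"],
)
print(second.command)
```

In this sketch the outputs of the first step are looked up in a plain dictionary; the package itself exposes them as attributes such as target_group.first_step.foo.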

Installation

Install via pip:

pip install gwf_target_group

(or alternatively copy the __init__.py from this repository and save it as gwf_target_group.py at a convenient location)

Advanced usage

Passing gwf options

If you need to fine-tune the options for a gwf job, you can use the gwf_options parameter:

target_group(
    'my_special_processing_step',
    'do_special_things {data} > {result.tsv}',
    gwf_options = { # gwf_options is a reserved keyword
        'memory': '64g',
        'walltime': 'unlimited'
    },
    data = 'path/to/data.tsv'
)

This is roughly equivalent to the following gwf-only code:

gwf.target(
    'target_group.my_special_processing_step',
    inputs = [ 'path/to/data.tsv' ],
    outputs = [ 'path/to/result.tsv' ],
    options = {
        'memory': '64g',
        'walltime': 'unlimited'
    }
) << 'do_special_things path/to/data.tsv > path/to/result.tsv'

Running workflows with different datasets

Sometimes you want to do the same thing with different datasets. For example, you might have a human and a mouse dataset that you want to analyse. Then you can do the following:

def define_analysis( target_group ):
    target_group(
        'sort_genes_by_length',
        'gene_sorter --by length {genome_file} > {list}',
        genome_file = target_group.genome_file # this value was attached previously
    )
    target_group(
        'split_into_test_and_training_datasets',
        'split_list -1 list1 -2 list2 {sorted_genes}',
        sorted_genes = target_group.sort_genes_by_length.list
    )
    # more steps can be added here

human = TargetGroup( gwf, 'human', 'human_results/' )
mouse = TargetGroup( gwf, 'mouse', 'mouse_results/' )

# explicitly attach the path to the genome files to the TargetGroups
human.genome_file = 'data/genomes/human.fa'
mouse.genome_file = 'data/genomes/mouse.fa'

# and then define the analysis for both datasets
define_analysis( human )
define_analysis( mouse )
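With more than two datasets, the same pattern can be driven from a dictionary. The following is a minimal sketch under stated assumptions: define_for_datasets and make_group are hypothetical names, and the "<name>_results/" prefix simply mirrors the convention used above. It relies only on what the example above shows, namely that a TargetGroup accepts extra attributes such as genome_file.

```python
def define_for_datasets(make_group, genome_files, define_analysis):
    """Hypothetical helper: run one analysis definition for several datasets.

    make_group      -- callable (name, prefix) returning a target group
    genome_files    -- dict mapping dataset name to genome file path
    define_analysis -- function that adds the analysis steps to a group
    """
    groups = {}
    for name, genome_file in genome_files.items():
        # Prefix convention assumed from the example: '<name>_results/'
        group = make_group(name, f"{name}_results/")
        # Attach the input path so define_analysis can pick it up
        group.genome_file = genome_file
        define_analysis(group)
        groups[name] = group
    return groups


# Intended usage with the objects from the example above:
#
# groups = define_for_datasets(
#     lambda name, prefix: TargetGroup(gwf, name, prefix),
#     {'human': 'data/genomes/human.fa', 'mouse': 'data/genomes/mouse.fa'},
#     define_analysis,
# )
```

Passing the group constructor in as a callable keeps the helper independent of how the workflow object is created.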

Release history

1.0.1 (this version), Dec 10, 2019

1.0.0, Dec 9, 2019

Download files

Source Distribution

gwf_target_group-1.0.1.tar.gz (6.1 kB), uploaded Dec 10, 2019, Source

Built Distribution

gwf_target_group-1.0.1-py3-none-any.whl (18.5 kB), uploaded Dec 10, 2019, Python 3

Hashes for gwf_target_group-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`72c8fffb4ac23edce55d7fca54012e7db34825095b99c640b29acbf48b133470`
MD5	`3ebe5236e8da87ed3ca9e0f88180b84c`
BLAKE2b-256	`38d0b27fba3d7ca354d60275a1ff83b366d8dfd0988c4efad5d7f8779051a0ba`

Hashes for gwf_target_group-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`c9d1c355b06291be8b83e5f8be35aefcd727d52cfa7eca38377516ab36891067`
MD5	`48bb652387cba537618c615cee011177`
BLAKE2b-256	`b54127d82171e1e1d36ff4b134affda0c91f2fea3f7509e95898f9698766280e`