A wrapper for gwf for easy generation of output file paths

Project description

gwf target group

This Python package automatically generates systematic output filenames for your gwf jobs, which makes the job definitions considerably terser. Compare this:

from gwf import Workflow

gwf = Workflow()

foo_file = 'first_step/foo.csv' 
bar_file = 'first_step/bar.csv' 
plot_file = 'second_step/plot.png'
summary_file = 'second_step/summary.txt'

gwf.target(
    'target_group.first_step',
    inputs = [],
    outputs = [ foo_file, bar_file ],
) << f"first_step_command -f {foo_file} > {bar_file}"

gwf.target(
    'target_group.second_step',
    inputs = [ foo_file, bar_file ],
    outputs = [ plot_file, summary_file ]
) << f"second_step_command -f {foo_file} -b {bar_file} -p {plot_file} > {summary_file}"

to this:

from gwf import Workflow
from gwf_target_group import TargetGroup

gwf = Workflow()

target_group = TargetGroup( gwf, 'target_group', 'output_prefix/' )

target_group(
    'first_step',
    "first_step_command -f {foo.csv} > {bar.csv}"
) # No input files here. Only 2 output files

target_group(
    'second_step',
    "second_step_command -f {foo_file} -b {bar_file} -p {plot.png} > {summary.txt}",
    foo_file = target_group.first_step.foo,
    bar_file = target_group.first_step.bar
) # Two input files, two output files

With this package you never specify the paths of the output files, only those of the input files. You can then refer to the output files through the automatic attributes of the target group, for example target_group.first_step.foo.
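To make the placeholder convention concrete, here is a minimal sketch of how such a command template could be expanded. It is not the package's implementation: the function name expand_template and the path layout (prefix, then step name, then the placeholder text) are assumptions made for illustration. Only the convention itself is taken from the examples above: a placeholder that matches a keyword argument is an input, and any other placeholder is an output.

```python
import re


def expand_template(template, prefix, step, **inputs):
    """Hypothetical illustration only, not the gwf_target_group implementation."""
    outputs = {}

    def substitute(match):
        name = match.group(1)
        if name in inputs:
            # The placeholder names a keyword argument, so it is an input path.
            return inputs[name]
        # Otherwise treat it as an output file.
        # ASSUMPTION: outputs are placed at <prefix><step>/<placeholder>.
        path = f"{prefix}{step}/{name}"
        # Attribute name is the placeholder without its extension (foo.csv -> foo).
        outputs[name.split(".")[0]] = path
        return path

    command = re.sub(r"\{([^{}]+)\}", substitute, template)
    return command, list(inputs.values()), outputs


command, input_files, output_files = expand_template(
    "first_step_command -f {foo.csv} > {bar.csv}",
    "output_prefix/",
    "first_step",
)
```

Under these assumptions, output_files maps foo and bar to generated paths, which is the information a later step would read through target_group.first_step.foo.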

Installation

Install via pip:

pip install gwf_target_group

(or alternatively copy the __init__.py from this repository and save it as gwf_target_group.py at a convenient location)

Advanced usage

Passing gwf options

If you need to fine-tune the options for a gwf job, you can use the gwf_options parameter:

target_group(
    'my_special_processing_step',
    'do_special_things {data} > {result.tsv}',
    gwf_options = { # gwf_options is a reserved keyword
        'memory': '64g',
        'walltime': 'unlimited'
    },
    data = 'path/to/data.tsv'
)

This is roughly equivalent to the following gwf-only code:

gwf.target(
    'target_group.my_special_processing_step',
    inputs = [ 'path/to/data.tsv' ],
    outputs = [ 'path/to/result.tsv' ],
    options = {
        'memory': '64g',
        'walltime': 'unlimited'
    }
) << 'do_special_things path/to/data.tsv > path/to/result.tsv'

Running workflows with different datasets

Sometimes you want to do the same thing with different datasets. For example, you might have a human and a mouse dataset that you want to analyse. Then you can do the following:

def define_analysis( target_group ):
    target_group(
        'sort_genes_by_length',
        'gene_sorter --by length {genome_file} > {list}',
        genome_file = target_group.genome_file # this value was attached previously
    )
    target_group(
        'split_into_test_and_training_datasets',
        'split_list -1 list1 -2 list2 {sorted_genes}',
        sorted_genes = target_group.sort_genes_by_length.list
    )
    # more steps can be added here

human = TargetGroup( gwf, 'human', 'human_results/' )
mouse = TargetGroup( gwf, 'mouse', 'mouse_results/' )

# explicitly attach the path to the genome files to the TargetGroups
human.genome_file = 'data/genomes/human.fa'
mouse.genome_file = 'data/genomes/mouse.fa'

# and then define the analysis for both datasets
define_analysis( human )
define_analysis( mouse )
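When the number of datasets grows, the same pattern can be driven from a dictionary instead of one variable per dataset. The following is a minimal sketch: the names DATASETS, build_groups and make_group are hypothetical helpers introduced here, not part of gwf_target_group. The group factory is passed in as an argument so that the helper itself does not depend on a workflow object.

```python
# Hypothetical mapping from dataset name to genome file (paths from the example above).
DATASETS = {
    'human': 'data/genomes/human.fa',
    'mouse': 'data/genomes/mouse.fa',
}


def build_groups(make_group, define_analysis, datasets=DATASETS):
    """Create one target group per dataset and define the analysis on each."""
    groups = {}
    for name, genome_file in datasets.items():
        # make_group is expected to return a TargetGroup-like object.
        group = make_group(name, f'{name}_results/')
        # Attach the genome path explicitly, as in the example above.
        group.genome_file = genome_file
        define_analysis(group)
        groups[name] = group
    return groups
```

With the objects from the example above, this could be called as build_groups(lambda name, prefix: TargetGroup(gwf, name, prefix), define_analysis).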
