Utilities for writing concise snakemake workflows

gardnersnake

Table of Contents

  1. Introduction
  2. Requirements
  3. Installation
  4. Class Objects
  5. Command Line Tools

1. Introduction

Snakemake is a powerful workflow manager that enables computational biologists to produce clear, reproducible, and modular analysis pipelines using a familiar Python-based grammar. Unfortunately, the bioinformatics tools that we'd like to use inside our Snakemake workflows are often a bit less well-behaved. Gardnersnake is a small package built on the Python standard library that aims to make handling this wide variety of tools easier and more compact, especially when working on cluster-based systems.

2. Requirements

The gardnersnake package requires Python >= 3.7.0. Additionally, gardnersnake depends on jsonschema 4.4.0 and pyyaml 6.0.

3. Installation

Gardnersnake is most conveniently installed with pip from the Python Package Index (PyPI).

pip install gardnersnake

This repo can also be cloned, built, and installed from source using its setup.py file. A separate requirements.txt file is provided for building compatible environments using venv.

git clone https://github.com/zwebbs/gardnersnake.git
cd gardnersnake
python -m build
pip install dist/gardnersnake-*-py3*.whl

4. Class Objects

gardnersnake.ConfigurationHelper()

One of the two foundational objects defined in gardnersnake is the ConfigurationHelper class. At instantiation, ConfigurationHelper takes two arguments: first, cfg_dict, representing the workflow configuration; second, schema_type, specifying which JSON schema pattern to validate the workflow configuration against (more about schemas below). Passing None as schema_type skips validation altogether, but may result in unintended errors downstream when ConfigurationHelper looks for expected attributes.
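To make the validate-or-skip behavior concrete, here is a minimal standard-library sketch of the idea. It is not gardnersnake's implementation (which, per the requirements above, relies on jsonschema). The function name `validate_config`, the schema name "basic", and its required keys are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Mapping, Optional, Set

# Hypothetical schema table. The schema name and its required keys are
# illustrative assumptions, not the schemas shipped with gardnersnake.
REQUIRED_KEYS: Dict[str, Set[str]] = {
    "basic": {"workdir", "samples"},
}


def validate_config(cfg_dict: Mapping[str, Any],
                    schema_type: Optional[str]) -> Dict[str, Any]:
    """Return a copy of cfg_dict after checking it against schema_type.

    Passing None as schema_type skips the check entirely, mirroring the
    behavior described for ConfigurationHelper.
    """
    if schema_type is None:
        # No validation: missing keys will only surface later, on access.
        return dict(cfg_dict)
    if schema_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown schema_type: {schema_type!r}")
    missing = sorted(REQUIRED_KEYS[schema_type] - set(cfg_dict))
    if missing:
        raise KeyError(f"configuration is missing required keys: {missing}")
    return dict(cfg_dict)


# Validated configuration: all required keys are present.
cfg = validate_config({"workdir": "/tmp/run", "samples": ["a", "b"]}, "basic")

# Unvalidated configuration: accepted as-is, even though it is empty.
unchecked = validate_config({}, None)
```

The second call shows why skipping validation can cause trouble downstream: the empty configuration is accepted, and the error only appears later when some rule looks up a key that is not there.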

gardnersnake.DataManager()

The gardnersnake extended configuration

Example: Working with the extended configuration in Snakemake

5. Command Line Tools

check_directory

Many bioinformatics tools produce directories of various structure with large numbers of output files. Rather than require Snakemake to keep track of these outputs as global outputs, the check_directory command-line utility validates output directories against a known set of files, and returns a small file containing a return code (0) if the directory of interest was successfully validated. check_directory throws an error and does not return the return code file if it is unable to validate the contents according to the given requirements.

The options and requirements are specified in the usage message and can be retrieved using the -h or --help flags.

check_directory --help
usage: check_directory [-h] [--strict] [-o OUT] FILES [FILES ...] DIR

validates dynamic directory contents against expectations

positional arguments:
  FILES                 set of filepaths to check against dir contents
  DIR                   filepath of directory to verify

optional arguments:
  -h, --help            show this help message and exit
  --strict              directory should contain only the passed files
  -o OUT, --output OUT  name of return code output file
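The usage message above has the shape that Python's argparse module generates. As a rough sketch, a parser with the same interface could be declared as follows. The attribute names `files`, `directory`, and `output` are assumptions, as is leaving -o without a default; none of this is taken from the gardnersnake source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the check_directory usage message."""
    parser = argparse.ArgumentParser(
        prog="check_directory",
        description="validates dynamic directory contents against expectations",
    )
    parser.add_argument(
        "--strict", action="store_true",
        help="directory should contain only the passed files",
    )
    parser.add_argument(
        "-o", "--output", metavar="OUT",
        help="name of return code output file",
    )
    # One or more file names, followed by exactly one directory. argparse
    # assigns the final positional token to DIR and the rest to FILES.
    parser.add_argument(
        "files", metavar="FILES", nargs="+",
        help="set of filepaths to check against dir contents",
    )
    parser.add_argument(
        "directory", metavar="DIR",
        help="filepath of directory to verify",
    )
    return parser


args = build_parser().parse_args(
    ["-o", "rc.out", "--strict", "output1.txt", "output2.txt", "outputs/"]
)
print(args.files, args.directory, args.output, args.strict)
```

Because FILES uses nargs="+" and DIR takes a single value, the directory must always be the last positional argument on the command line.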

Positional Options

  • FILES [required] is a whitespace-separated list of files to search for in the passed directory. These file names should be specified without their leading directory path (i.e. a file whose full path is /home/user/analysis/myoutputs/output1.txt should be passed as output1.txt if DIR is /home/user/analysis/myoutputs/).
  • DIR [required] is the full path of the directory to verify. ~/ conventions are acceptable, but shell variable syntax such as $WORKDIR is not supported. Relative path functionality remains in active development and is not guaranteed to work as of the current version (0.1.0).

Flagged Options

  • --output -o [required] specifies the name of the file generated (containing the return code) when the passed directory is successfully validated.
  • --strict [optional] indicates that the passed directory should contain only the files listed in the FILES positional argument, and no other files or subdirectories. The default, non-strict setting validates directories containing extra files so long as the required ones are present. This lets the user be more or less permissive with their checks.

Typical usage may look like:

check_directory -o rc.out --strict output1.txt output2.txt ~/myanalysis/outputs/

This should produce a file called rc.out if the folder ~/myanalysis/outputs/ contains exactly two files: output1.txt and output2.txt.
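The validation logic described in this section can be sketched in a few lines of standard-library Python. This is an illustration of the described behavior under stated assumptions, not the code that ships with gardnersnake. The function name `check_directory_sketch`, the exception types, and the exact contents written to the return code file ("0" plus a newline) are all assumptions.

```python
from pathlib import Path
from typing import Iterable


def check_directory_sketch(directory: str, files: Iterable[str],
                           output: str, strict: bool = False) -> Path:
    """Write a return code file if `directory` holds the expected files.

    Raises instead of writing the file when validation fails, so a
    workflow that waits on the return code file never sees it.
    """
    dir_path = Path(directory).expanduser()  # supports ~/ style paths
    if not dir_path.is_dir():
        raise NotADirectoryError(f"not a directory: {dir_path}")

    # Names of everything directly inside the directory (files and subdirs).
    present = {entry.name for entry in dir_path.iterdir()}
    expected = set(files)

    missing = expected - present
    if missing:
        raise FileNotFoundError(f"missing expected files: {sorted(missing)}")

    extra = present - expected
    if strict and extra:
        raise ValueError(f"unexpected entries in strict mode: {sorted(extra)}")

    # Assumed format: the return code 0 on a single line.
    out_path = Path(output)
    out_path.write_text("0\n")
    return out_path
```

In non-strict mode only the `missing` check can fail, so extra files are tolerated. With strict=True, any entry not named in `files` (including a subdirectory) causes the check to fail.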
