
Architect jobs for running analyses
===================================

.. image:: https://badge.fury.io/py/jobarchitect.svg
   :target: http://badge.fury.io/py/jobarchitect
   :alt: PyPI package

.. image:: https://readthedocs.org/projects/jobarchitect/badge/?version=latest
   :target: http://jobarchitect.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

- Documentation: http://jobarchitect.readthedocs.io
- GitHub: https://github.com/JIC-CSB/jobarchitect
- PyPI: https://pypi.python.org/pypi/jobarchitect
- Free software: MIT License


Overview
--------

This tool is intended to automate the generation of scripts that run analyses
on data sets. To use it, you will need a data set that has been created (or
annotated) with `dtool <https://github.com/JIC-CSB/dtool>`_.
It aims to help by:

1. Removing the need to know where specific data items are stored in a data set
2. Providing a means to split an analysis into several chunks (file-based
   parallelization)
3. Providing a framework for seamlessly running an analysis inside a container


Design
------

This project has two main components. The first is a command line tool named
``sketchjob`` intended to be used by the end user. It is used to generate
scripts defining jobs to be run. The second (``_analyse_by_ids``) is a command
line tool that is used by the scripts generated by ``sketchjob``. The end user
is not meant to make use of this second script directly.
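
In outline, assuming the ``shasum.cwl`` wrapper and ``example_dataset`` used
in the Use section below::

    # sketchjob writes a self-contained job script; this is the end-user step.
    $ sketchjob shasum.cwl example_dataset output/ > run.sh

    # The generated script calls _analyse_by_ids internally to do the work.
    $ bash run.sh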


Installation
------------

To install the jobarchitect package from source.

::

    $ git clone https://github.com/JIC-CSB/jobarchitect.git
    $ cd jobarchitect
    $ python setup.py install
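
Alternatively, the package is published on PyPI and can be installed with
pip::

    $ pip install jobarchitect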


Use
---

To generate bash scripts for data analysis, first create a Common Workflow
Language (CWL) tool description file.
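For example, a minimal ``shasum.cwl`` wrapper might look like the sketch
below (an illustration assuming CWL v1.0; the wrapper used in the project's
own examples may differ)::

    $ cat shasum.cwl
    #!/usr/bin/env cwl-runner

    cwlVersion: v1.0
    class: CommandLineTool
    baseCommand: shasum
    inputs:
      input_file:
        type: File
        inputBinding:
          position: 1
    outputs:
      output_file:
        type: stdout
    stdout: $(inputs.input_file.basename)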



Then create an example dataset::

    $ datatool new dataset
    project_name [project_name]:
    dataset_name [dataset_name]: example_dataset
    ...

    $ echo "My example data" > example_dataset/data/my_file.txt
    $ datatool manifest update example_dataset/

Create an output directory::

    $ mkdir output

Then you can generate analysis run scripts with::

    $ sketchjob shasum.cwl example_dataset output/
    #!/bin/bash

    _analyse_by_ids \
      --cwl_tool_wrapper_path=shasum.cwl \
      --input_dataset_path=example_dataset/ \
      --output_root=output/ \
      290d3f1a902c452ce1c184ed793b1d6b83b59164

Try the script with::

    $ sketchjob shasum.cwl example_dataset output/ > run.sh
    $ bash run.sh
    $ cat output/first_image.png
    290d3f1a902c452ce1c184ed793b1d6b83b59164 /private/var/folders/hn/crprzwh12kj95plc9jjtxmq82nl2v3/T/tmp_pTfc6/stg02d730c7-17a2-4d06-a017-e59e14cb8885/first_image.png

Working with Docker
-------------------

Building a Docker image
^^^^^^^^^^^^^^^^^^^^^^^

For the tests to pass, you will need to build an example Docker image, which
you do with the provided script::

    $ bash build_docker_image.sh
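
For orientation, a minimal sketch of what such an image needs is given below.
This is a hypothetical ``Dockerfile``, not the project's own (see
``build_docker_image.sh`` and the associated Dockerfile in the repository for
the real build): the image must provide the ``_analyse_by_ids`` entry point
and a CWL runner, here assumed to be ``cwltool``::

    FROM python:3

    # Provide the _analyse_by_ids entry point invoked by generated scripts.
    RUN pip install jobarchitect

    # Provide a CWL runner to execute the tool wrapper inside the container.
    RUN pip install cwltool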

Running code with the Docker backend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By inspecting the script and associated Dockerfile, you can get an idea of
how to build Docker images that can be used with the jobarchitect Docker
backend, e.g.::

    $ sketchjob sha1sum.cwl ~/junk/cotyledon_images ~/junk/output --backend=docker --image-name=jicscicomp/jobarchitect
    #!/bin/bash

    IMAGE_NAME=jicscicomp/jobarchitect
    docker run \
      --rm \
      -v /Users/olssont/junk/cotyledon_images:/input_dataset:ro \
      -v /Users/olssont/junk/output:/output \
      -v /Users/olssont/sandbox/cwl_v1/sha1sum.cwl:/tool.cwl:ro \
      $IMAGE_NAME \
      _analyse_by_ids \
        --cwl_tool_wrapper_path=/tool.cwl \
        --input_dataset_path=/input_dataset \
        --output_root=/output \
        290d3f1a902c452ce1c184ed793b1d6b83b59164 09648d19e11f0b20e5473594fc278afbede3c9a4
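
Note how the generated script bind-mounts the input dataset, the output
directory and the CWL tool wrapper into fixed paths inside the container
(read-only where possible), so the same image can be reused regardless of
where the data lives on the host.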
