Architect jobs for running analyses
===================================
.. image:: https://badge.fury.io/py/jobarchitect.svg
   :target: http://badge.fury.io/py/jobarchitect
   :alt: PyPI package

.. image:: https://readthedocs.org/projects/jobarchitect/badge/?version=latest
   :target: http://jobarchitect.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

- Documentation: http://jobarchitect.readthedocs.io
- GitHub: https://github.com/JIC-CSB/jobarchitect
- PyPI: https://pypi.python.org/pypi/jobarchitect
- Free software: MIT License

Overview
--------

This tool automates the generation of scripts to run analyses on data sets.
To use it, you will need a data set that has been created (or annotated) with
`dtool <https://github.com/JIC-CSB/dtool>`_.

It aims to help by:

1. Removing the need to know where specific data items are stored in a data set
2. Providing a means to split an analysis into several chunks (file-based
   parallelization)
3. Providing a framework for seamlessly running an analysis inside a container
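The file-based parallelization in point 2 amounts to splitting the identifiers
of a data set's items across several jobs. A minimal sketch of the idea in
plain Python (illustrative only; ``chunk_identifiers`` is not part of the
jobarchitect API):

```python
def chunk_identifiers(identifiers, num_chunks):
    """Split a list of data item identifiers into num_chunks roughly
    equal chunks, one per job."""
    # Striding keeps chunk sizes within one item of each other.
    return [identifiers[i::num_chunks] for i in range(num_chunks)]

# Five hypothetical item identifiers split across two jobs.
ids = ["id0", "id1", "id2", "id3", "id4"]
print(chunk_identifiers(ids, 2))  # [['id0', 'id2', 'id4'], ['id1', 'id3']]
```

Each chunk of identifiers can then be passed to a separate invocation of the
generated job script.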

Design
------

This project has two main components. The first is a command line tool named
``sketchjob``, intended for use by the end user; it generates scripts that
define the jobs to be run. The second, ``_analyse_by_ids``, is a command line
tool invoked by the scripts that ``sketchjob`` generates. The end user is not
meant to use this second script directly.

Installation
------------

To install the jobarchitect package::

    $ cd jobarchitect
    $ python setup.py install

Use
---

To generate bash scripts for data analysis, first create a Common Workflow
Language (CWL) tool description file, for example ``shasum.cwl``.
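As an illustration only, a minimal CWL command line tool wrapper around
``shasum`` might look something like the following (a sketch; the project's
actual ``shasum.cwl`` may differ):

```yaml
# shasum.cwl -- illustrative sketch of a CWL CommandLineTool wrapping shasum
cwlVersion: v1.0
class: CommandLineTool
baseCommand: shasum
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  hash:
    type: stdout
```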

Then create an example dataset::

    $ datatool new dataset
    project_name [project_name]:
    dataset_name [dataset_name]: example_dataset
    ...
    $ echo "My example data" > example_dataset/data/my_file.txt
    $ datatool manifest update example_dataset/

Create an output directory::

    $ mkdir output

Then you can generate analysis run scripts with::

    $ sketchjob shasum.cwl example_dataset output/
    #!/bin/bash
    _analyse_by_ids \
      --cwl_tool_wrapper_path=shasum.cwl \
      --input_dataset_path=example_dataset/ \
      --output_root=output/ \
      290d3f1a902c452ce1c184ed793b1d6b83b59164

Try the script with::

    $ sketchjob shasum.cwl example_dataset output/ > run.sh
    $ bash run.sh
    $ cat output/first_image.png
    290d3f1a902c452ce1c184ed793b1d6b83b59164  /private/var/folders/hn/crprzwh12kj95plc9jjtxmq82nl2v3/T/tmp_pTfc6/stg02d730c7-17a2-4d06-a017-e59e14cb8885/first_image.png

Working with Docker
-------------------

Building a Docker image
^^^^^^^^^^^^^^^^^^^^^^^

For the tests to pass, you will need to build an example Docker image, which
you can do with the provided script::

    $ bash build_docker_image.sh

Running code with the Docker backend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By inspecting the script and the associated Dockerfile, you can get an idea of
how to build Docker images that can be used with the jobarchitect Docker
backend, e.g.::

    $ sketchjob sha1sum.cwl ~/junk/cotyledon_images ~/junk/output --backend=docker --image-name=jicscicomp/jobarchitect
    #!/bin/bash
    IMAGE_NAME=jicscicomp/jobarchitect
    docker run \
      --rm \
      -v /Users/olssont/junk/cotyledon_images:/input_dataset:ro \
      -v /Users/olssont/junk/output:/output \
      -v /Users/olssont/sandbox/cwl_v1/sha1sum.cwl:/tool.cwl:ro \
      $IMAGE_NAME \
      _analyse_by_ids \
        --cwl_tool_wrapper_path=/tool.cwl \
        --input_dataset_path=/input_dataset \
        --output_root=/output \
        290d3f1a902c452ce1c184ed793b1d6b83b59164 09648d19e11f0b20e5473594fc278afbede3c9a4