Contains classes and helpers to build a workflow, and provides options to convert it to CWL / WDL
Janis
Janis is a framework for creating specialised, simple workflow definitions that are then transpiled to
Common Workflow Language or Workflow Description Language.
Documentation is hosted here: https://janis.readthedocs.io/
Introduction
WARNING: this project is work-in-progress and is provided as-is without warranty of any kind. There may be breaking changes committed to this repository without notice.
Janis gives you an API to build computational workflows and will generate a workflow description in CWL and WDL. By using Janis, you get type-safety, portability and reproducibility across all of your execution environments.
Janis requires a Python installation > 3.6 and can be installed through pip (project page):
# Install janis and the bioinformatics tools
pip3 install janis-pipelines[bioinformatics]
You can import Janis into your project with:
import janis as j
Usage
Janis has an API that mirrors the workflow concepts:
- j.Workflow: A workflow represents the Edges between Input, Step and Output.
  - j.Input: An input to a workflow, with an identifier, a type and a value.
  - j.Step: A step also has an identifier and a Tool (CommandTool or a nested Workflow).
  - j.Output: An output of a workflow has an identifier and is connected to a step.
- j.CommandTool: A command line style tool that builds its command through the inputs and arguments.
  - j.ToolInput: An input to a tool, with an identifier, a type and command line options like position and prefix.
  - j.ToolArgument: An argument to a tool that cannot be overridden. It has a value and command line options like position and prefix. The value can be a derived type, like an InputSelector or StringFormatter.
  - j.ToolOutput: An output of a tool, with an identifier, a type and a glob.
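The position and prefix options on tool inputs and arguments mirror the position and prefix fields of CWL's inputBinding. As a rough illustration only, here is a stdlib-only toy model of how a command could be assembled from such inputs and arguments. ToyToolInput, ToyToolArgument and build_command are hypothetical names, not part of Janis, and the ordering rule (sort by position, ties keep declaration order) is an assumption; samtools sort is used purely as a familiar command line.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stand-ins for the ToolInput / ToolArgument concepts; NOT Janis classes.
@dataclass
class ToyToolInput:
    identifier: str
    position: int = 0
    prefix: Optional[str] = None

@dataclass
class ToyToolArgument:
    value: str           # fixed value; the user cannot override it
    position: int = 0
    prefix: Optional[str] = None

def build_command(base_command: List[str],
                  inputs: List[ToyToolInput],
                  arguments: List[ToyToolArgument],
                  values: Dict[str, object]) -> List[str]:
    """Assemble an argv list: base command first, then bindings ordered by position."""
    bindings = []
    for inp in inputs:
        if inp.identifier not in values:
            continue  # unset optional input: contributes nothing
        bindings.append((inp.position, inp.prefix, str(values[inp.identifier])))
    for arg in arguments:
        bindings.append((arg.position, arg.prefix, arg.value))
    # list.sort is stable, so bindings with equal positions keep insertion order
    bindings.sort(key=lambda b: b[0])
    argv = list(base_command)
    for _, prefix, value in bindings:
        if prefix:
            argv.append(prefix)
        argv.append(value)
    return argv


cmd = build_command(
    ["samtools", "sort"],
    inputs=[ToyToolInput("bam", position=2),
            ToyToolInput("threads", position=1, prefix="-@")],
    arguments=[ToyToolArgument("sorted.bam", position=1, prefix="-o")],
    values={"bam": "in.bam", "threads": 4},
)
print(" ".join(cmd))  # samtools sort -@ 4 -o sorted.bam in.bam
```

Because the argument has a fixed value, only the inputs vary between runs; that is the practical difference between a tool input and a tool argument in this sketch.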
Example
Further information: Simple Workflow
Below we've constructed a simple example that takes a string input, calls the echo tool and exposes the Echo tool's output as a workflow output.
import janis as j
from janis.unix.tools.echo import Echo
w = j.Workflow("workflowId")
inp = j.Input("inputIdentifier", j.String(), value="my value to print")
echostep = j.Step("stepIdentifier", Echo())
outp = j.Output("outputIdentifier")
w.add_edges([
    (inp, echostep),   # Connect 'inp' to 'echostep' (resolved to echostep.inp)
    (echostep, outp)   # Connect the output of 'echostep' (echostep.outp) to 'outp'
])
# Will print the CWL, input file and relevant tools to the console
w.translate("cwl") # or "wdl"
We can export a CWL representation to the console using .translate("cwl").
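The introduction mentions type-safety. A minimal sketch of the idea, assuming connections are checked when edges are added: the toy classes below (Port and ToyWorkflow, hypothetical and not Janis internals) refuse to connect ports whose type names differ. The port types used here are made up for illustration, and real type checks would typically also accept compatible types, which this sketch omits by comparing names exactly.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Toy model only: these classes are hypothetical and do not mirror Janis internals.
@dataclass(frozen=True)
class Port:
    node: str       # identifier of the Input, Step or Output node
    name: str       # port name on that node
    type_name: str  # simplified type, compared by exact name

@dataclass
class ToyWorkflow:
    identifier: str
    edges: List[Tuple[Port, Port]] = field(default_factory=list)

    def add_edge(self, source: Port, target: Port) -> None:
        # Reject the edge before storing it if the types disagree
        if source.type_name != target.type_name:
            raise TypeError(
                f"cannot connect {source.node}.{source.name} ({source.type_name}) "
                f"to {target.node}.{target.name} ({target.type_name})"
            )
        self.edges.append((source, target))

    def add_edges(self, pairs: Iterable[Tuple[Port, Port]]) -> None:
        for source, target in pairs:
            self.add_edge(source, target)


w = ToyWorkflow("workflowId")
w.add_edges([
    (Port("inputIdentifier", "value", "String"), Port("stepIdentifier", "inp", "String")),
    (Port("stepIdentifier", "outp", "String"), Port("outputIdentifier", "value", "String")),
])
try:
    # A String cannot feed a (hypothetical) File-typed port
    w.add_edge(Port("inputIdentifier", "value", "String"), Port("alignStep", "bam", "File"))
except TypeError as err:
    print(err)
```

Checking at connection time means a mismatched workflow fails while it is being built in Python, before any CWL or WDL is generated.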
Named inputs and outputs
Every input and output of a tool is named. In this example, Janis knows that echostep has only one input and one output, so it can connect them automatically. You should see statements in the console indicating that Janis has made these connections:
[INFO]: The node 'stepIdentifier' was not a fully qualified input of the tool 'Echo', this was automatically corrected (stepIdentifier → stepIdentifier.inp)
[INFO]: The node 'outputIdentifier' under-referenced an output the step 'stepIdentifier' (tool: 'Echo'), this was automatically corrected (stepIdentifier → stepIdentifier.outp)
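The resolution rule described above can be sketched as follows: if a reference names a step without a port and that direction has exactly one candidate port, use it; otherwise require an explicit port name. resolve_reference is a hypothetical helper, and raising an error for ambiguous references is an assumption rather than Janis's documented behaviour.

```python
from typing import Optional, Sequence

# Hypothetical helper (not Janis's implementation): resolve "step" or
# "step.port" against the port names a tool exposes in one direction.
def resolve_reference(step_id: str, ports: Sequence[str],
                      port: Optional[str] = None) -> str:
    if port is not None:
        # Fully qualified reference: just validate it
        if port not in ports:
            raise KeyError(f"{step_id} has no port named {port!r}")
        return f"{step_id}.{port}"
    if len(ports) == 1:
        # Exactly one candidate, so the intent is unambiguous
        (only,) = ports
        return f"{step_id}.{only}"
    raise ValueError(
        f"{step_id} has {len(ports)} candidate ports {sorted(ports)}; "
        "reference one explicitly"
    )


print(resolve_reference("stepIdentifier", ["inp"]))   # stepIdentifier.inp
print(resolve_reference("stepIdentifier", ["outp"]))  # stepIdentifier.outp
```

For a tool with several inputs or outputs, the unqualified form stops being safe, which is why naming the port explicitly is the more robust habit.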
Included tool definitions and types
Bioinformatics
The Janis framework can be extended to include a suite of Bioinformatics data types and tools.
These can be installed with the bioinformatics extra:
pip3 install janis-pipelines[bioinformatics]
Unix
Tool document:
Some basic unix tools have been wrapped and included as part of the base Janis module and
are the basis for the examples. You can reference these unix tools through janis.unix.tools.
The bioinformatics tools and types can be referenced through janis.bioinformatics or janis_bioinformatics;
the latter may be easier because of the way nested Python imports work.
More examples
- Bioinformatics workflow tutorial: AlignSortedBam
- Unix Toolset: in janis/examples.
- Whole genome germline pipeline: janis-examplepipelines repository.
About
Further information: About
This project was produced as part of the Portable Pipelines Project in partnership with:
- Melbourne Bioinformatics (University of Melbourne)
- Peter MacCallum Cancer Centre
- Walter and Eliza Hall Institute of Medical Research (WEHI)
References:
Through conferences or talks, this project has been referenced by the following titles:
- Walter and Eliza Hall Institute Talk (WEHI) 2019: Portable Pipelines Project: Developing reproducible bioinformatics pipelines with standardised workflow languages
- Bioinformatics Open Source Conference (BOSC) 2019: Janis: an open source tool to machine generate type-safe CWL and WDL workflows
- Victorian Cancer Bioinformatics Symposium (VCBS) 2019: Developing portable variant calling pipelines with Janis
Support
Contributions
Further information: Development
This project is a work in progress and is still in development. Although we welcome contributions, due to the immature state of this project we recommend reporting pipeline-related problems through the GitHub issues page.
If you find an issue with the tool definitions, please see the relevant issue page:
Information about the project structure and more on contributing can be found within the documentation.
Hashes for janis_pipelines-0.3.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e7878658bb31c3b6ca90efe91bc8daf7aa43547bee125495b85c4cf6b4244667
MD5 | 3827e5ea10c0eb1fbbfddd807a47eee3
BLAKE2b-256 | 11423ff34e4034eb28e535693327a468f8b8ef8cf7ce7777363bcc1889fc852a