Contains classes and helpers to generate WDL without worrying about the syntax. This is primarily intended for generating WDL from other in-memory representations of a workflow.
python-wdlgen
Workflow Description Language (WDL) is a way to describe tasks and workflows in a "human readable and writable way". It was initially developed and offered by the Broad Institute to be paired with their workflow engine Cromwell; it has since been made open source, and other engines such as Toil and DNAnexus* also support it.
WARNING
This module now only generates developmental WDL, which includes Directories and wraps all inputs in an input block. To use the generated WDL, you must use a version of Cromwell higher than 37.
This module automatically includes version development in the Workflow and Task outputs.
The guides below may not reflect the current version of this repository, but will be updated soon.
This syntax is based on the Developmental Workflow Description Language specification.
Motivation
I needed an easy way to generate some BASIC WDL from in-memory objects. I was already using (a fork of) common-workflow-language/python-cwlgen, so I figured I could open this up to see what use it has.
Installation
pip install illusional.wdlgen
General support
This software is provided as-is, without warranty of any kind ... and so on.
It's a pretty dumb wrapper that uses string interpolation to generate the structure. It does not automatically escape illegal characters.
Generally it supports:
- Types - All types are represented as a WdlType, which can either be a PrimitiveType or an ArrayType (see goal). Also supports the postfix quantifiers.
- Workflow creation (wdlgen.Workflow)
  - manual imports (wdlgen.Workflow.WorkflowImport)
  - inputs (wdlgen.Input)
  - outputs (wdlgen.Output)
  - calls:
    - general call (wdlgen.WorkflowCall)
    - scatter (wdlgen.WorkflowScatter(WorkflowCall[]))
  - meta: wdlgen.Meta
  - parameter_meta: wdlgen.ParameterMeta
- Task creation (wdlgen.Task) - This is modelled on how CWL constructs its commands.
  - inputs: wdlgen.Input
  - outputs: wdlgen.Output
  - runtime: wdlgen.Task.Runtime
  - command: wdlgen.Task.Command
    - arguments: wdlgen.Task.Command.Argument
    - inputs: wdlgen.Task.Command.Input
  - meta: wdlgen.Meta
  - parameter_meta: wdlgen.ParameterMeta
How to use
This section gives a brief overview of how to use python-wdlgen. One goal is to write proper documentation, but if you have a moderate understanding of workflows in either CWL or WDL, this code will hopefully be fairly intuitive.
Every class inherits from a WDLBase, which means it must have a get_string() method that returns the string representation of the class, calling this on any children it may have.
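The get_string() convention described above can be illustrated with a minimal, self-contained sketch. The classes Node, Declaration and Block below are hypothetical stand-ins written for this example, not wdlgen's own classes, and the real implementation may differ:

```python
class Node:
    """Hypothetical base class: every element can render itself as WDL text."""

    def get_string(self) -> str:
        raise NotImplementedError


class Declaration(Node):
    """A leaf element such as `String greeting`."""

    def __init__(self, type_name: str, name: str):
        self.type_name = type_name
        self.name = name

    def get_string(self) -> str:
        return self.type_name + " " + self.name


class Block(Node):
    """A named block that renders itself by calling get_string() on each child."""

    def __init__(self, keyword: str, children: list):
        self.keyword = keyword
        self.children = children

    def get_string(self) -> str:
        # Indent every child by two spaces and wrap the result in braces.
        body = "\n".join("  " + child.get_string() for child in self.children)
        return self.keyword + " {\n" + body + "\n}"


block = Block("input", [Declaration("String", "greeting"), Declaration("File", "reference")])
rendered = block.get_string()
print(rendered)
```

The parent never needs to know how a child is rendered; it only relies on the child exposing get_string().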
Types
All types are represented as a WdlType, which has a parse method. It's a little overkill in some cases, but makes managing attributes a bit easier.
import wdlgen

parsed_string = wdlgen.WdlType.parse("String")   # WdlType<PrimitiveType<String>>
parsed_op_str = wdlgen.WdlType.parse("String?")  # WdlType<PrimitiveType<String>>, optional
parsed_array = wdlgen.WdlType.parse("File[]")    # WdlType<ArrayType<File>>
parsed_ar_oq = wdlgen.WdlType.parse("Int?[]+")   # WdlType<ArrayType<Int?> (+)>
You can also construct these manually:
# Assumes WdlType, PrimitiveType and ArrayType have been imported from wdlgen.
parsed_string = WdlType(PrimitiveType("String"))
parsed_op_str = WdlType(PrimitiveType("String"), optional=True)
parsed_array = WdlType(ArrayType(WdlType(PrimitiveType("File"))))
parsed_ar_q = WdlType(ArrayType(WdlType(PrimitiveType("Int"), optional=True), requires_multiple=True))
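To make the postfix quantifiers concrete, here is a small self-contained sketch of how shorthand strings such as "String?", "File[]" and "Int?[]+" could be parsed. ParsedType and parse_type are hypothetical names invented for this example; this is not how wdlgen.WdlType.parse is implemented, only an illustration of the notation under the assumption that "?" marks optional, "[]" marks an array and "+" marks a non-empty array:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedType:
    name: Optional[str] = None            # primitive name, e.g. "String"; None for arrays
    item: Optional["ParsedType"] = None   # element type when this is an array
    optional: bool = False                # set by a trailing "?"
    non_empty: bool = False               # set by a trailing "+" on an array


def parse_type(text: str) -> ParsedType:
    """Parse the shorthand by peeling quantifiers off the right-hand side."""
    text = text.strip()
    if text.endswith("?"):
        inner = parse_type(text[:-1])
        inner.optional = True
        return inner
    if text.endswith("+"):
        inner = parse_type(text[:-1])
        if inner.item is None:
            raise ValueError("'+' is only meaningful on an array type: " + text)
        inner.non_empty = True
        return inner
    if text.endswith("[]"):
        return ParsedType(item=parse_type(text[:-2]))
    if not re.fullmatch(r"[A-Za-z]+", text):
        raise ValueError("not a recognisable type name: " + repr(text))
    return ParsedType(name=text)


array_of_optional_ints = parse_type("Int?[]+")
print(array_of_optional_ints)
```

Reading the quantifiers from right to left means the outermost quantifier is handled first, so "Int?[]+" becomes a non-empty array whose elements are optional Ints.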
Input / Output
Input: wdlgen.Input(data_type: WdlType, name: str, expression: str = None)
Output: wdlgen.Output(data_type: WdlType, name: str, expression: str = None)
Both of these render to something like:
{WdlType} {name} [= {expression}]
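The template above can be sketched as a plain function. render_declaration is a hypothetical helper written for this example, not part of wdlgen; it only shows that the expression part is emitted when one is supplied:

```python
from typing import Optional


def render_declaration(data_type: str, name: str, expression: Optional[str] = None) -> str:
    """Render `{type} {name}` and append ` = {expression}` only if one is given."""
    declaration = data_type + " " + name
    if expression is not None:
        declaration += " = " + expression
    return declaration


input_line = render_declaration("String", "taskGreeting")
output_line = render_declaration("File", "standardOut", "stdout()")
print(input_line)
print(output_line)
```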
Task
A task is a collection of Inputs, Outputs and a Command that are identified by a name. Inputs and Outputs are as above. Note that you can use functions such as stdout() for the expression.
If you don't want to play by these rules, don't include any inputs or outputs and just provide your whole string to the initializer for command.
t = wdlgen.Task("task_name")
t.inputs.append(wdlgen.Input(wdlgen.WdlType.parse("String"), "taskGreeting"))
# command in next section
t.outputs.append(wdlgen.Output(wdlgen.WdlType.parse("File"), "standardOut", "stdout()"))
Command
The command is broken up in a similar way to how CWL breaks up its command generation. The command itself has a base command. Each component has a corresponding input (otherwise use the wdlgen.Task.Command.Argument class), optionality, position, prefix (and whether the value should be separated from the prefix; think -o {val} vs outputDir={val}) and potentially a default.
Construct a command like the following:
command = wdlgen.Task.Command("echo")
command.inputs.append(wdlgen.Task.Command.CommandInput("taskGreeting", optional=False, position=None, prefix="-a", separate_value_from_prefix=True, default=None))
command.inputs.append(wdlgen.Task.Command.CommandInput("otherInput", optional=True, position=2, prefix="optional-param=", separate_value_from_prefix=False, default=None))
# t is the task
t.command = command
print(command.get_string())
This will result in the following WDL command:
echo \
-a ${taskGreeting} \
${"optional-param=" + otherInput}
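As a rough illustration of the prefix and optionality rules described above, the following self-contained sketch renders command components without using wdlgen. The names render_command_input and render_command are hypothetical and not part of the module; the real implementation may differ, and position ordering and defaults are not modelled here:

```python
from typing import List, Optional


def render_command_input(name: str, optional: bool = False, prefix: Optional[str] = None,
                         separate_value_from_prefix: bool = True) -> str:
    """Render one command component as a WDL placeholder."""
    if prefix is None:
        return "${" + name + "}"
    separator = " " if separate_value_from_prefix else ""
    if optional:
        # Keep the prefix inside the placeholder so that it is tied to the
        # optional value rather than always being emitted.
        return '${"' + prefix + separator + '" + ' + name + "}"
    return prefix + separator + "${" + name + "}"


def render_command(base_command: str, components: List[str]) -> str:
    """Join the base command and its components with shell line continuations."""
    return " \\\n  ".join([base_command] + components)


command_text = render_command("echo", [
    render_command_input("taskGreeting", prefix="-a"),
    render_command_input("otherInput", optional=True, prefix="optional-param=",
                         separate_value_from_prefix=False),
])
print(command_text)
```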
Task output:
The combination of the task and command outputs:
version development
task task_name {
  input {
    String taskGreeting
  }
  command {
    echo \
      -a ${taskGreeting} \
      ${"optional-param=" + otherInput}
  }
  output {
    File standardOut = stdout()
  }
}
Workflow
You should have a moderate idea of the structure of WDL, as there's no cleverness or abstraction done anywhere. Beware: there's also no checking of attributes (for example, to see whether your inputMap actually corresponds to inputs).
Construct a workflow like the following:
w = wdlgen.Workflow("workflow_name")
w.imports.append(wdlgen.Workflow.WorkflowImport("tool_file", ""))
w.inputs.append(
    wdlgen.Input(
        wdlgen.WdlType.parse("String"),
        "inputGreeting"
    )
)
# Maps each task input name to the workflow-level value passed to it.
inputs_map = {"taskGreeting": "inputGreeting"}
w.calls.append(wdlgen.WorkflowCall("Q.namspaced_task_identifier", "task_alias", inputs_map))
w.outputs.append(wdlgen.Output(wdlgen.WdlType.parse("File"), "standardOut", "task_alias.standardOut"))
Which outputs:
version development
import "tools/tool_file.wdl"
workflow workflow_name {
  input {
    String inputGreeting
  }
  call Q.namspaced_task_identifier as task_alias {
    input:
      taskGreeting=inputGreeting
  }
  output {
    File standardOut = task_alias.standardOut
  }
}
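Because the module does not check that a call's input map corresponds to the task's inputs, you may want a small check of your own before generating the workflow. The helper below, find_unknown_inputs, is a hypothetical sketch that works on plain names and dictionaries rather than on wdlgen objects:

```python
from typing import Dict, Iterable, List


def find_unknown_inputs(task_input_names: Iterable[str], inputs_map: Dict[str, str]) -> List[str]:
    """Return the keys of inputs_map that do not name an input of the task."""
    known = set(task_input_names)
    return sorted(key for key in inputs_map if key not in known)


# The second key contains a deliberate typo to show what the check reports.
unknown = find_unknown_inputs(
    ["taskGreeting"],
    {"taskGreeting": "inputGreeting", "taskGreting": "inputGreeting"},
)
print(unknown)
```

An empty list means every key in the map names a declared task input.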
Known limitations
I'm not a fan of the string-interpolation approach this module uses to generate WDL. I think a better approach would be to build an abstract syntax tree (AST) and then have something convert that into the DSL that WDL uses.
You could also cause syntax errors in generated WDL by providing illegal characters.
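One way to guard against this yourself is to validate names before passing them to the generator. The sketch below assumes that a WDL identifier starts with a letter and continues with letters, digits or underscores; check the WDL specification for the exact rule. check_identifier is a hypothetical helper, not part of wdlgen:

```python
import re

# Assumed identifier rule: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_identifier(name: str) -> str:
    """Return the name unchanged if it looks like a valid identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("not a valid identifier under the assumed rule: " + repr(name))
    return name


valid_name = check_identifier("taskGreeting")

try:
    check_identifier("task-greeting")  # the hyphen is rejected
    rejected = False
except ValueError:
    rejected = True

print(valid_name, rejected)
```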
Goals
- Improve code-level documentation.
- Increase the testing coverage + quality of unit tests.
- Better represent the WDL spec.
- Find an easier distribution / release method - such as PIP.
- Automate testing and delivery through TravisCI / CircleCI or similar.
- Validate each value by WDL's language specifications.
- Add support for structs
Long-term goals
- Write a documentation site.
- Make classes convert into AST and then into DSL.
Issues and Pull Requests
Feel free to log issues and make pull requests. I make no guarantee to the existence or timeliness of replies.
Links:
- WDL description: https://github.com/openwdl/wdl/blob/master/versions/1.0/SPEC.md