Tool for visualizing Apache Oozie pipelines
This is a tool for visualizing Apache Oozie workflows as data flow pipelines.
Fig. 1: Visual summary of what the tool does.
The tool is a command-line application that ingests an imperative description of a workflow from an Apache Oozie XML file and converts it to a data pipeline representation in a PNG image file. For the application to be able to extract the pipeline representation, the content of the Oozie XML file has to follow certain conventions (e.g., the names of Oozie action properties that correspond to ports have to be prefixed with the string “input” or “output”). See the file vipe/oozie/converter/iis.py for code that follows the conventions used in the workflow definitions of the OpenAIRE IIS project.
Run pip install vipe to install the stable version of the software from the PyPI repository. After installation, you can run the tool by executing vipe-oozie2png (run vipe-oozie2png --help for usage instructions).
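The port naming convention described above can be illustrated with a minimal sketch. The function name classify_property and the sample property names are hypothetical and exist only for this illustration; the actual rules are implemented in vipe/oozie/converter/iis.py and may differ in detail.

```python
from typing import Optional


def classify_property(name: str) -> Optional[str]:
    """Return 'input' or 'output' if the property name looks like a port.

    Hypothetical helper: it only checks the name prefix, which is the
    convention described in the text, not the tool's actual logic.
    """
    for kind in ("input", "output"):
        if name.startswith(kind):
            return kind
    return None


# Made-up property names of a single Oozie action.
properties = ["input_document", "output_citations", "mapred.job.queue.name"]

# Properties that match the convention are treated as ports;
# everything else maps to None and is ignored.
ports = {name: classify_property(name) for name in properties}
```

In this sketch, a property that does not match either prefix is simply not treated as a port.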
Note that the following libraries have to be installed in the system for the tool to work:
There are two main goals of the solution:
This section contains example visualizations of various workflows. The visualizations were generated with version 0.5 of the application.
Below we show a visualization of the Oozie workflow `vipe/oozie/test/data/bypass/workflow.xml <vipe/oozie/test/data/bypass/workflow.xml>`__. Internally, this workflow is converted to an OozieGraph representation (see its YAML representation in `vipe/oozie/test/data/bypass/workflow.yaml <vipe/oozie/test/data/bypass/workflow.yaml>`__), then to a Pipeline representation (see its YAML representation in `vipe/oozie/test/data/bypass/pipeline.yaml <vipe/oozie/test/data/bypass/pipeline.yaml>`__), and finally to a PNG image.
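To give a rough idea of what the first conversion step has to deal with, the sketch below reads the control flow of a tiny, made-up Oozie workflow using only the Python standard library. It is not the tool's parser: the sample XML is simplified and not schema-complete, and the plain dictionary is only a stand-in for the much richer OozieGraph structure.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up and simplified Oozie workflow used only for illustration.
WORKFLOW_XML = """
<workflow-app xmlns="uri:oozie:workflow:0.4" name="example">
    <start to="producer"/>
    <action name="producer">
        <java><main-class>example.Producer</main-class></java>
        <ok to="consumer"/>
        <error to="fail"/>
    </action>
    <action name="consumer">
        <java><main-class>example.Consumer</main-class></java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail"><message>Workflow failed</message></kill>
    <end name="end"/>
</workflow-app>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_control_flow(xml_text: str) -> dict:
    """Map each action name to the node it transitions to on success."""
    root = ET.fromstring(xml_text.strip())
    transitions = {}
    for element in root:
        if local_name(element.tag) != "action":
            continue
        for child in element:
            if local_name(child.tag) == "ok":
                transitions[element.get("name")] = child.get("to")
    return transitions


control_flow = read_control_flow(WORKFLOW_XML)
```

The result only captures the imperative "what runs after what" view of the workflow; turning it into a data flow view is the job of the later conversion to the Pipeline representation.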
See Fig. 2-5 for visualizations of the workflow with different levels of detail, as specified by the user.
Fig. 2: Simple workflow visualized with the lowest level of detail.
Fig. 3: Simple workflow visualized with medium level of detail.
Fig. 4: Simple workflow visualized with medium level of detail with input and output ports shown.
Fig. 5: Simple workflow visualized with the highest level of detail with input and output ports shown.
In this section, we show visualizations generated for real-life workflows from the OpenAIRE IIS project (see Fig. 6-8).
Fig. 6: Primary-main workflow from the OpenAIRE IIS project with medium level of detail.
Fig. 7: Primary-processing workflow from the OpenAIRE IIS project with the lowest level of detail.
Fig. 8: Primary-processing workflow from the OpenAIRE IIS project with medium level of detail.
Features visible to the user of the application are listed below. Note that we use the notion of a port (see chapter 3 of Gregor Hohpe, Bobby Woolf: “Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions”, Addison-Wesley, 2003), which corresponds to a join point between a node and a connection in a data pipeline graph.
In this section, we describe internal features of the solution that are of interest to people who want to extend its code.
Extensibility areas. The application was designed and implemented with extensibility in mind; we wanted to make it easy to extend in the following areas.
Processing stages. To attain the extensibility goals mentioned above, the processing in the application is separated into the stages shown in Fig. 9.
Fig. 9: Data processing in the application. Boxes correspond to data structures or files, while arrows correspond to processing steps. The area enclosed by the dotted line shows the potential future extensions of the application discussed in this document. Names highlighted in gray correspond to names of classes in the source code.
Intermediate representations. It is worth noting that there are two intermediate representations of the workflow (as shown in Fig. 9):
A PipelineConverter-derived class is used to translate OozieGraph into Pipeline.
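The idea of plugging in a project-specific converter can be sketched as follows. This is only an illustration under stated assumptions: GraphNode, PipelineNode, ConverterSketch and PrefixConverter are hypothetical names invented here, and the interfaces of the real OozieGraph, Pipeline and PipelineConverter classes in the source code may look quite different.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GraphNode:
    """Hypothetical stand-in for a node of the Oozie-level graph."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class PipelineNode:
    """Hypothetical stand-in for a node of the data pipeline."""
    name: str
    inputs: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)


class ConverterSketch(ABC):
    """Plays the role the text assigns to PipelineConverter (hypothetical API)."""

    @abstractmethod
    def convert_node(self, node: GraphNode) -> PipelineNode:
        """Translate a single node; subclasses encode project conventions."""

    def convert(self, nodes: List[GraphNode]) -> List[PipelineNode]:
        return [self.convert_node(node) for node in nodes]


class PrefixConverter(ConverterSketch):
    """Treats properties prefixed with 'input' or 'output' as ports."""

    def convert_node(self, node: GraphNode) -> PipelineNode:
        result = PipelineNode(node.name)
        for key, value in node.properties.items():
            if key.startswith("input"):
                result.inputs[key] = value
            elif key.startswith("output"):
                result.outputs[key] = value
        return result


# Made-up node with one input port, one output port and one plain property.
graph = [GraphNode("producer", {"input_raw": "/data/raw",
                                "output_clean": "/data/clean",
                                "queue": "default"})]
pipeline = PrefixConverter().convert(graph)
```

Keeping the project-specific conventions in a subclass means that supporting workflows written with different conventions would only require another subclass, which matches the extensibility goal stated in the text.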
The Python packages that the application depends on are listed in the requirements.txt file. The project is written in Python 3, so you need to install the Python 3 versions of these dependencies (on an Ubuntu 14.04 system you can do so by executing, e.g., sudo pip3 install pytest).
The docstrings in the code follow the Google style guide, with types declared in accordance with Sphinx’s type annotation conventions. Note that you need Sphinx version 1.3 or later if you want to generate documentation with type annotations.
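As a rough illustration of this docstring style, the sketch below shows a Google-style docstring with the type of each argument given in parentheses. The function count_ports is hypothetical and not part of the code base, and the exact type notation used in the project may differ from the one shown here.

```python
def count_ports(properties, prefix="input"):
    """Count the properties whose names start with a given prefix.

    Args:
        properties (Dict[str, str]): property names mapped to their values.
        prefix (str): prefix that marks a property as a port.

    Returns:
        int: number of properties whose names start with the prefix.
    """
    return sum(1 for name in properties if name.startswith(prefix))


# Made-up properties: one matches the default "input" prefix.
number_of_inputs = count_ports({"input_a": "x", "output_b": "y"})
```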
Possible future extensions of the application are listed below.
The code is licensed under the Apache License, Version 2.0.