Jterator
========
A minimalistic pipeline engine, designed to be flexible yet uncomplicated enough that no GUI is required to work with it.
* use a simple format like JSON for basic parametrization whenever possible
  * don't be afraid to keep your input settings on disk
* use HDF5 for performance-heavy IO
* intended for scientific data analysis
* the UNIX pipeline is a great idea, but it is mostly restricted to text processing
Modules
-------
Think of your pipeline as a sequence of connected modules (a linked list). Each module is a program that reads JSON from its STDIN file descriptor.
This JSON contains all the input settings required by the module.
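The read-from-STDIN step can be sketched in one line; here `python3` stands in for an arbitrary module, and the settings shown are made up for illustration:

```shell
# A module receives its settings as JSON on STDIN; this stand-in
# "module" just parses the JSON and echoes one setting back.
echo '{"threshold": 0.5, "input": "data/foo.h5"}' \
  | python3 -c 'import json,sys; h=json.load(sys.stdin); print(h["threshold"])'
# prints 0.5
```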
Getting started
---------------
Each pipeline has the following layout on the disk:
* **handles** folder contains all the JSON handles files, which are passed via STDIN to *modules*.
* **modules** folder contains the executables and source code of the corresponding programs.
* **logs** folder contains the output of the STDOUT and STDERR streams, captured for each executable that has been run.
* **data** folder contains all the heavy data output, such as HDF5 files. These data are shared between modules.
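The layout above can be bootstrapped with a single shell command (`myfirstpipe` is an arbitrary example name, not a Jterator command):

```shell
# Create an empty project with the folder layout described above.
mkdir -p myfirstpipe/{handles,modules,logs,data}
```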
Jterator allows only a very simplistic type of workflow - a *pipeline* (somewhat similar to a UNIX-world pipeline). The description of such a workflow must be placed as a sibling of the folder structure described above, i.e. inside the Jterator pipeline (project) folder. The recognized file name must be one of **['JteratorPipe.json', 'jt.pipe']**. The description is in JSON format:
```json
{
    "name": "Foo",
    "version": "0.0.1",
    "pipeline": [
        {
            "name": "Bar",
            "module": "bar",
            "handles": "handles/some_bar"
        },
        {
            "name": "Baz",
            "module": "baz",
            "handles": "handles/some_baz"
        }
    ],
    "tests": [
        {
            "type": "hdf5_dependency",
            "module_name": "baz",
            "input": ["bar"],
            "output": ["baz"]
        }
    ]
}
```
To *run* your first pipeline, do:
```bash
cd /my/first/jterator/pipeline/folder && jt run
```
Developing new modules
======================
This is a small walk-through on how to develop a new module for *Jterator*. Each module has to follow a particular convention for processing input parameters. It can be written in virtually any programming language, as long as that language provides tools for working with the *JSON* and *HDF5* data formats.
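As a sketch of that convention, the snippet below writes a throwaway Python module (any language would do), feeds it a handles file on STDIN, and captures STDOUT/STDERR the way the engine stores them under *logs*. All file names and handle keys here are illustrative, not part of any fixed Jterator schema:

```shell
# Illustrative end-to-end check of the module convention.
mkdir -p demo/{handles,modules,logs}
printf '{"threshold": 0.5}\n' > demo/handles/some_bar
cat > demo/modules/bar <<'EOF'
#!/usr/bin/env python3
import json, sys
handles = json.load(sys.stdin)          # settings arrive as JSON on STDIN
print("bar: threshold =", handles["threshold"])
EOF
chmod +x demo/modules/bar
# Run the module as the engine would, capturing both streams.
demo/modules/bar < demo/handles/some_bar \
  > demo/logs/bar.out 2> demo/logs/bar.err
cat demo/logs/bar.out   # prints: bar: threshold = 0.5
```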
Developing Jterator
===================
Latest code is available at https://github.com/ewiger/Jterator
Nose tests
----------
We use the nose framework to achieve code coverage with unit tests. To run the tests, do:
```bash
cd tests && nosetests
```