
Light-weight Python Computational Pipeline Management


***************************************
Overview
***************************************


The ruffus module is a lightweight way to add support
for running computational pipelines.

Computational pipelines are often conceptually quite simple, especially
if we break down the process into simple stages, or separate **tasks**.

Each stage or **task** in a computational pipeline is represented by a
Python function.
Each Python function can be called in parallel to run multiple **jobs**.
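
To make the task/job distinction concrete, here is a minimal sketch that uses only the Python standard library rather than ruffus itself. The function ``count_bases`` and its inputs are hypothetical: the function plays the role of a task, and each call with a different argument is one job.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bases(sequence):
    """One task: count how often each base occurs (hypothetical example)."""
    counts = {}
    for base in sequence:
        counts[base] = counts.get(base, 0) + 1
    return counts


# Each input is one job for the same task.
job_inputs = ["ACGT", "AACC", "GGGT"]

# Run the jobs in parallel; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    job_results = list(pool.map(count_bases, job_inputs))

for sequence, counts in zip(job_inputs, job_results):
    print(sequence, sorted(counts.items()))
```

How ruffus itself schedules jobs is not shown here; the sketch only illustrates the idea of one function serving many independent calls.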

Ruffus was originally designed for use in bioinformatics to analyse multiple
genome data sets.

***************************************
Documentation
***************************************

Ruffus documentation can be found `here
<http://ruffus.googlecode.com/svn/trunk/doc/html/index.html>`_,
with an `introduction and installation notes
<http://ruffus.googlecode.com/svn/trunk/doc/html/Introduction.html>`_,
a `short 5 minute tutorial
<http://ruffus.googlecode.com/svn/trunk/doc/html/simple_tutorial.html>`_ and
an `in-depth tutorial
<http://ruffus.googlecode.com/svn/trunk/doc/html/Tutorial.html>`_.


***************************************
Background
***************************************

The purpose of a pipeline is to determine automatically which parts of a
multi-stage process need to be run, and in what order, to reach an
objective (the "targets").
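
The idea of working out what has to run, and in what order, can be sketched as a depth-first walk over a dependency mapping. This is an illustration under assumed names (``tasks_needed``, ``depends_on`` and the stage names are all hypothetical), not a description of how ruffus is implemented, and it does not detect circular dependencies.

```python
def tasks_needed(target, depends_on):
    """Return the tasks required to reach `target`, prerequisites first."""
    ordered = []
    seen = set()

    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for prerequisite in depends_on.get(task, []):
            visit(prerequisite)
        ordered.append(task)  # appended only after all its prerequisites

    visit(target)
    return ordered


# Hypothetical multi-stage process; "report" is not needed to reach "align".
depends_on = {
    "align": ["trim"],
    "trim": ["download"],
    "report": ["align"],
}

print(tasks_needed("align", depends_on))
```

Asking for the ``align`` target leaves out ``report``, because nothing on the path to the target depends on it.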

Computational pipelines, especially for analysing large scientific datasets, are
in widespread use.
However, even a conceptually simple series of steps can be difficult to set
up and to maintain, perhaps because the right tools are not available.

***************************************
Design
***************************************

The ruffus module has the following design goals:

* Simplicity. Can be picked up in 10 minutes
* Elegance
* Lightweight
* Unintrusive
* Flexible/Powerful

***************************************
Features
***************************************

Automatic support for

* Managing dependencies
* Parallel jobs
* Re-starting from arbitrary points, especially after errors
* Display of the pipeline as a flowchart
* Reporting
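
As an illustration of the flowchart feature, the sketch below turns a dependency mapping into Graphviz DOT text using only the standard library. The helper ``to_dot`` and the task names are hypothetical, and this is not the code ruffus uses to draw its flowcharts.

```python
def to_dot(depends_on):
    """Render a task dependency mapping as Graphviz DOT text."""
    lines = ["digraph pipeline {"]
    for task in sorted(depends_on):
        for prerequisite in sorted(depends_on[task]):
            # An edge points from a prerequisite to the task that follows it.
            lines.append('    "%s" -> "%s";' % (prerequisite, task))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-stage pipeline.
depends_on = {
    "clean": ["load"],
    "summarise": ["clean"],
}

dot_text = to_dot(depends_on)
print(dot_text)
```

Saved to a file, text like this can be rendered by the Graphviz ``dot`` program, for example with ``dot -Tsvg``.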


***************************************
A simple example
***************************************

Use the **@follows(...)** python decorator before the function definitions::

    from ruffus import *
    import sys

    def first_task():
        print("First task")

    @follows(first_task)
    def second_task():
        print("Second task")

    @follows(second_task)
    def final_task():
        print("Final task")




The ``@follows`` decorator indicates that the ``first_task`` function
precedes ``second_task`` in the pipeline.
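
To show how a decorator can carry this kind of ordering information, here is a small stand-in written with plain Python. ``follows_sketch``, the ``run_after`` attribute and ``run_up_to`` are hypothetical names invented for this illustration; they are not part of ruffus and do not reflect its internals.

```python
def follows_sketch(*earlier_tasks):
    """Hypothetical stand-in for a dependency decorator (not ruffus code)."""
    def decorate(func):
        # Remember which functions must run before this one.
        func.run_after = list(earlier_tasks)
        return func
    return decorate


run_log = []


def first_task():
    run_log.append("First task")


@follows_sketch(first_task)
def second_task():
    run_log.append("Second task")


@follows_sketch(second_task)
def final_task():
    run_log.append("Final task")


def run_up_to(task, done=None):
    """Run the prerequisites of `task` recursively, then `task` itself."""
    done = set() if done is None else done
    if task in done:
        return
    done.add(task)
    for earlier in getattr(task, "run_after", []):
        run_up_to(earlier, done)
    task()


run_up_to(final_task)
print(run_log)
```

Only the final function is requested; the recorded dependencies are enough to pull in the two earlier functions in the right order.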

********
Usage
********

Each stage or **task** in a computational pipeline is represented by a
Python function.
Each Python function can be called in parallel to run multiple **jobs**.

1. Import module::

       from ruffus import *
       import sys


2. Annotate functions with python decorators

3. Print the dependency graph if necessary

   - For a graphical flowchart in ``jpg``, ``svg``, ``dot``, ``png``,
     ``ps``, ``gif`` formats::

         graph_printout(open("flowchart.svg", "w"),
                        "svg",
                        list_of_target_tasks)

     This requires ``dot`` to be installed

   - For a text printout of all jobs::

         pipeline_printout(sys.stdout, list_of_target_tasks)


4. Run the pipeline::

       # The last two arguments are optional
       pipeline_run(list_of_target_tasks,
                    list_of_tasks_forced_to_rerun,
                    multiprocess=N_PARALLEL_JOBS)
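
The interplay between target tasks and tasks forced to rerun can be sketched as follows. The selection policy shown here (a task runs if it is forced, has not completed, or follows a task that is being run again) is an assumption made for illustration, and ``tasks_to_run`` is a hypothetical helper, not a ruffus function. The task list is assumed to be in dependency order already.

```python
def tasks_to_run(ordered_tasks, depends_on, completed, forced):
    """Pick the tasks to execute, keeping the given pipeline order.

    A task runs if it is forced, has not completed, or follows a task that
    is being run again (an assumed policy, for illustration only).
    """
    rerun = []
    for task in ordered_tasks:
        upstream_rerun = any(dep in rerun for dep in depends_on.get(task, []))
        if task in forced or task not in completed or upstream_rerun:
            rerun.append(task)
    return rerun


ordered_tasks = ["first_task", "second_task", "final_task"]
depends_on = {"second_task": ["first_task"], "final_task": ["second_task"]}

# Restart after an error: only final_task is still missing.
after_error = tasks_to_run(ordered_tasks, depends_on,
                           completed={"first_task", "second_task"},
                           forced=set())
print(after_error)

# Force second_task: it and everything after it run again.
after_force = tasks_to_run(ordered_tasks, depends_on,
                           completed={"first_task", "second_task", "final_task"},
                           forced={"second_task"})
print(after_force)
```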
