Workflow Management System
The official Cosmos website is hosted at http://cosmos.hms.harvard.edu.
To chat with the author or other users (many of whom use Cosmos to build bioinformatics NGS workflows), use Gitter.
Install Cosmos with pip:

    pip install cosmos-wfm

    # Optional, recommended for visualizing Workflows:
    sudo apt-get install graphviz graphviz-dev  # or `brew install graphviz` on macOS
    pip install pygraphviz  # requires graphviz
Cosmos is a Python library for creating scientific pipelines that run on a distributed computing cluster. It is primarily designed and used for bioinformatics pipelines, but it is general enough for any type of distributed computing workflow and is also used in fields such as image processing. A Cosmos pipeline can run locally on a single machine or on a traditional computing cluster such as GridEngine, LSF, Condor, PBS/Torque, SLURM, or any other Distributed Resource Manager (DRM) that supports DRMAA. Adding support for other DRMs is straightforward, and support for AWS Batch is in the works. For those who want to use AWS, Cosmos pairs well with AWS’ new CfnCluster.
Cosmos provides a simple API for specifying complex job DAGs, a way to resume modified or failed workflows, SQL storage for job information, and a web dashboard for monitoring and debugging. It differs from libraries such as Luigi or Airflow, which also try to solve problems such as scheduling recurring tasks and listening for events. Cosmos focuses only on reproducible scientific pipelines, which allows it to keep a very simple state. There is a single process per Workflow, which is a Python script, and a single process per Task, which is a command inside a bash script. When a Task fails, reproducing its exact environment is as simple as re-running the bash script. Cosmos is intended for, and useful in, both one-off analyses and production software.
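The ideas in the paragraph above (a DAG of Tasks, each Task a command in a bash script, job state kept in SQL, and resuming without repeating finished work) can be illustrated with a small standard-library sketch. This is not the Cosmos API: `run_workflow`, the `task` table, and the status strings are hypothetical names chosen for illustration, and the sketch assumes Python 3.9+ and a `bash` executable on the PATH.

```python
import sqlite3
import subprocess
import tempfile
from graphlib import TopologicalSorter
from pathlib import Path


def run_workflow(tasks, parents, db_path, script_dir):
    """Run shell-command tasks in dependency order (illustrative only).

    tasks:   {task_name: shell command string}
    parents: {task_name: set of task names it depends on}
    Returns the names of the tasks executed during this call.
    """
    script_dir = Path(script_dir)
    script_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task (name TEXT PRIMARY KEY, status TEXT)"
    )
    executed = []
    try:
        # Map each task to its predecessors; static_order yields parents first.
        graph = {name: parents.get(name, set()) for name in tasks}
        for name in TopologicalSorter(graph).static_order():
            row = conn.execute(
                "SELECT status FROM task WHERE name = ?", (name,)
            ).fetchone()
            if row and row[0] == "successful":
                continue  # resume: work that already succeeded is skipped
            # One bash script per task, so a failure can be re-run by hand.
            script = script_dir / f"{name}.sh"
            script.write_text("#!/bin/bash\nset -e\n" + tasks[name] + "\n")
            result = subprocess.run(["bash", str(script)])
            status = "successful" if result.returncode == 0 else "failed"
            conn.execute(
                "INSERT OR REPLACE INTO task (name, status) VALUES (?, ?)",
                (name, status),
            )
            conn.commit()
            executed.append(name)
            if status == "failed":
                break  # stop here; downstream tasks wait for a later resume
    finally:
        conn.close()
    return executed


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    demo_tasks = {
        "write": f"echo hello > '{workdir / 'out.txt'}'",
        "copy": f"cat '{workdir / 'out.txt'}' > '{workdir / 'copy.txt'}'",
    }
    demo_parents = {"copy": {"write"}}
    run_workflow(demo_tasks, demo_parents, workdir / "wf.sqlite", workdir / "scripts")
```

Calling `run_workflow` a second time with the same database executes nothing, because every task is already recorded as successful. After a failure, only the failed task and its downstream tasks run again, and the generated `.sh` file for the failed task can be re-run directly to reproduce it.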
Since the original publication, Cosmos has been rewritten and open-sourced by the original author, in a collaboration between The Lab for Personalized Medicine at Harvard Medical School, the Wall Lab at Stanford University, and Invitae. Invitae is a leading clinical genetic sequencing diagnostics laboratory where Cosmos is deployed in production and processes thousands of samples per month. It is also used by various research groups around the world; if you use it for cool stuff, please let us know!
To report bugs or other issues, please use the GitHub Issue Tracker.
Some pretty big changes here, made during a hackathon at Invitae where we received a lot of feedback and contributions. Primarily, the API was simplified and made more intuitive. A new Cosmos primitive called a Dependency was created, which we have found extremely useful for generalizing subworkflow recipes. The API is now considered much more stable.