noworkflow

Supporting infrastructure to run scientific experiments without a scientific workflow management system.

Project description

The noWorkflow project aims to let scientists benefit from provenance data analysis even when they do not use a workflow system. It also aims to free them from relying on naming conventions to keep files produced by previous executions. Without such conventions, result and intermediate files are currently overwritten by every new execution of the pipeline.

noWorkflow is written in Python and currently captures the provenance of Python scripts. It relies on software engineering techniques such as abstract syntax tree (AST) analysis, reflection, and profiling, so it collects provenance without requiring a version control system or any other environment.
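To give a feel for what AST analysis can recover from a script, here is a minimal sketch that lists the functions a script defines, using only Python's standard ast module. It is an illustration under our own assumptions, not noWorkflow's implementation; the SOURCE string and the helper list_function_definitions are hypothetical names chosen for this example.

```python
import ast

# Hypothetical script contents, used only for this illustration.
SOURCE = '''
def load(path):
    return open(path).read()

def run_simulation(data_a, data_b):
    return load(data_a), load(data_b)
'''


def list_function_definitions(source):
    """Return (name, line number, argument names) for each function in source."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = [arg.arg for arg in node.args.args]
            found.append((node.name, node.lineno, args))
    # Sort by line number so the result follows the order of the script.
    return sorted(found, key=lambda item: item[1])


definitions = list_function_definitions(SOURCE)
for name, lineno, args in definitions:
    print(f"{lineno}: {name}({', '.join(args)})")
```

The script is parsed but never executed, so this kind of information can be gathered before the script runs.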

Installing and using noWorkflow is simple and easy. Please check our installation and basic usage guidelines below.

Team

The noWorkflow team is composed of researchers from Universidade Federal Fluminense (UFF) in Brazil and New York University (NYU) in the USA.

Vanessa Braganholo (UFF)
Fernando Chirigati (NYU)
Juliana Freire (NYU)
David Koop (NYU)
Leonardo Murta (UFF)
João Felipe Pimentel (UFF)

Publications

MURTA, L. G. P.; BRAGANHOLO, V.; CHIRIGATI, F. S.; KOOP, D.; FREIRE, J. noWorkflow: Capturing and Analyzing Provenance of Scripts. In: International Provenance and Annotation Workshop (IPAW), 2014, Cologne, Germany. https://github.com/gems-uff/noworkflow/raw/master/docs/ipaw2014.pdf

Quick Installation

To install noWorkflow, you should follow these basic instructions:

If you have pip, just run:

$ pip install noworkflow[vis]

This installs both noWorkflow and Flask, which the visualization tool requires.

If you do not have pip, but already have Git (to clone our repository) and Python:

$ git clone git@github.com:gems-uff/noworkflow.git
$ cd noworkflow/capture
$ ./setup.py install

This installs noWorkflow on your system. You may also need to install Flask if you want to use our visualization tool.

Basic Usage

noWorkflow is transparent in the sense that it requires neither changes to the script, nor any laborious configuration. Run

now --help

to learn the usage options.

To run noWorkflow with a script called simulation.py with input data data1.dat and data2.dat, you should run

now run -v simulation.py data1.dat data2.dat

The -v option turns the verbose mode on, so that noWorkflow gives you feedback on the steps taken by the tool. The output, in this case, is similar to what follows.

$ now run -v simulation.py data1.dat data2.dat
[now] removing noWorkflow boilerplate
[now] setting up local provenance store
[now] collecting definition provenance
[now]   registering user-defined functions
[now] collecting deployment provenance
[now]   registering environment attributes
[now]   searching for module dependencies
[now]   registering provenance from 703 modules
[now] collecting execution provenance
[now]   executing the script
[now] the execution of trial 1 finished successfully

Each new run produces a different trial that will be stored with a sequential identification number in the relational database.
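As a rough illustration of sequential trial identifiers in a relational store, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, its columns, and the record_trial helper are assumptions made for this example and do not reflect noWorkflow's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical, simplified "trial" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trial (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        script TEXT NOT NULL,
        arguments TEXT NOT NULL
    )
    """
)


def record_trial(conn, script, arguments):
    """Insert one trial row and return the identifier the database assigned."""
    cursor = conn.execute(
        "INSERT INTO trial (script, arguments) VALUES (?, ?)",
        (script, " ".join(arguments)),
    )
    conn.commit()
    return cursor.lastrowid


first = record_trial(conn, "simulation.py", ["data1.dat", "data2.dat"])
second = record_trial(conn, "simulation.py", ["data1.dat", "data2.dat"])
print(first, second)  # prints "1 2"
```

Running the same script with the same arguments twice still yields two distinct rows, which mirrors the idea that every run is a separate trial.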

Verifying the module dependencies is a time-consuming step. Scientists who know that no library or source code has changed can bypass it with the -b flag, in which case the current trial inherits the module dependencies of the previous one.
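To see why checking module dependencies can be costly, consider the following sketch, which walks every loaded module, reads its file, and hashes it. This is only an approximation of the idea under our own assumptions: snapshot_modules and changed_modules are hypothetical helpers, and the use of SHA-1 here is a choice made for the example rather than a statement about what noWorkflow does internally.

```python
import hashlib
import sys


def snapshot_modules():
    """Return {name: {"path", "version", "sha1"}} for loaded modules backed by a file."""
    snapshot = {}
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if not path:
            continue  # built-in or namespace modules have no single file
        try:
            with open(path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
        except OSError:
            continue  # file not readable (for example, loaded from an archive)
        snapshot[name] = {
            "path": path,
            "version": getattr(module, "__version__", None),
            "sha1": digest,
        }
    return snapshot


def changed_modules(previous, current):
    """Return the names whose presence or file hash differs between two snapshots."""
    names = set(previous) | set(current)
    return sorted(
        name
        for name in names
        if previous.get(name, {}).get("sha1") != current.get(name, {}).get("sha1")
    )


before = snapshot_modules()
print(len(before), "modules inspected")
print(changed_modules(before, before))  # an unchanged snapshot reports no differences
```

Every module file is read in full on each snapshot, so the cost grows with the number of installed dependencies. Reusing the previous trial's record avoids that work when nothing has changed.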

It is possible to collect more information than is collected by default, such as variable usages and dependencies. To perform dynamic program slicing and capture that information, just run

now run -e Tracer simulation.py data1.dat data2.dat

To list all trials, just run

now list

Assuming we run the experiment again and then run now list, the output would be as follows.

$ now list
[now] trials available in the provenance store:
  Trial 1: simulation.py data1.dat data2.dat
         with code hash aa49daae4ae8084af3602db436e895f08f14aba8
         ran from 2014-03-04 13:10:34.595995 to 2014-03-04 13:11:33.793083
  Trial 2: simulation.py data1.dat data2.dat
         with code hash aa49daae4ae8084af3602db436e895f08f14aba8
         ran from 2014-03-04 17:59:02.917920 to 2014-03-04 18:00:10.383637
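Both trials above report the same code hash because the script did not change between runs. The hash has 40 hexadecimal digits, which is consistent with SHA-1. Assuming the hash is computed over the script's contents (an assumption on our part, not something stated here), the idea can be sketched with the standard hashlib module; code_hash and the two byte strings are hypothetical.

```python
import hashlib


def code_hash(source_bytes):
    """Return the SHA-1 hex digest of a script's contents."""
    return hashlib.sha1(source_bytes).hexdigest()


# Hypothetical script contents before and after an edit.
original = b"print('run 1')\n"
edited = b"print('run 2')\n"

print(code_hash(original) == code_hash(original))  # True: same content, same hash
print(code_hash(original) == code_hash(edited))    # False: any edit changes the hash
print(len(code_hash(original)))                    # 40 hexadecimal digits
```

Comparing hashes is a cheap way to tell whether two trials ran the same version of a script.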

To look at the details of a specific trial, use

now show

This command has several options, such as -m to show module dependencies; -d to show function definitions; -e to show the environment context; -a to show function activations; and -f to show file accesses.

Running

now show -a 1

would show details of trial 1. Notice that the function name is preceded by the line number where the call was activated.

$ now show -a 1
[now] trial information:
  Id: 1
  Inherited Id: None
  Script: simulation.py
  Code hash: aa49daae4ae8084af3602db436e895f08f14aba8
  Start: 2014-03-04 13:10:34.595995
  Finish: 2014-03-04 13:11:33.793083
[now] this trial has the following function activation graph:
  42: run_simulation (2014-03-04 13:11:30.969055 -
                                2014-03-04 13:11:32.978796)
      Arguments: data_b = 'data2.dat', data_a = 'data1.dat'
      Globals: wait = 2
      Return value: [['0.0', '0.6'], ['1.0', '0.0'], ['1.0', '0.0'],
      ...
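The activation listing above (caller line, function name, arguments, return value) is the kind of information Python's profiling hook exposes. The sketch below records it with sys.setprofile. It is a simplified illustration under our own assumptions, not noWorkflow's code: trace_activations, scale, and this toy run_simulation are hypothetical.

```python
import sys


def trace_activations(func, *args, **kwargs):
    """Run func and return (result, list of recorded Python function activations)."""
    activations = []
    stack = []

    def profiler(frame, event, arg):
        if event == "call":
            caller = frame.f_back
            record = {
                "name": frame.f_code.co_name,
                # Line in the caller where this call was made.
                "line": caller.f_lineno if caller else None,
                # At call time, the frame's locals are the bound arguments.
                "arguments": dict(frame.f_locals),
                "return": None,
            }
            stack.append(record)
            activations.append(record)
        elif event == "return" and stack:
            stack.pop()["return"] = arg
        # "c_call" and "c_return" events (built-in functions) are ignored.

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return result, activations


def scale(value, factor):
    return value * factor


def run_simulation(data_a, data_b):
    return [scale(len(data_a), 2), scale(len(data_b), 3)]


result, activations = trace_activations(run_simulation, "data1.dat", "data2.dat")
for activation in activations:
    print(activation["line"], activation["name"],
          activation["arguments"], "->", activation["return"])
```

The hook fires for every Python-level call, so the script itself needs no changes, which matches the transparency goal described earlier.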

To restore files used by trial 1, run

$ now checkout -l -i 1

By default, the checkout command restores only the script used for the trial (“simulation.py”), even when the script imports modules and reads input files. Use the option “-l” to restore imported modules and the option “-i” to restore input files. The checkout command also tracks the evolution history. By default, each trial is based on the previous one (e.g., Trial 2 is based on Trial 1). When you check out a trial, the next trial is based on the checked-out trial (e.g., Trial 3 is based on Trial 1).
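The way a checkout changes which trial the next run is based on can be modeled in a few lines. The History class below is a toy model written for this explanation; its names and structure are hypothetical and unrelated to noWorkflow's internal classes.

```python
class History:
    """Toy model of trial lineage: each trial remembers the trial it is based on."""

    def __init__(self):
        self.parents = {}    # trial id -> id of the trial it is based on
        self.current = None  # trial the next run will be based on

    def run(self):
        """Create a new trial based on the current one and make it current."""
        trial_id = len(self.parents) + 1
        self.parents[trial_id] = self.current
        self.current = trial_id
        return trial_id

    def checkout(self, trial_id):
        """Make an existing trial the base for the next run."""
        if trial_id not in self.parents:
            raise KeyError(f"unknown trial {trial_id}")
        self.current = trial_id


history = History()
history.run()        # Trial 1, based on nothing
history.run()        # Trial 2, based on Trial 1
history.checkout(1)  # go back to Trial 1
history.run()        # Trial 3, based on Trial 1
print(history.parents)  # {1: None, 2: 1, 3: 1}
```

The resulting structure is a tree rather than a straight line, which is what an evolution history view has to represent.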

The remaining options of noWorkflow are diff, export and vis. The diff option compares two trials, and the export option exports provenance data of a given trial to Prolog facts, so inference queries can be run over the database.

The vis option starts a visualization tool that allows interactive analysis:

$ now vis -b

The visualization tool shows the evolution history, the trial information, and an activation graph. It is also possible to compare different trials in the visualization tool.

We also have a graph visualization tool implemented in Java, named noWorkflowVis, which connects to the noWorkflow database and allows interactive analysis.

IPython Interface

Another way to visualize and query trials is to use IPython notebook. To install IPython notebook, you can run

$ pip install ipython[all]

Then, to run IPython notebook, go to the project directory and execute:

$ ipython notebook

It will start a local web server where you can create notebooks and run Python code.

Before loading anything related to noWorkflow in a notebook, you must initialize it:

import noworkflow.now.ipython as nip
nip.init()

After that, you can load either the history or a specific trial graph:

In [1]:
  trial = nip.Trial(2) # Loads trial with Id = 2
  trial # Shows trial graph

In [2]:
  history = nip.History() # Loads history
  history # Shows history graph

These objects have attributes that change the graph visualization, such as width, height, and filter values. Please check the documentation by running the following code in IPython notebook:

trial?
history?

It is also possible to run Prolog queries in IPython notebook. To do so, you will need to install SWI-Prolog with shared libraries and the pyswip module: https://github.com/yuce/pyswip/blob/master/INSTALL

To query a specific trial, you can do:

trial.query("activation(1428, X, _, _, _)")

To check the existing rules, please do:

trial.prolog_rules()

Included Software

Parts of the following software were used by noWorkflow directly or in an adapted form:

Acknowledgements

We would like to thank JetBrains for providing us a license for PyCharm. We also want to thank CNPq, FAPERJ, and the National Science Foundation (CNS-1229185, CNS-1153503, IIS-1142013) for partially supporting this work.

License Terms

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Release history

2.0.1

Jun 15, 2024

2.0.0

Jun 13, 2024

1.12.0

Nov 25, 2019

1.11.2

Apr 12, 2018

1.11.1

Apr 11, 2018

1.11.0

Apr 10, 2018

1.10.4

Mar 25, 2018

1.10.3

Mar 25, 2018

1.10.2

Dec 11, 2017

1.10.1

Dec 11, 2017

1.10.0

Dec 11, 2017

1.9.6

Nov 8, 2017

1.9.5

Aug 28, 2017

1.9.4

Aug 25, 2017

1.9.3

Jul 29, 2017

1.9.1

Jun 6, 2016

1.9.0

Jun 6, 2016

1.8.9

Jun 6, 2016

1.8.8.post1

Jun 1, 2016

1.8.1

May 6, 2016

1.8.0

Apr 26, 2016

1.7.7

Apr 6, 2016

1.7.6

Apr 3, 2016

1.7.5

Apr 3, 2016

1.7.4

Apr 3, 2016

1.7.3

Apr 3, 2016

1.7.2

Apr 3, 2016

1.7.1

Apr 3, 2016

1.7.0

Apr 2, 2016

1.6.1

Mar 23, 2016

1.6.0

Mar 23, 2016

1.5.2

Mar 19, 2016

1.5.1

Mar 19, 2016

1.5.0

Mar 19, 2016

1.4.6

Mar 18, 2016

1.4.5

Mar 16, 2016

1.4.4

Mar 10, 2016

1.4.3

Mar 4, 2016

1.4.2

Mar 1, 2016

1.4.1

Feb 29, 2016

1.4.0

Feb 28, 2016

1.3.0

Feb 28, 2016

1.2.0

Feb 27, 2016

1.1.1

Feb 12, 2016

1.1.0

Feb 11, 2016

1.0.3

Feb 6, 2016

1.0.2

Feb 6, 2016

1.0.1

Feb 6, 2016

1.0.0

Feb 2, 2016

0.15.3

Jan 14, 2016

0.15.2

Jan 14, 2016

0.15.1

Jan 14, 2016

0.15.0

Jan 13, 2016

0.14.0

Jan 12, 2016

0.12.3

Nov 18, 2015

0.12.2

Nov 18, 2015

0.12.1

Nov 10, 2015

0.12.0

Nov 9, 2015

0.11.2

Aug 31, 2015

0.11.1

Aug 14, 2015

0.11.0

Jul 8, 2015

0.10.0

May 31, 2015

0.9.5

May 5, 2015

0.9.4

May 3, 2015

0.9.3

Apr 14, 2015

0.9.2

Apr 12, 2015

0.9.0

Apr 12, 2015

0.7.2

Feb 27, 2015

This version

0.7.1

Feb 27, 2015

0.7.0

Feb 6, 2015

0.6.0

Nov 30, 2014

0.5.1

Nov 30, 2014

0.3.1

Nov 30, 2014

0.3

Nov 30, 2014

Download files

Source Distribution

noworkflow-0.7.1.tar.gz (662.5 kB)

Uploaded Feb 27, 2015 Source

Built Distribution

noworkflow-0.7.1-py2.py3-none-any.whl (1.0 MB)

Uploaded Feb 27, 2015 Python 2, Python 3

File details

Details for the file noworkflow-0.7.1.tar.gz.

File metadata

Download URL: noworkflow-0.7.1.tar.gz
Upload date: Feb 27, 2015
Size: 662.5 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for noworkflow-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`dd22afa7926019b32b6d03094d707805cac3bcdc693e0fe2ee4a2eb4adaf35e3`
MD5	`015072bc4c290e8894f5056cd47cafff`
BLAKE2b-256	`cd1364e98818b0fe60200fe1e2e3ef871c1581c473ba9edf7008439a5654b51d`

File details

Details for the file noworkflow-0.7.1-py2.py3-none-any.whl.

File metadata

Download URL: noworkflow-0.7.1-py2.py3-none-any.whl
Upload date: Feb 27, 2015
Size: 1.0 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for noworkflow-0.7.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`dadd0816bab694d9e756ae566659919da89e445b3fdac467880d50e71a4cc06e`
MD5	`f3733d247921e6c6124689ec961ba412`
BLAKE2b-256	`8e3a227c752cecae75fb13a6f32eb4e0dd82532c2430461efb55b40315e8f746`
