nbpipeline

Snakemake-like pipeline manager for reproducible Jupyter Notebooks

Project description

Snakemake-like pipelines for Jupyter Notebooks, producing interactive pipeline reports.

Install & general remarks

These are still early days for this software, so please bear in mind that it is not yet ready for production. Note: for simplicity, I assume that you are using a recent Ubuntu with git installed.

pip install nbpipeline

Graphviz is required for static SVG plots:

sudo apt-get install graphviz libgraphviz-dev graphviz-dev

Development install

To install the latest development version you may use:

git clone https://github.com/krassowski/nbpipeline
cd nbpipeline
pip install -r requirements.txt
ln -s $(pwd)/nbpipeline/nbpipeline.py ~/bin/nbpipeline

Quickstart

Create a pipeline.py file with a list of rules for your pipeline. For example:

from nbpipeline.rules import NotebookRule


NotebookRule(
    'Extract protein data',  # a nice name for the step
    input={'protein_data_path': 'data/raw/data_from_wetlab.xlsx'},
    output={'output_path': 'data/clean/protein_levels.csv'},
    notebook='analyses/Data_extraction.ipynb',
    group='Proteomics'  # this is optional
)

NotebookRule(
    'Quality control and PCA on proteins',
    input={'protein_levels_path': 'data/clean/protein_levels.csv'},
    output={'qc_report_path': 'reports/proteins_failing_qc.csv'},
    notebook='analyses/Exploration_and_quality_control.ipynb',
    group='Proteomics'
)

The keys of input and output should correspond to variables defined in one of the first cells of the corresponding notebook, and that cell should be tagged as “parameters”. This can be done easily in JupyterLab.

If you forget to add them, a warning will be displayed.
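
As an illustration, the “parameters” cell of the notebook used by the 'Extract protein data' rule above might look like the sketch below. The variable names mirror the keys of input and output in that rule. The default values are an assumption made for this sketch (they simply repeat the paths from the rule so that the notebook can also be run on its own).

```python
# Contents of a notebook cell tagged "parameters" (illustrative sketch).
# The names match the keys used in the 'Extract protein data' rule;
# the default values are placeholders chosen for this example.
protein_data_path = 'data/raw/data_from_wetlab.xlsx'  # a key of `input`
output_path = 'data/clean/protein_levels.csv'         # a key of `output`
```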

Alternatively, you can create a dedicated cell for input path definitions and tag it “inputs”, and a separate cell for output path definitions and tag it “outputs”. This allows you to omit the input and output keywords when creating a NotebookRule. However, only simple variable definitions will be deduced (parsing uses regular expressions to avoid the potential dangers of eval).
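
To give a feel for what "simple variable definitions" could mean, here is a minimal sketch of regex-based parsing of a cell's source. It is not the parser used by nbpipeline: the pattern, the function name parse_simple_definitions and the sample cell are all hypothetical, and the actual rules for what is recognised may differ.

```python
import re

# Matches lines such as:  name = 'some/path.csv'  (single or double quotes),
# optionally followed by a trailing comment. Hypothetical pattern.
SIMPLE_ASSIGNMENT = re.compile(
    r"""^\s*(?P<name>[A-Za-z_]\w*)\s*=\s*(?P<quote>['"])(?P<value>.*?)(?P=quote)\s*(?:#.*)?$"""
)


def parse_simple_definitions(cell_source: str) -> dict:
    """Collect `name = 'literal string'` definitions without evaluating code."""
    definitions = {}
    for line in cell_source.splitlines():
        match = SIMPLE_ASSIGNMENT.match(line)
        if match:
            definitions[match.group('name')] = match.group('value')
    return definitions


cell = '''
protein_levels_path = 'data/clean/protein_levels.csv'
qc_report_path = "reports/proteins_failing_qc.csv"  # a comment
derived_path = protein_levels_path.replace('.csv', '.tsv')
'''
parsed = parse_simple_definitions(cell)
print(parsed)
```

In this sketch the first two definitions are picked up, while derived_path is skipped because its value is an expression rather than a string literal. That is the trade-off of avoiding eval: nothing is executed, but anything computed cannot be deduced.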

For more details, please see the example pipeline and notebooks in the examples directory.

Run the pipeline:

nbpipeline

On any subsequent run, notebooks that have not changed will not be run again. To disable this cache, use the --disable_cache switch.
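
The idea of skipping unchanged notebooks can be sketched as follows. This is only one common way to detect changes (comparing a stored SHA-256 digest of the notebook file) and is an assumption made for illustration, not a description of how nbpipeline implements its cache. The names file_digest, needs_rerun and cache.json are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rerun(notebook: Path, cache_file: Path) -> bool:
    """Return True if the notebook changed since its digest was last stored."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    digest = file_digest(notebook)
    if cache.get(str(notebook)) == digest:
        return False  # unchanged: the step could be skipped
    cache[str(notebook)] = digest
    cache_file.write_text(json.dumps(cache))
    return True


with tempfile.TemporaryDirectory() as tmp:
    notebook = Path(tmp) / 'analysis.ipynb'
    cache_file = Path(tmp) / 'cache.json'
    notebook.write_text('{"cells": []}')
    first = needs_rerun(notebook, cache_file)   # nothing cached yet
    second = needs_rerun(notebook, cache_file)  # content unchanged
    notebook.write_text('{"cells": [], "metadata": {}}')
    third = needs_rerun(notebook, cache_file)   # content changed
    print(first, second, third)
```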

To generate an interactive diagram of the rules graph, together with a reproducibility report, add the -i switch:

nbpipeline -i

The software defaults to google-chrome for graph visualization display, which can be changed with a CLI option.
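
To illustrate what a rules graph is, the sketch below derives dependencies between the two Quickstart rules by matching output paths to input paths, then computes an execution order. It assumes that rules are linked through shared file paths, as in Snakemake; the plain dictionaries stand in for NotebookRule objects, and the code uses only the Python standard library rather than any nbpipeline API.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+

# Plain-dict stand-ins for the two rules from the Quickstart.
rules = {
    'Extract protein data': {
        'input': {'protein_data_path': 'data/raw/data_from_wetlab.xlsx'},
        'output': {'output_path': 'data/clean/protein_levels.csv'},
    },
    'Quality control and PCA on proteins': {
        'input': {'protein_levels_path': 'data/clean/protein_levels.csv'},
        'output': {'qc_report_path': 'reports/proteins_failing_qc.csv'},
    },
}

# Map each output file to the name of the rule that produces it.
producers = {
    path: name
    for name, rule in rules.items()
    for path in rule['output'].values()
}

# A rule depends on every rule that produces one of its input files.
dependencies = {
    name: {producers[path] for path in rule['input'].values() if path in producers}
    for name, rule in rules.items()
}

# Predecessors come first in the static order.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Here the quality control step comes after the extraction step because it reads data/clean/protein_levels.csv, which the extraction step writes.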

If you named your definitions file differently (e.g. my_rules.py instead of pipeline.py), use:

nbpipeline --definitions_file my_rules.py

To display all command line options use:

nbpipeline -h

Troubleshooting

If you see ModuleNotFoundError: No module named 'name_of_your_local_module', you may need to set the module search path explicitly by running nbpipeline with:

PYTHONPATH=/path/to/the/parent/of/local/module:$PYTHONPATH nbpipeline

Oftentimes the path is the same as the current directory, so the following command may work:

PYTHONPATH=$(pwd):$PYTHONPATH nbpipeline
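
To see why the search path matters, the following sketch checks whether a module can be found before and after its parent directory is added to sys.path, which is what setting PYTHONPATH does at interpreter start-up. The helper is_importable and the module name my_local_module_demo are hypothetical and exist only for this demonstration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


with tempfile.TemporaryDirectory() as parent:
    # Create a throwaway local module in a directory that is not on sys.path.
    (Path(parent) / 'my_local_module_demo.py').write_text('VALUE = 1\n')
    before = is_importable('my_local_module_demo')

    # Adding the parent directory mirrors the effect of PYTHONPATH.
    sys.path.insert(0, parent)
    importlib.invalidate_caches()
    after = is_importable('my_local_module_demo')
    sys.path.remove(parent)

print(before, after)
```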

Release history

Version	Release date
0.2.7 (this version)	Apr 19, 2020
0.2.6	Apr 19, 2020
0.2.5	Apr 19, 2020
0.2.4	Apr 19, 2020
0.2.3	Apr 18, 2020
0.2.2	Apr 18, 2020
0.2.1	Apr 1, 2020
0.2.0	Mar 29, 2020
0.1.18	Mar 28, 2020
0.1.17	Jul 17, 2019
0.1.16	Jul 17, 2019
0.1.15	Jul 17, 2019
0.1.14	Jul 17, 2019
0.1.13	Jul 17, 2019
0.1.12	Jul 17, 2019
0.1.11	Jul 17, 2019
0.1.10	Jul 17, 2019
0.1.9	Jul 17, 2019
0.1.8	Jul 17, 2019
0.1.7	Jul 16, 2019
0.1.6	Jul 16, 2019
0.1.5	Jul 16, 2019
0.1.4	Jul 16, 2019
0.1.3	Jul 16, 2019
0.1.2	Jul 16, 2019
0.1.1	Jul 16, 2019
0.1	Jul 16, 2019

Download files

Source Distribution

nbpipeline-0.2.7.tar.gz (24.1 kB)

Uploaded Apr 19, 2020 Source

File details

Details for the file nbpipeline-0.2.7.tar.gz.

File metadata

Download URL: nbpipeline-0.2.7.tar.gz
Upload date: Apr 19, 2020
Size: 24.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.1

File hashes

Hashes for nbpipeline-0.2.7.tar.gz
Algorithm	Hash digest
SHA256	`aa46a98994859e3be3c59b47478b9265e4208c75afa7e99e267ff867a7cd8ac2`
MD5	`200eae3da9456670ddf3776bd5219b8f`
BLAKE2b-256	`29ef82eb70d8dee257e7ce446b346e8c27df6f996eafe0363e56d2167d2399bf`
