Map Reduce for Notebooks

Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

The goals for Papermill are:

  • Parameterizing notebooks
  • Executing and collecting metrics across the notebooks
  • Summarizing collections of notebooks


Installation

pip install papermill


Parameterizing a Notebook

To parameterize your notebook, designate a cell with the tag parameters. At execution time, Papermill looks for the parameters cell and replaces its values with the parameters passed in.
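The tagging convention can be illustrated without Papermill itself. The sketch below is a conceptual stand-in, not Papermill's implementation: it represents a notebook as a plain dictionary in the Jupyter notebook JSON layout, finds the cell whose metadata tags include parameters, and places an overriding cell right after it. The helper names find_parameters_cell and inject_parameters, and the injected tag, are hypothetical and exist only for this example.

```python
import copy


def find_parameters_cell(notebook):
    """Return the index of the first cell tagged 'parameters', or None."""
    for index, cell in enumerate(notebook["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return index
    return None


def inject_parameters(notebook, params):
    """Return a copy of the notebook with an overriding cell after the tagged cell.

    Hypothetical helper: it only mimics the effect of parameterization.
    """
    result = copy.deepcopy(notebook)
    index = find_parameters_cell(result)
    if index is None:
        raise ValueError("no cell tagged 'parameters'")
    # One assignment per parameter; later assignments override the defaults.
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    override = {
        "cell_type": "code",
        "metadata": {"tags": ["injected"]},  # illustrative tag name
        "source": source,
    }
    result["cells"].insert(index + 1, override)
    return result


# A two-cell notebook: default parameters, then code that uses them.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "alpha = 0.1\nratio = 0.5",
        },
        {"cell_type": "code", "metadata": {}, "source": "print(alpha, ratio)"},
    ]
}

updated = inject_parameters(notebook, {"alpha": 0.6, "ratio": 0.1})
print(updated["cells"][1]["source"])
```

The defaults in the tagged cell stay in place, so the notebook still runs on its own when no parameters are passed.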


Executing a Notebook

The two ways to execute the notebook with parameters are through the Python API and through the command line interface.

Executing a Notebook via Python API

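import papermill as pm

# Execute the input notebook with these parameters (paths are placeholders).
pm.execute_notebook('path/to/input.ipynb', 'path/to/output.ipynb',
                    parameters=dict(alpha=0.6, ratio=0.1))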
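Because parameters are passed as a plain dictionary, the same notebook can be run once per combination in a parameter grid. The following is a minimal sketch of planning those runs. The plan_runs helper, the paths, and the output naming scheme are assumptions made for illustration. The call to Papermill's execute_notebook function is left as a comment so the sketch runs without Papermill installed.

```python
from itertools import product


def plan_runs(input_path, output_dir, grid):
    """Build one (input, output, parameters) triple per grid combination.

    Hypothetical helper; the output file naming is an arbitrary choice.
    """
    names = sorted(grid)
    runs = []
    for values in product(*(grid[name] for name in names)):
        parameters = dict(zip(names, values))
        suffix = "_".join(f"{name}-{value}" for name, value in parameters.items())
        output_path = f"{output_dir}/run_{suffix}.ipynb"
        runs.append((input_path, output_path, parameters))
    return runs


runs = plan_runs(
    "path/to/input.ipynb",
    "path/to/results",
    {"alpha": [0.2, 0.6], "ratio": [0.1]},
)

for input_path, output_path, parameters in runs:
    # With Papermill installed, each planned run could be executed as:
    # pm.execute_notebook(input_path, output_path, parameters=parameters)
    print(output_path, parameters)
```

Writing every output notebook into one directory keeps the results together, which makes them easy to read back as a collection later.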

Executing a Notebook via CLI

$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
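When the command line is driven from a script, the argument list can be assembled from a dictionary of parameters. The sketch below assumes only the -p NAME VALUE form shown above. The build_papermill_command helper is hypothetical, and the subprocess call is left as a comment so the example runs without Papermill installed.

```python
import shlex


def build_papermill_command(input_path, output_path, parameters):
    """Return the papermill CLI invocation as a list of arguments.

    Hypothetical helper: emits one '-p NAME VALUE' triple per parameter.
    """
    command = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        command += ["-p", name, str(value)]
    return command


command = build_papermill_command(
    "local/input.ipynb",
    "s3://bkt/output.ipynb",
    {"alpha": 0.6, "l1_ratio": 0.1},
)

# Print the shell-quoted form for inspection.
print(shlex.join(command))

# With Papermill installed, the command could be run with:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a parameter value contains spaces.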

Recording Values to the Notebook

Users can save values to the notebook document to be consumed by other notebooks.

Record values to be saved with the notebook:

### notebook.ipynb
import papermill as pm

pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1,3,5])
pm.record("some_dict", {"a":1, "b":2})

Users can recover those values as a pandas DataFrame via the read_notebook function.

### summary.ipynb
import papermill as pm

nb = pm.read_notebook('notebook.ipynb')
nb.dataframe

Displaying Plots and Images Saved by Other Notebooks

Display a matplotlib histogram with the key name “matplotlib_hist”.

### notebook.ipynb
# Import plt and turn off interactive plotting to avoid double plotting.
import papermill as pm
import matplotlib.pyplot as plt; plt.ioff()
from ggplot import mpg

f = plt.figure()
plt.hist('cty', bins=12, data=mpg)
pm.display('matplotlib_hist', f)

Read in the notebook above and display the plot saved under “matplotlib_hist”.

### summary.ipynb
import papermill as pm

nb = pm.read_notebook('notebook.ipynb')
nb.display_output('matplotlib_hist')

Analyzing a Collection of Notebooks

Papermill can read in a directory of notebooks and provides the NotebookCollection interface for operating on them.

### summary.ipynb
import papermill as pm

nbs = pm.read_notebooks('/path/to/results/')

# Show named plot from 'train_1.ipynb'
# Accepts a key or list of keys to plot in order.
nbs.display_output('train_1.ipynb', 'matplotlib_hist')

# Dataframe for all notebooks in collection
nbs.dataframe

