Map Reduce for Notebooks

Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

The goals for Papermill are:

  • Parameterizing notebooks

  • Executing and collecting metrics across the notebooks

  • Summarizing collections of notebooks

Installation

pip install papermill

Usage

Parameterizing a notebook

### template.ipynb
# This cell has a "parameters" tag. These values will be overwritten by Papermill.
alpha = 0.5
ratio = 0.1
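
The "parameters" tag lives in the cell's metadata inside the notebook file. As a rough illustration of what that looks like on disk, the sketch below builds a hypothetical two-cell notebook as plain JSON and locates the tagged cell. The helper name `find_tagged_cells` and the notebook contents are assumptions made for this example; this is not Papermill's own implementation.

```python
import json

# Hypothetical minimal notebook, written as the plain JSON structure of an .ipynb file.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            # The tag is stored in the cell metadata, not in the source text.
            "metadata": {"tags": ["parameters"]},
            "source": ["alpha = 0.5\n", "ratio = 0.1"],
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(alpha * ratio)"],
            "outputs": [],
            "execution_count": None,
        },
    ],
}


def find_tagged_cells(nb, tag="parameters"):
    """Return the indexes of cells whose metadata carries the given tag."""
    return [
        index
        for index, cell in enumerate(nb["cells"])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


tagged = find_tagged_cells(notebook)
serialized = json.dumps(notebook, indent=1)
```

Here `tagged` holds the position of the tagged cell, and `serialized` is the text that would be written to a `.ipynb` file.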

Recording values to be saved with the notebook

### template.ipynb
import random
import papermill as pm

rand_value = random.randint(1, 10)
pm.record("random_value", rand_value)
pm.record("foo", "bar")

Displaying outputs to be saved with the notebook

### template.ipynb
# Import plt and turn off interactive plotting to avoid double plotting.
import papermill as pm
import matplotlib.pyplot as plt; plt.ioff()
from ggplot import mpg

f = plt.figure()
plt.hist('cty', bins=12, data=mpg)
pm.display('matplotlib_hist', f)

Executing a parameterized Jupyter notebook

import papermill as pm

pm.execute_notebook(
    notebook="template.ipynb",
    output="output.ipynb",
    params=dict(alpha=0.1, ratio=0.001)
)
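
Running the same template over many parameter combinations is the "map" half of the workflow. The sketch below is one possible way to organize such a sweep, not something Papermill provides: `build_jobs`, `run_jobs`, the `output1.ipynb`-style naming, and the parameter grid are all hypothetical. The executor is passed in as a callable so the job list can be inspected without running anything.

```python
import itertools


def build_jobs(template, output_dir, grid):
    """Expand a dict of parameter lists into one job per combination."""
    names = sorted(grid)
    jobs = []
    combinations = itertools.product(*(grid[name] for name in names))
    for index, values in enumerate(combinations):
        params = dict(zip(names, values))
        # Hypothetical naming scheme: output1.ipynb, output2.ipynb, ...
        output = "{}/output{}.ipynb".format(output_dir, index + 1)
        jobs.append({"notebook": template, "output": output, "params": params})
    return jobs


def run_jobs(jobs, execute):
    """Call execute(notebook=..., output=..., params=...) once per job."""
    for job in jobs:
        execute(**job)
    return [job["output"] for job in jobs]


jobs = build_jobs(
    "template.ipynb",
    "/path/to/results",
    {"alpha": [0.1, 0.5], "ratio": [0.001, 0.1]},
)
```

Assuming the keyword arguments shown in the call above, `run_jobs(jobs, pm.execute_notebook)` would execute the template once per combination and return the list of output paths.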

Analyzing a single notebook

### summary.ipynb
import papermill as pm

nb = pm.read_notebook('output.ipynb')
nb.dataframe.head()

# Show named plot from 'output.ipynb'
nb.display_output('matplotlib_hist')

Analyzing a collection of notebooks

### summary.ipynb
import papermill as pm

nbs = pm.read_notebooks('/path/to/results/')

# Show named plot from 'output1.ipynb'
nbs.display_output('output1.ipynb', 'matplotlib_hist')

# Dataframe for all notebooks in collection
df = nbs.dataframe
df.head()

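# Show histograms from the notebooks with the highest recorded "random_value".
# After the pivot, each recorded name becomes a column, so sort on that column.
pivoted_df = df.pivot(index='key', columns='name', values='value').sort_values(by='random_value', ascending=False)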
pivoted_df.head()

nbs.display_output(pivoted_df[:3], 'matplotlib_hist')
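
To make the reshaping step concrete, here is a small pure-Python sketch of the same idea: long-format rows (one per notebook and recorded name) are turned into one entry per notebook, which can then be ranked by a recorded value. The column names `key`, `name`, and `value` follow the pivot call above; the records themselves and the helper names `pivot` and `top_notebooks` are made up for illustration.

```python
# Hypothetical long-format records: one row per (notebook, recorded name).
records = [
    {"key": "output1.ipynb", "name": "random_value", "value": 3},
    {"key": "output1.ipynb", "name": "foo", "value": "bar"},
    {"key": "output2.ipynb", "name": "random_value", "value": 9},
    {"key": "output2.ipynb", "name": "foo", "value": "bar"},
    {"key": "output3.ipynb", "name": "random_value", "value": 6},
    {"key": "output3.ipynb", "name": "foo", "value": "bar"},
]


def pivot(rows):
    """Group rows by notebook key so each recorded name becomes a field."""
    wide = {}
    for row in rows:
        wide.setdefault(row["key"], {})[row["name"]] = row["value"]
    return wide


def top_notebooks(wide, name, count):
    """Return the keys of the notebooks with the largest value for `name`."""
    ranked = sorted(wide, key=lambda key: wide[key][name], reverse=True)
    return ranked[:count]


wide = pivot(records)
best = top_notebooks(wide, "random_value", 2)
```

With these made-up records, `best` lists the two notebooks whose recorded `random_value` is largest, which is the selection the summary notebook then passes on for display.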

Download files

Source distribution: papermill-0.6.tar.gz (23.5 kB)

Built distribution: papermill-0.6-py2-none-any.whl (10.3 kB, Python 2)