
Parametrize and run Jupyter and nteract Notebooks

Project description


papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

Papermill lets you:

  • parameterize notebooks
  • execute and collect metrics across the notebooks
  • summarize collections of notebooks

This opens up new opportunities for how notebooks can be used. For example:

  • Perhaps you have a financial report that you wish to run with different values on the first or last day of a month, or at the beginning or end of the year. Using parameters makes this task easier.
  • Do you want to run a notebook and depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.
  • Do you have plots and visualizations spread across 10 or more notebooks? Now you can programmatically choose which plots to display in a summary notebook to share with others.

Installation

From the command line:

pip install papermill

For optional I/O dependencies, you can install individual bundles such as s3 or azure, or use all:

pip install papermill[all]

Installing In-Notebook bindings

  • Python (included in this repo)
  • R (experimentally available in the papermillr project)

Other language bindings welcome if someone would like to maintain parallel implementations!

Usage

Parameterizing a Notebook

To parameterize your notebook, designate a cell with the tag parameters.

[Screenshot: enabling the parameters tag in Jupyter]

Papermill looks for the parameters cell and treats the values in it as defaults for the parameters passed in at execution time. Papermill then adds a new cell, tagged injected-parameters, containing the input parameters; because this cell runs after the parameters cell, its values overwrite the defaults. If no cell is tagged parameters, the injected cell is inserted at the top of the notebook.
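The placement rule above can be sketched directly on the notebook's JSON structure. This is a simplified illustration, not papermill's implementation: it assumes an nbformat v4 notebook loaded as a plain dict (tags live under each cell's metadata.tags), renders each parameter with Python's repr(), and the helper name inject_parameters is hypothetical.

```python
import copy


def _tags(cell):
    # In the nbformat v4 JSON layout, cell tags live under metadata.tags.
    return cell.get("metadata", {}).get("tags", [])


def inject_parameters(nb, parameters):
    """Return a copy of nb with an 'injected-parameters' cell added.

    Hypothetical sketch: the new cell goes right after the first cell
    tagged 'parameters', or at the top if no such cell exists.
    """
    nb = copy.deepcopy(nb)
    # Simplified rendering: one Python assignment per parameter, via repr().
    source = "".join(f"{name} = {value!r}\n" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    cells = nb["cells"]
    position = 0  # default: top of the notebook
    for index, cell in enumerate(cells):
        if "parameters" in _tags(cell):
            position = index + 1  # directly after the defaults
            break
    cells.insert(position, injected)
    return nb


notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Report"},
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "alpha = 0.5\nratio = 0.2\n", "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {}, "source": "print(alpha * ratio)\n",
         "outputs": [], "execution_count": None},
    ]
}

result = inject_parameters(notebook, {"alpha": 0.6, "ratio": 0.1})
print(result["cells"][2]["source"])
```

Because the injected assignments execute after the defaults, later cells see alpha = 0.6 rather than 0.5.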

Additionally, if you rerun a notebook through papermill, it will reuse the injected-parameters cell from the prior run. In this case, papermill replaces the old injected-parameters cell with the new run's inputs.
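The rerun behavior can be sketched as an in-place update: if a cell tagged injected-parameters already exists, its source is overwritten instead of adding a second cell. This is a simplified illustration on plain-dict cells; replace_injected_source is a hypothetical helper, not part of papermill.

```python
def replace_injected_source(cells, new_source):
    """Overwrite the first 'injected-parameters' cell's source in place.

    Returns True if such a cell was found; False means a caller would
    need to insert a fresh injected cell instead.
    """
    for cell in cells:
        tags = cell.get("metadata", {}).get("tags", [])
        if "injected-parameters" in tags:
            cell["source"] = new_source
            cell["outputs"] = []  # drop anything left over from the prior run
            return True
    return False


cells = [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]},
     "source": "alpha = 0.5\n", "outputs": []},
    {"cell_type": "code", "metadata": {"tags": ["injected-parameters"]},
     "source": "alpha = 0.6\n", "outputs": []},
]
replaced = replace_injected_source(cells, "alpha = 0.9\n")
print(replaced, cells[1]["source"])
```

Updating the existing cell keeps the notebook from accumulating one injected cell per run.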


Executing a Notebook

The two ways to execute the notebook with parameters are: (1) through the Python API and (2) through the command line interface.

Execute via the Python API

import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',   # notebook to execute
    'path/to/output.ipynb',  # where the executed copy is saved
    parameters=dict(alpha=0.6, ratio=0.1)  # values injected into the notebook
)
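Because execute_notebook is an ordinary Python call, running one notebook over several parameter sets is just a loop. A minimal sketch, assuming the input path from the example above; the output naming scheme and the helper names build_runs and run_all are hypothetical. The executor is passed in so the loop can be tried without a Jupyter kernel; by default it uses pm.execute_notebook.

```python
import itertools


def build_runs(output_dir, grid):
    """Expand a grid of parameter values into (output_path, parameters) pairs."""
    names = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        parameters = dict(zip(names, values))
        # Hypothetical naming scheme: encode the parameter values in the file name.
        suffix = "_".join(f"{name}-{parameters[name]}" for name in names)
        runs.append((f"{output_dir}/output_{suffix}.ipynb", parameters))
    return runs


def run_all(input_path, runs, execute=None):
    """Execute input_path once per parameter set (defaults to papermill)."""
    if execute is None:
        import papermill as pm  # lazy import: build_runs works without papermill
        execute = pm.execute_notebook
    for output_path, parameters in runs:
        execute(input_path, output_path, parameters=parameters)


runs = build_runs("path/to/outputs", {"alpha": [0.6, 0.8], "ratio": [0.1, 0.2]})
for output_path, parameters in runs:
    print(output_path, parameters)
```

With papermill installed and the notebook's kernel available, run_all('path/to/input.ipynb', runs) executes all four combinations.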

Execute via CLI

Here's an example of a local notebook being executed and its output written to an Amazon S3 bucket:

$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

NOTE: If you use multiple AWS accounts and have properly configured your AWS credentials, you can specify which account to use by setting the AWS_PROFILE environment variable on the command line. For example:

$ AWS_PROFILE=dev_account papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

In the above example, two parameters, alpha and l1_ratio, are set using -p (--parameters also works). Parameter values that look like booleans or numbers are interpreted as such. Here are the other ways users may set parameters:

$ papermill local/input.ipynb s3://bkt/output.ipynb -r version 1.0

Using -r or --parameters_raw, users can set parameters one by one. However, unlike with -p, the value always remains a string, even if it looks like a number or boolean.
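The difference between -p and -r can be approximated in a few lines. This is an illustrative sketch of the kind of type inference -p performs, not papermill's source; the exact rules (for example, which spellings count as booleans) are assumptions here, and infer_type and parse_params are hypothetical names.

```python
def infer_type(value: str):
    """Approximate -p: turn number- or boolean-looking strings into values."""
    # Assumption: only these exact spellings are treated as booleans.
    if value == "True":
        return True
    if value == "False":
        return False
    for convert in (int, float):  # try the narrower numeric type first
        try:
            return convert(value)
        except ValueError:
            pass
    return value  # anything else stays a string


def parse_params(pairs, raw=False):
    """Build a parameters dict from (name, value) pairs, like repeated -p / -r flags."""
    return {name: (value if raw else infer_type(value)) for name, value in pairs}


print(parse_params([("alpha", "0.6"), ("l1_ratio", "0.1")]))  # like -p
print(parse_params([("version", "1.0")], raw=True))           # like -r
```

The first call yields floats, while the raw call keeps version as the string '1.0'.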

$ papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml

Using -f or --parameters_file, users can provide a YAML file from which parameter values should be read.

$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
alpha: 0.6
l1_ratio: 0.1"

Using -y or --parameters_yaml, users can directly provide a YAML string containing parameter values.

$ papermill local/input.ipynb s3://bkt/output.ipynb -b YWxwaGE6IDAuNgpsMV9yYXRpbzogMC4xCg==

Using -b or --parameters_base64, users can provide a base64-encoded YAML string containing parameter values.
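The base64 value in the -b example is the YAML text alpha: 0.6 and l1_ratio: 0.1 (one per line, with a trailing newline) encoded with the standard base64 alphabet. A small standard-library sketch shows how such a value can be produced and inspected; the helper names are illustrative.

```python
import base64


def encode_yaml_params(yaml_text: str) -> str:
    """Base64-encode a YAML parameter string for use with -b / --parameters_base64."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


def decode_yaml_params(encoded: str) -> str:
    """Reverse the encoding, e.g. to check what a -b argument contains."""
    return base64.b64decode(encoded).decode("utf-8")


yaml_text = "alpha: 0.6\nl1_ratio: 0.1\n"
encoded = encode_yaml_params(yaml_text)
print(encoded)  # YWxwaGE6IDAuNgpsMV9yYXRpbzogMC4xCg==
```

Encoding sidesteps shell quoting of multi-line YAML, which the -y form requires.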

When using YAML to pass arguments, through -y, -b or -f, parameter values can be arrays or dictionaries:

$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
x:
    - 0.0
    - 1.0
    - 2.0
    - 3.0
linear_function:
    slope: 3.0
    intercept: 1.0"
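Lists and dictionaries can also be passed through the Python API. The sketch below mirrors the YAML example above as a plain Python dict. It assumes papermill is installed when run_linear_example (a hypothetical wrapper) is called, and that the notebook's parameters cell defines x and linear_function.

```python
# The YAML example above, expressed as a Python dict for the API.
parameters = {
    "x": [0.0, 1.0, 2.0, 3.0],
    "linear_function": {"slope": 3.0, "intercept": 1.0},
}


def run_linear_example(input_path, output_path, params):
    """Hypothetical wrapper: execute a notebook with nested list/dict parameters."""
    import papermill as pm  # lazy import so this sketch loads without papermill
    return pm.execute_notebook(input_path, output_path, parameters=params)


# What a cell in the executed notebook might compute from these values:
line = parameters["linear_function"]
y = [line["slope"] * xi + line["intercept"] for xi in parameters["x"]]
print(y)  # [1.0, 4.0, 7.0, 10.0]
```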

Supported Name Handlers

Papermill supports the following name handlers for input and output paths during execution:
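Choosing a handler by path prefix can be illustrated with a small registry. This is a hypothetical sketch of the dispatch idea, not papermill's internal API: the handler functions are placeholders, and only the s3:// prefix (used in the CLI example above) and plain local paths are assumed.

```python
def read_local(path):
    return f"local:{path}"  # placeholder for reading from the local file system


def read_s3(path):
    return f"s3:{path}"  # placeholder for an S3 client call


# Map path prefixes to handler functions; unmatched paths fall back to local.
HANDLERS = {"s3://": read_s3}


def pick_handler(path, handlers=HANDLERS, default=read_local):
    """Return the handler whose prefix matches path, preferring the longest prefix."""
    matches = [prefix for prefix in handlers if path.startswith(prefix)]
    if not matches:
        return default
    return handlers[max(matches, key=len)]


print(pick_handler("s3://bkt/output.ipynb")("s3://bkt/output.ipynb"))
print(pick_handler("local/input.ipynb")("local/input.ipynb"))
```

Preferring the longest prefix lets a more specific registration override a general one.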

Python In-notebook Bindings

Recording Values to the Notebook

Users can save values to the notebook document to be consumed by other notebooks.

Recording values to be saved with the notebook.

"""notebook.ipynb"""
import papermill as pm

pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1, 3, 5])
pm.record("some_dict", {"a": 1, "b": 2})
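Because an .ipynb file is a JSON document, values recorded into it should be JSON-serializable. A hypothetical pre-check like the one below can catch problems before calling pm.record; it uses only the standard json module and does not reflect papermill's own validation.

```python
import json


def is_recordable(value) -> bool:
    """Return True if value can be stored as JSON inside a notebook file."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


candidates = {
    "hello": "world",
    "number": 123,
    "some_list": [1, 3, 5],
    "some_dict": {"a": 1, "b": 2},
    "some_set": {1, 2},  # sets have no JSON equivalent
}
for name, value in candidates.items():
    print(name, is_recordable(value))
```

Values that fail the check can usually be converted first, for example with list(some_set).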

Users can recover those values as a Pandas dataframe via the read_notebook function.

"""summary.ipynb"""
import papermill as pm

nb = pm.read_notebook('notebook.ipynb')
nb.dataframe


Displaying Plots and Images Saved by Other Notebooks

Display a matplotlib histogram with the key name matplotlib_hist.

"""notebook.ipynb"""
import papermill as pm
from ggplot import mpg
import matplotlib.pyplot as plt

# turn off interactive plotting to avoid double plotting
plt.ioff()

f = plt.figure()
plt.hist('cty', bins=12, data=mpg)
pm.display('matplotlib_hist', f)


Read in the notebook above and display the plot saved under the name matplotlib_hist.

"""summary.ipynb"""
import papermill as pm

nb = pm.read_notebook('notebook.ipynb')
nb.display_output('matplotlib_hist')


Analyzing a Collection of Notebooks

Papermill can read in a directory of notebooks and provides the NotebookCollection interface for operating on them.

"""summary.ipynb"""
import papermill as pm

nbs = pm.read_notebooks('/path/to/results/')

# Show the named plot from 'train_1.ipynb'.
# Accepts a key or a list of keys to plot in order.
nbs.display_output('train_1.ipynb', 'matplotlib_hist')


# Dataframe for all notebooks in collection
nbs.dataframe.head(10)
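Collecting notebooks from a directory can be approximated with the standard library. This is a simplified stand-in, not how read_notebooks is implemented: load_notebooks is a hypothetical helper that parses every .ipynb file in a folder as JSON, keyed by file name, similar to how display_output above addresses notebooks by name.

```python
import json
import tempfile
from pathlib import Path


def load_notebooks(directory):
    """Return {file name: parsed notebook JSON} for every .ipynb file in directory."""
    notebooks = {}
    for path in sorted(Path(directory).glob("*.ipynb")):
        with path.open(encoding="utf-8") as handle:
            notebooks[path.name] = json.load(handle)
    return notebooks


# Demo with two empty notebooks in a throwaway directory.
empty_nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train_1.ipynb", "train_2.ipynb"):
        Path(tmp, name).write_text(json.dumps(empty_nb), encoding="utf-8")
    print(sorted(load_notebooks(tmp)))
```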


Development Guide

Read CONTRIBUTING.md for guidelines on how to set up a local development environment and contribute code changes back to Papermill.

For development guidelines, see the DEVELOPMENT_GUIDE.md file, which explains how to make particular additions to the code base.

Documentation

We host the Papermill documentation on ReadTheDocs.


Download files


Source Distribution

papermill-0.18.2.tar.gz (279.4 kB)

Uploaded Source

Built Distribution


papermill-0.18.2-py2.py3-none-any.whl (32.7 kB)

Uploaded Python 2, Python 3

File details

Details for the file papermill-0.18.2.tar.gz.

File metadata

  • Download URL: papermill-0.18.2.tar.gz
  • Upload date:
  • Size: 279.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.7.3 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.3

File hashes

Hashes for papermill-0.18.2.tar.gz
Algorithm Hash digest
SHA256 a383b03f85e430fb261558a59da22ea1b6b0f15497a2d48672a5a590130250e7
MD5 2203e2f98b01850f31fc9826e54e1b4c
BLAKE2b-256 744849a80971858e5b8dd3c732a1de4c174e1470959c1723c97ef2c9f48b24d9


File details

Details for the file papermill-0.18.2-py2.py3-none-any.whl.

File metadata

  • Download URL: papermill-0.18.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.7.3 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.3

File hashes

Hashes for papermill-0.18.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 0a16965285223e0d48e8baa0c43510ef1eb3f732ee87869744a8cc5013e7e77f
MD5 65dc525f12386c5a469f0b476692de1b
BLAKE2b-256 052056ca11420db42bab9d8eeb1c4c0d1bab34cd417c6e38570bb0932ef0deb6

