Map Reduce for Notebooks

Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

Papermill lets you:

  • parametrize notebooks

  • execute and collect metrics across the notebooks

  • summarize collections of notebooks

This opens up new opportunities for how notebooks can be used. For example:

  • Perhaps you have a financial report that you wish to run with different values on the first or last day of a month, or at the beginning or end of the year. Using parameters makes this task easier.

  • Do you want to run a notebook and, depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.

  • Do you have plots and visualizations spread across 10 or more notebooks? Now you can choose which plots to display programmatically as a summary collection in a notebook to share with others.

Installation

From the command line:

pip install papermill

Installing In-Notebook bindings

  • Python (included in this repo)

  • R (available in the papermillr project)

Usage

Parametrizing a Notebook

To parametrize your notebook, designate a cell with the tag parameters. Papermill looks for the parameters cell and replaces the values in it with the parameters passed in at execution time.

docs/img/parameters.png
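
Cell tags live in the notebook file itself: in the .ipynb JSON, each cell carries a metadata object, and tags are stored there as a list under the "tags" key. The sketch below uses only the standard library to locate a cell tagged parameters in a small hand-written notebook document. The helper name find_parameters_cell and the sample notebook are illustrative assumptions, and this is not Papermill's own lookup code.

```python
import json


def find_parameters_cell(notebook):
    """Return the index of the first cell tagged 'parameters', or None."""
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = cell.get("metadata", {}).get("tags", [])
        if "parameters" in tags:
            return index
    return None


# A tiny hand-written notebook document (nbformat 4 layout), made up for this example.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Report"]},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ["alpha = 0.1\n", "ratio = 0.5"],
            "outputs": [],
            "execution_count": None,
        },
    ],
}

index = find_parameters_cell(notebook)
print(index)
print(json.dumps(notebook["cells"][index]["metadata"]))
```

The values assigned in the tagged cell act as defaults; the parameters passed at execution time take their place.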

Executing a Notebook

The two ways to execute the notebook with parameters are: (1) through the Python API and (2) through the command line interface.

Execute via the Python API

import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
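
Because parameters is a plain dict, one notebook can be run once per combination of values. The following is a minimal sketch of planning such runs with the standard library. The helper names expand_grid and plan_runs, the output file pattern, and the grid values are assumptions made for illustration and are not part of Papermill.

```python
from itertools import product


def expand_grid(grid):
    """Yield one parameters dict per combination of the values in grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def plan_runs(input_path, output_pattern, grid):
    """Return (input_path, output_path, parameters) tuples, one per combination."""
    runs = []
    for number, parameters in enumerate(expand_grid(grid)):
        runs.append((input_path, output_pattern.format(number), parameters))
    return runs


runs = plan_runs(
    'path/to/input.ipynb',
    'path/to/output_{}.ipynb',  # hypothetical naming scheme
    {'alpha': [0.2, 0.6], 'ratio': [0.1]},
)

for input_path, output_path, parameters in runs:
    print(input_path, output_path, parameters)
    # With papermill installed, each planned run could be executed with:
    # pm.execute_notebook(input_path, output_path, parameters=parameters)
```

Writing each run to its own output file keeps every executed notebook available for later comparison.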

Execute via CLI

Here’s an example that executes a local notebook and writes the output to an Amazon S3 bucket:

$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
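
When the CLI is driven from a script, the same command can be assembled from a dict of parameters. This is a sketch only: the helper name papermill_command is hypothetical, and it relies solely on the -p NAME VALUE form shown above.

```python
import shlex


def papermill_command(input_path, output_path, parameters):
    """Build the argument list for a papermill CLI call."""
    argv = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        # Each parameter becomes one "-p NAME VALUE" triple.
        argv += ["-p", name, str(value)]
    return argv


argv = papermill_command(
    "local/input.ipynb",
    "s3://bkt/output.ipynb",
    {"alpha": 0.6, "l1_ratio": 0.1},
)

# Print the command in a shell-safe form for inspection.
print(" ".join(shlex.quote(part) for part in argv))
# prints: papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

# To actually run it (requires papermill on the PATH):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a parameter value contains spaces.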

Python In-notebook Bindings

Recording Values to the Notebook

Users can save values to the notebook document to be consumed by other notebooks.

Record values to be saved with the notebook:

"""notebook.ipynb"""
import papermill as pm

pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1, 3, 5])
pm.record("some_dict", {"a": 1, "b": 2})

Users can recover those values as a pandas DataFrame via the read_notebook function.

"""summary.ipynb"""
import papermill as pm

nb = pm.read_notebook('notebook.ipynb')
nb.dataframe

Displaying Plots and Images Saved by Other Notebooks

Display a matplotlib histogram with the key name matplotlib_hist.

"""notebook.ipynb"""
import papermill as pm
from ggplot import mpg
import matplotlib.pyplot as plt

# turn off interactive plotting to avoid double plotting
plt.ioff()

f = plt.figure()
plt.hist('cty', bins=12, data=mpg)
pm.display('matplotlib_hist', f)

Read in the notebook above and display the plot saved under the key matplotlib_hist.

"""summary.ipynb"""
import papermill as pm

nb = pm.read_notebook('notebook.ipynb')
nb.display_output('matplotlib_hist')

Analyzing a Collection of Notebooks

Papermill can read in a directory of notebooks and provides the NotebookCollection interface for operating on them.

"""summary.ipynb"""
import papermill as pm

nbs = pm.read_notebooks('/path/to/results/')

# Show the named plot from 'train_1.ipynb'.
# Accepts a key or list of keys to plot in order.
nbs.display_output('train_1.ipynb', 'matplotlib_hist')

# Dataframe for all notebooks in the collection
nbs.dataframe.head(10)
