
Storage Workflows for Notebooks


depository

This repository provides tooling and workflow recommendations for storing, scheduling, and publishing notebooks.

Automatic Notebook Versioning

Every time a notebook is saved, an immutable copy of it is written to object storage.

To keep the implementation simple, we'll rely on S3 as the object store, with versioning enabled on the bucket.
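The versioning model can be illustrated with a toy in-memory stand-in for a versioned bucket. This is not boto3 or depository code: `VersionedBucket`, `put`, `get`, and `list_versions` are hypothetical names. The sketch only shows the property the plan relies on: every write to a key adds a new immutable version, and older versions remain retrievable by ID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ObjectVersion:
    """One immutable stored copy of an object."""
    version_id: str
    body: bytes


@dataclass
class VersionedBucket:
    """Toy in-memory model of a versioned bucket (hypothetical, not the S3 API)."""
    # key -> versions of that key, oldest first
    versions: Dict[str, List[ObjectVersion]] = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> str:
        # Every write appends a new version; existing versions are never modified.
        version = ObjectVersion(version_id=uuid.uuid4().hex, body=bytes(body))
        self.versions.setdefault(key, []).append(version)
        return version.version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self.versions[key]
        if version_id is None:
            return history[-1].body  # no ID given: return the latest version
        for version in history:
            if version.version_id == version_id:
                return version.body
        raise KeyError(version_id)

    def list_versions(self, key: str) -> List[str]:
        return [version.version_id for version in self.versions.get(key, [])]


# Two saves of the same notebook path leave two retrievable versions.
bucket = VersionedBucket()
key = "workspace/kylek/notebooks/mine.ipynb"
first = bucket.put(key, b'{"cells": []}')
second = bucket.put(key, b'{"cells": [{}]}')
print(bucket.get(key))         # b'{"cells": [{}]}'
print(bucket.get(key, first))  # b'{"cells": []}'
```

With real S3, versioning is switched on per bucket (for example with boto3's `put_bucket_versioning`), and an older copy is fetched by passing its `VersionId` to `get_object`.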

Storage Paths

All notebooks are archived to a single versioned S3 bucket, under prefixes that denote each notebook's lifecycle stage:

  • /workspace - where users edit
  • /scheduled - notebooks currently scheduled
  • /published - notebooks published within an organization

Each notebook path is a namespace that an external service uses to tie the notebook to its schedule. Saving archives a new version while keeping the path intact (until a user changes it).

Example path                              Intent
/workspace/kylek/notebooks/mine.ipynb     Notebook in “draft”
/scheduled/kylek/notebooks/mine.ipynb     Current scheduled copy
/published/kylek/notebooks/mine.ipynb     Current published copy
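Because only the lifecycle prefix differs between the three paths in the table, moving a notebook between stages amounts to swapping that first path segment. The helper below is a hypothetical sketch (`stage_path` is not part of depository) and assumes paths always have the form /<stage>/<user>/<rest>.

```python
from pathlib import PurePosixPath

# Lifecycle prefixes used in this storage plan.
STAGES = ("workspace", "scheduled", "published")


def stage_path(path: str, target_stage: str) -> str:
    """Return the same notebook path under a different lifecycle prefix.

    Hypothetical helper; assumes paths look like /<stage>/<user>/<rest...>.
    """
    if target_stage not in STAGES:
        raise ValueError(f"unknown stage: {target_stage!r}")
    parts = PurePosixPath(path).parts  # e.g. ('/', 'workspace', 'kylek', 'notebooks', 'mine.ipynb')
    # Require a root, a known stage, a user, and at least one more segment.
    if len(parts) < 4 or parts[0] != "/" or parts[1] not in STAGES:
        raise ValueError(f"not a lifecycle path: {path!r}")
    return str(PurePosixPath("/", target_stage, *parts[2:]))


print(stage_path("/workspace/kylek/notebooks/mine.ipynb", "scheduled"))
# /scheduled/kylek/notebooks/mine.ipynb
```

Keeping the user and relative path unchanged is what lets an external scheduler refer to one stable name per notebook at each stage.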

Transitioning to this Storage Plan

Since most people work on a regular filesystem today, we'll start by treating the /workspace prefix as archival storage: each time a notebook is saved, a post_save_hook on the Jupyter contents manager writes a copy there.
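The save-time archiving can be sketched with Jupyter's own `FileContentsManager.post_save_hook` setting, which calls the hook with `model`, `os_path`, and `contents_manager` keyword arguments. This is a standalone sketch, not how depositoryContentsArchiver is implemented: `make_archive_hook` is a hypothetical name, and `upload(local_path, key)` is a callable you would supply yourself (for example, a thin wrapper around boto3's `upload_file`).

```python
import os
from typing import Callable


def make_archive_hook(upload: Callable[[str, str], None],
                      workspace_prefix: str) -> Callable:
    """Build a post_save_hook that copies each saved notebook under workspace_prefix.

    Hypothetical helper; `upload(local_path, key)` performs the actual write.
    """
    # S3 keys conventionally have no leading slash, so strip it from the prefix.
    prefix = workspace_prefix.strip("/")

    def post_save_hook(model, os_path, contents_manager, **kwargs):
        # Only archive notebooks, not plain files or directories.
        if model.get("type") != "notebook":
            return
        # Keep the notebook's location relative to the server root in the key.
        rel = os.path.relpath(os_path, contents_manager.root_dir)
        key = "/".join([prefix] + rel.split(os.sep))
        upload(os_path, key)

    return post_save_hook


# Possible wiring in jupyter_notebook_config.py (assumes boto3 is installed):
#   import boto3
#   s3 = boto3.client("s3")
#   c.FileContentsManager.post_save_hook = make_archive_hook(
#       lambda path, key: s3.upload_file(path, "<bucket-name>", key),
#       "/workspace/kylek/notebooks")
```

On a versioned bucket, repeated uploads to the same key become successive versions, so the hook itself never needs to invent unique file names.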

Configuration

# Jupyter config
# At ~/.jupyter/jupyter_notebook_config.py for user installs
# At __ for system installs
from depository import depositoryContentsArchiver

c = get_config()

c.NotebookApp.contents_manager_class = depositoryContentsArchiver

c.depository.workspace_prefix = "/workspace/kylek/notebooks"
c.depository.published_prefix = "/published/kylek/notebooks"
c.depository.scheduled_prefix = "/scheduled/kylek/notebooks"

# Optional, in case you're using a different contents manager.
# This defaults to notebook.services.contents.manager.ContentsManager
# c.depository.Archiver.underlying_contents_manager_class = ADifferentContentsManager

c.depository.Backend = "s3"
c.depository.S3.bucket = "<bucket-name>"

# Note: if depository is used from an EC2 instance with the right IAM role,
# you don't have to specify these.
c.depository.S3.access_key_id = "<AWS Access Key ID / IAM Access Key ID>"
c.depository.S3.secret_access_key = "<AWS Secret Access Key / IAM Secret Access Key>"
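The note about IAM roles reflects how AWS SDKs resolve credentials: when no keys are passed explicitly, boto3 falls back to its default chain (environment variables, shared config files, and the EC2 instance role). The hypothetical helper below (`s3_client_kwargs` is not part of depository) sketches that "both keys or neither" rule and produces keyword arguments for `boto3.client("s3", ...)`.

```python
from typing import Dict, Optional


def s3_client_kwargs(access_key_id: Optional[str] = None,
                     secret_access_key: Optional[str] = None) -> Dict[str, str]:
    """Keyword arguments for boto3.client("s3", **kwargs).

    Hypothetical helper: with neither key configured, return no credentials so
    boto3 uses its default chain (env vars, config files, EC2 IAM role).
    """
    if access_key_id is None and secret_access_key is None:
        return {}
    if access_key_id is None or secret_access_key is None:
        # A half-configured key pair is almost certainly a mistake.
        raise ValueError("set both access_key_id and secret_access_key, or neither")
    return {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }


# Usage (assumes boto3 is installed):
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs())  # on EC2 with an IAM role
```

Failing fast on a half-set pair avoids silently mixing an explicit key ID with whatever secret the default chain happens to find.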



Release history

This version

0.1

Download files


Source Distribution

depository-0.1.tar.gz (23.5 kB)

File details

Details for the file depository-0.1.tar.gz.

File metadata

  • Download URL: depository-0.1.tar.gz
  • Upload date:
  • Size: 23.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.19.9 CPython/2.7.15

File hashes

Hashes for depository-0.1.tar.gz
Algorithm     Hash digest
SHA256        ee0c31744ad9690e59c3c13477756b2e29a9fccefe4b546f1da377d5a6a3e18b
MD5           3198c50d9995866cb8a52017fad7ba92
BLAKE2b-256   338e10509e75c9315e26ddd752cfbcd7096637883be25f407dd04ee4b653ee1b
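A downloaded archive can be checked against the SHA256 digest listed for it using only the standard library. `sha256_of_file` is an illustrative name, and the file location in the usage comment is an assumption.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SHA256 = "ee0c31744ad9690e59c3c13477756b2e29a9fccefe4b546f1da377d5a6a3e18b"

# Usage, assuming depository-0.1.tar.gz was downloaded to the current directory:
#   assert sha256_of_file("depository-0.1.tar.gz") == EXPECTED_SHA256
```

pip can enforce the same check at install time through hash-checking mode (`--require-hashes` with a `--hash=sha256:...` entry in a requirements file).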
