
A papermill implementation to run notebooks inside dataproc serverless


Paperless

Paperless is a tool that extends the capabilities of Papermill by providing the ability to run Papermill via Google Cloud Dataproc Serverless.


Overview

Papermill is a powerful tool for parameterizing and executing Jupyter Notebooks. However, Papermill doesn't support Jupyter Kernel Gateway by default, so it could not run a Spark notebook against a Google Cloud Dataproc Serverless environment. This is where Paperless helps.


Paperless bridges the gap between Papermill and Google Cloud Dataproc Serverless interactive mode, allowing you to seamlessly integrate the two and harness the power of serverless execution for your Jupyter Notebooks.

Features

  • Serverless Execution: Run Papermill on Google Cloud Dataproc without managing the underlying infrastructure.

  • Scalability: Leverage the scalability of Google Cloud Dataproc for processing multiple Notebooks concurrently.

  • Cost-Effective: Pay only for the resources you consume during the execution, optimizing costs for your notebook parameterization tasks.
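The concurrent processing mentioned above could be sketched with plain bash background jobs. This is a minimal sketch, not part of Paperless itself: `run_all` is a hypothetical helper, the `-out.ipynb` output naming simply mirrors the example paths used in this README, and it assumes that each `paperless` invocation can run independently of the others, using the `--template_name` flag described under Step 5.

```shell
# Hypothetical helper: run several notebooks concurrently, one paperless
# process per notebook, and report how many of them failed.
# Usage: run_all <template_name> <notebook.ipynb>...
run_all() {
  local template="$1"; shift
  local nb pid pids=() failed=0

  for nb in "$@"; do
    # Assumption: output is written next to the input as <name>-out.ipynb.
    paperless "$nb" "${nb%.ipynb}-out.ipynb" --template_name "$template" &
    pids+=("$!")
  done

  # Wait for every background job and count the non-zero exit codes.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done

  echo "failed=$failed"
  [ "$failed" -eq 0 ]
}
```

For example, `run_all paperless-interactive ./resources/a.ipynb ./resources/b.ipynb` would start both runs at once and return a non-zero status if either fails (the notebook paths here are placeholders).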

Getting Started

Prerequisites

Before using Paperless, make sure you have the following:

  • A Google Cloud Platform (GCP) project
  • Access to Google Cloud Dataproc, with the Dataproc API enabled
  • Papermill installed locally or in your environment

Step 1: Install Google Cloud SDK

You need gcloud to authenticate your application using Application Default Credentials (ADC). If you haven't already installed the Google Cloud SDK, download and install it by following the Google Cloud SDK documentation.

Step 2: Authenticate with gcloud

Open a terminal and run the following commands to authenticate the Google Cloud SDK with your Google Cloud Platform (GCP) account and to set up Application Default Credentials:

gcloud auth login

gcloud auth application-default login 
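To confirm that the prerequisites and the two login commands took effect, something like the following bash sketch may help. `check_gcp_setup` and its project ID argument are hypothetical names introduced here for illustration; the gcloud subcommands it calls (`services enable` and `auth application-default print-access-token`) are standard gcloud commands. Enabling the API requires sufficient permissions on the project.

```shell
# Hypothetical helper: enable the Dataproc API and verify that
# Application Default Credentials (ADC) are usable.
# Usage: check_gcp_setup <project_id>
check_gcp_setup() {
  local project_id="$1"   # placeholder: your own GCP project ID

  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found; install the Google Cloud SDK first" >&2
    return 1
  fi

  # Enabling an already-enabled API is harmless.
  gcloud services enable dataproc.googleapis.com --project "$project_id" || return 1

  # Printing a token only succeeds when ADC is configured.
  if gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "ADC is configured"
  else
    echo "ADC is missing; run: gcloud auth application-default login" >&2
    return 1
  fi
}
```

Calling `check_gcp_setup my-project-id` (with your real project ID in place of the placeholder) prints `ADC is configured` when everything is in place.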

Step 3: Install Paperless

pip install paperless

Step 4: Create a Session Template for Paperless

Parameters and details can be found in GCP Docs.

 gcloud compute instance-templates create paperless-interactive --<extra params...>

Adjust the parameters to your job's needs; the docs describe the available options.

Step 5: Test Execution

Paperless accepts and supports all the arguments of the original Papermill package. The minimum needed for testing:

 paperless <input_path> <output_path> ...

Paperless adds one extra parameter of its own: --template_name. Example:

 paperless ./resources/spark.ipynb ./resources/spark-out.ipynb --template_name paperless-interactive
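Since Paperless accepts Papermill's arguments, notebook parameters can presumably be passed with Papermill's `-p <name> <value>` option. The bash sketch below wraps that in a small helper. `run_notebook`, the `DRY_RUN` variable, and the `key=value` argument format are hypothetical conveniences, not part of Paperless, and the parameter names in the usage note are placeholders that would need to match your notebook's parameters cell.

```shell
# Hypothetical helper: build a paperless command line with notebook parameters.
# Usage: run_notebook <input> <output> <template_name> [key=value ...]
# Set DRY_RUN=1 to print the command instead of executing it.
run_notebook() {
  local input="$1" output="$2" template="$3"; shift 3
  local cmd=(paperless "$input" "$output" --template_name "$template")
  local kv

  # Turn each key=value pair into Papermill's "-p key value" form.
  for kv in "$@"; do
    cmd+=(-p "${kv%%=*}" "${kv#*=}")
  done

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

With `DRY_RUN=1 run_notebook ./resources/spark.ipynb ./resources/spark-out.ipynb paperless-interactive run_date=2024-01-01`, the helper only prints the command it would run, which is a convenient way to review it before starting a session.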

You're all set, enjoy :)


Local development:

# Clone the repository and change into it
git clone https://github.com/Plarium-Repo/paperless.git && cd paperless

# Create a virtual environment
python3 -m venv .venv

# Activate the virtual environment
# On Windows
.venv\Scripts\activate

# On macOS and Linux
source .venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Install the command line
python setup.py install 

# Execute example
paperless ./resources/spark.ipynb ./resources/spark-out.ipynb

MIT License

Contribution

Code Of Conduct


Made With Love ( :heart: ) & Respect ( :kneeling_person: ) :israel: :israel:

