
Airflow plugin to execute Jupyter Notebooks remotely


Airflow Remote Jupyter Notebook

What is it?

[Architecture diagram]

This plugin executes Jupyter Notebooks remotely from within an Airflow DAG. It lets users integrate and manage Jupyter Notebook workflows as part of their Airflow pipelines, so that data analysis or machine learning code is orchestrated and run automatically by the DAG scheduler.

The plugin uses the Jupyter API to communicate with a Jupyter server, allowing operations such as starting a kernel, running notebook cells, and managing sessions. It relies on HTTP requests for session and kernel management, and on WebSocket connections for sending code to execute inside the notebooks.
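To make that flow concrete, here is a minimal, illustrative sketch of those two mechanisms, independent of the plugin's own code: it starts a kernel through the Jupyter Server REST API using requests, then submits an execute_request message over the kernel's WebSocket channel using the websocket-client package. The URL and token are placeholder values.

import json
import uuid

import requests
import websocket  # pip install websocket-client

JUPYTER_URL = "http://localhost:8888"  # placeholder: your Jupyter server
TOKEN = "my-secret-token"              # placeholder: your Jupyter token
HEADERS = {"Authorization": f"token {TOKEN}"}

# 1. HTTP: start a new kernel via the REST API.
resp = requests.post(f"{JUPYTER_URL}/api/kernels", headers=HEADERS)
resp.raise_for_status()
kernel_id = resp.json()["id"]

# 2. WebSocket: connect to the kernel's channels endpoint and send an
#    execute_request message (Jupyter messaging protocol).
ws_url = JUPYTER_URL.replace("http", "ws", 1)
ws = websocket.create_connection(
    f"{ws_url}/api/kernels/{kernel_id}/channels",
    header=[f"Authorization: token {TOKEN}"],
)
ws.send(json.dumps({
    "header": {
        "msg_id": uuid.uuid4().hex,
        "msg_type": "execute_request",
        "username": "airflow",
        "session": uuid.uuid4().hex,
        "version": "5.3",
    },
    "parent_header": {},
    "metadata": {},
    "content": {
        "code": "print('hello from the kernel')",
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    },
    "channel": "shell",
}))

# Read messages until the kernel streams the print output back on iopub.
while True:
    msg = json.loads(ws.recv())
    if msg.get("channel") == "iopub" and msg.get("msg_type") == "stream":
        print(msg["content"]["text"])  # -> hello from the kernel
        break

ws.close()

# 3. HTTP: shut the kernel down again.
requests.delete(f"{JUPYTER_URL}/api/kernels/{kernel_id}", headers=HEADERS)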

Package link: https://pypi.org/project/airflow-remote-jupyter-notebook/

Would you mind buying me a coffee?

If you find this library helpful, consider buying me a coffee! Your support helps maintain and improve the project, allowing me to dedicate more time to developing new features, fixing bugs, and providing updates.


Dependencies

Installation

Via PyPI package:

$ pip install airflow-remote-jupyter-notebook

Manually

# run docker-compose to start the Airflow and Jupyter Notebook containers
$ docker-compose up

Airflow plugin dependencies

Test dependencies

How to contribute

Please report bugs and feature requests at https://github.com/marcelo225/airflow-remote-jupyter-notebook/issues

Credits

Lead Developer - Marcelo Vinicius

Run a remote Jupyter Notebook using Airflow

# in the root project folder
$ docker-compose up

Plugin Usage

from jupyter_plugin.plugin import JupyterDAG  # <-- how to import this plugin
from airflow.models import Variable
import datetime

with JupyterDAG(
    'test_dag',
    jupyter_url=Variable.get('jupyter_url'),
    jupyter_token=Variable.get('jupyter_token'),
    jupyter_base_path=Variable.get('jupyter_base_path'),
    max_active_runs=1,
    default_args={
        'owner': 'Marcelo Vinicius',
        'depends_on_past': False,
        'start_date': datetime.datetime(2021, 1, 1),
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 2
    },
    description='DAG test to run some remote Jupyter Notebook files.',
    schedule=datetime.timedelta(hours=2),  # run every 2 hours (schedule accepts a cron string or a timedelta)
    catchup=False
) as dag:

    # each task runs one notebook, resolved relative to jupyter_base_path
    test1 = dag.create_jupyter_remote_operator(task_id="test1", notebook_path="notebooks/test1.ipynb")
    test2 = dag.create_jupyter_remote_operator(task_id="test2", notebook_path="notebooks/test2.ipynb")
    test3 = dag.create_jupyter_remote_operator(task_id="test3", notebook_path="notebooks/test3.ipynb")

    # run the notebooks sequentially
    test1 >> test2 >> test3
DAG attributes:

jupyter_url: the Jupyter server URL (HTTP or HTTPS)
jupyter_token: the Jupyter authentication token
jupyter_base_path: the base path where your Jupyter notebooks are stored

Task creation:

create_jupyter_remote_operator: method on the JupyterDAG class that creates a task to execute a specified Jupyter notebook on a remote server
task_id: a unique identifier for the task, used for tracking and logging within Airflow
notebook_path: the path to the Jupyter notebook to be executed, relative to the base path
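
The usage example above reads its connection settings from Airflow Variables. One way to define them is through the Airflow CLI; the values below are illustrative placeholders (your Jupyter URL, token, and base path depend on your deployment):

$ airflow variables set jupyter_url http://jupyter:8888
$ airflow variables set jupyter_token my-secret-token
$ airflow variables set jupyter_base_path /home/jovyan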

Run tests

To test the scripts within the Airflow environment, you can use the following command. This will run all tests located in the /home/airflow/tests directory inside the container:

$ docker-compose exec airflow pytest /home/airflow/tests
