Airflow plugin to execute Jupyter Notebook remotely
Project description
Airflow run Jupyter Notebook Remote
What is it?
This plugin is designed to allow the execution of Jupyter Notebooks remotely from within an Airflow DAG. By using the plugin, users can integrate and manage Jupyter Notebook workflows as part of their Airflow pipelines, ensuring that data analysis or machine learning code can be orchestrated and run automatically within the DAG scheduling system.
The plugin utilizes the Jupyter API to communicate with a Jupyter server, allowing for operations such as starting a kernel, running notebook cells, and managing sessions. It supports both HTTP requests for session and kernel management and WebSocket connections for sending code to execute inside the notebooks.
Package link: https://pypi.org/project/airflow-remote-jupyter-notebook/
Would you mind buying me a coffee?
If you find this library helpful, consider buying me a coffee! Your support helps maintain and improve the project, allowing me to dedicate more time to developing new features, fixing bugs, and providing updates.
Dependencies
Installation
Via Pypi Package:
$ pip install airflow-remote-jupyter-notebook
Manually
# run docker-compose to up Airfow and Jupyter Notebook containers
$ docker-compose up
Airfow plugin dependencies
- Look at requirements.txt
Test dependences
How to contribute
Please report bugs and feature requests at https://github.com/marcelo225/airflow-remote-jupyter-notebook/issues
Credits
Lead Developer - Marcelo Vinicius
Run remote jupyter notebook using Airflow
# in root project folder
$ docker-compose up
- Open http://localhost:8080 in your web browser to open Airflow
- Open http://localhost:8888 in your web browser to open Jupyter Notebook, when you need it
- Run
test_dag
in Airflow
Plugin Usage
from jupyter_plugin.plugin import JupyterDAG # <--------- How to import this plugin
from airflow.models import Variable
import datetime
with JupyterDAG(
'test_dag',
jupyter_url=Variable.get('jupyter_url'),
jupyter_token=Variable.get('jupyter_token'),
jupyter_base_path=Variable.get('jupyter_base_path'),
max_active_runs=1,
default_args={
'owner': 'Marcelo Vinicius',
'depends_on_past': False,
'start_date': datetime.datetime(2021, 1, 1),
'email_on_failure': False,
'email_on_retry': False,
'retries': 2
},
description=f'DAG test to run some remote Jupyter Notebook file.',
schedule=2,
catchup=False
) as dag:
test1 = dag.create_jupyter_remote_operator(task_id="test1", notebook_path=f"notebooks/test1.ipynb")
test2 = dag.create_jupyter_remote_operator(task_id="test2", notebook_path=f"notebooks/test2.ipynb")
test3 = dag.create_jupyter_remote_operator(task_id="test3", notebook_path=f"notebooks/test3.ipynb")
test1 >> test2 >> test3
DAG Attributes | Description |
---|---|
jupyter_url |
Jupyter URL server with HTTP or HTTPS |
jupyter_token |
Jupyter Authentication Token |
jupyter_base_path |
Base path where your Jupyter notebooks are stored |
Task Creation | Explanation |
---|---|
create_jupyter_remote_operator |
Method from the JupyterDAG class that creates a task to execute a specified Jupyter notebook on a remote server. |
task_id |
A unique identifier for the task, used for tracking and logging within Airflow. |
notebook_path |
Specifies the path to the Jupyter notebook to be executed, relative to the base path. |
Run tests
To test the scripts within the Airflow environment, you can use the following command. This will run all tests located in the /home/airflow/tests directory inside the container:
$ docker-compose exec airflow pytest /home/airflow/tests
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for airflow_remote_jupyter_notebook-0.0.3.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 019118e47f7c07432aa12c5b6fb281eaf8159887d03dcf636890e1aa1e268376 |
|
MD5 | b76eba7a17bb316fdb2e27629de131ac |
|
BLAKE2b-256 | 65ccf5807929efea646f754b68e0690912b9dc1d40f0099f1035a16e12878ecb |
Hashes for airflow_remote_jupyter_notebook-0.0.3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b6dfdb3efcf17fd609be89b60d4ea186689b9b18cd20313f6014477411aef47b |
|
MD5 | cd823cd6d1619bc1842f0ab837536972 |
|
BLAKE2b-256 | 7ff8277d98296950838f742632dc6218c891bb769b4400c938c8ae1be71d2fd8 |