
JupyterNotebook Flink magics


Streaming Jupyter Integrations

The Streaming Jupyter Integrations project provides a set of magics for interactively running Flink SQL jobs in Jupyter Notebooks.

Installation

To use these magics, install the PyPI package along with jupyterlab-lsp:

python3 -m pip install jupyterlab-lsp streaming-jupyter-integrations

Usage

Register the magics in the first cell of a running IPython session in Jupyter:

%load_ext streaming_jupyter_integrations.magics

Then decide which execution mode and execution target to use:

%flink_connect --execution-mode [mode] --execution-target [target]

By default, the streaming execution mode and local execution target are used.

%flink_connect

Execution mode

Currently, Flink supports two execution modes: batch and streaming. Please see Flink documentation for more details.

To specify the execution mode, add the --execution-mode parameter, for instance:

%flink_connect --execution-mode batch

Execution target

Streaming Jupyter Integrations supports three execution targets:

  • Local
  • Remote
  • YARN Session

Local execution target

Running Flink in local mode will start a MiniCluster in a local JVM with parallelism 1.

To run Flink locally, use:

%flink_connect --execution-target local

Alternatively, since the execution target is local by default, use:

%flink_connect

Remote execution target

Running Flink in remote mode will connect to an existing Flink session cluster. Besides setting --execution-target to remote, you also need to provide --remote-hostname and --remote-port, pointing to the Flink Job Manager's REST API address.

%flink_connect \
    --execution-target remote \
    --remote-hostname example.com \
    --remote-port 8888

YARN session execution target

Running Flink in yarn-session mode will connect to an existing Flink session cluster running on YARN. You may specify the hostname and port of the YARN Resource Manager (--resource-manager-hostname and --resource-manager-port); if the Resource Manager address is not provided, it is assumed that the notebook runs on the same node as the Resource Manager. You can also specify the YARN applicationId (--yarn-application-id) to which the notebook will connect. If --yarn-application-id is not specified and exactly one YARN application is running on the cluster, the notebook will connect to it; otherwise, it will fail.
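The application-id resolution rule described above can be sketched in Python as follows. Note that `resolve_yarn_application_id` is a hypothetical helper name used for illustration, not the extension's actual code:

```python
def resolve_yarn_application_id(running_apps, yarn_application_id=None):
    """Pick the YARN application to connect to, per the rule described above.

    Hypothetical sketch, not the extension's actual implementation.
    """
    if yarn_application_id is not None:
        # An explicit --yarn-application-id always wins.
        return yarn_application_id
    if len(running_apps) == 1:
        # Exactly one running application: connect to it.
        return running_apps[0]
    # Zero or multiple applications: there is no unambiguous choice.
    raise ValueError(
        "Cannot determine the YARN application; pass --yarn-application-id"
    )
```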

Connecting to a remote Flink session cluster running on a remote YARN cluster:

%flink_connect \
    --execution-target yarn-session \
    --resource-manager-hostname example.com \
    --resource-manager-port 8888 \
    --yarn-application-id application_1666172784500_0001

Connecting to a Flink session cluster running on a YARN cluster:

%flink_connect \
    --execution-target yarn-session \
    --yarn-application-id application_1666172784500_0001

Connecting to a Flink session cluster running on a dedicated YARN cluster:

%flink_connect --execution-target yarn-session

Variables

Magics allow for dynamic variable substitution in Flink SQL cells.

my_variable = 1
SELECT * FROM some_table WHERE product_id = {my_variable}
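The substitution above works like Python's `str.format` applied to the cell text. A minimal sketch of the idea (an illustration only; the extension's real implementation may differ):

```python
# Value defined in a regular Python cell.
my_variable = 1

# SQL text as written in the Flink SQL cell.
query = "SELECT * FROM some_table WHERE product_id = {my_variable}"

# Substitute values from the local namespace into the SQL text.
substituted = query.format(my_variable=my_variable)
print(substituted)  # SELECT * FROM some_table WHERE product_id = 1
```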

Moreover, you can mark sensitive variables, such as passwords, so that they are read from environment variables or user input every time the cell runs:

CREATE TABLE MyUserTable (
  id BIGINT,
  name STRING,
  age INT,
  status BOOLEAN,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
   'connector' = 'jdbc',
   'url' = 'jdbc:mysql://localhost:3306/mydatabase',
   'table-name' = 'users',
   'username' = '${my_username}',
   'password' = '${my_password}'
);
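The lookup behavior for sensitive variables such as ${my_password} can be sketched as follows. `resolve_sensitive` is a hypothetical helper name illustrating the behavior described above, not the extension's actual API:

```python
import os
from getpass import getpass


def resolve_sensitive(name: str) -> str:
    """Resolve a sensitive variable: environment first, then user input.

    Hypothetical sketch of the behavior described above, not the
    extension's actual implementation.
    """
    value = os.environ.get(name)
    if value is not None:
        return value
    # Fall back to prompting the user without echoing the input.
    return getpass(f"Enter value for {name}: ")
```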

Local development

Note: You will need NodeJS to build the extension package.

The jlpm command is JupyterLab's pinned version of yarn that is installed with JupyterLab. You may use yarn or npm in lieu of jlpm below. To use jlpm, you must have JupyterLab installed (e.g., via brew install jupyterlab if you use Homebrew as your package manager).

# Clone the repo to your local environment
# Change directory to the flink_sql_lsp_extension directory
# Install package in development mode
pip install -e .
# Link your development version of the extension with JupyterLab
jupyter labextension develop . --overwrite
# Rebuild extension Typescript source after making changes
jlpm build

The project uses pre-commit hooks to ensure code quality, mostly through linting. To use them, install pre-commit and run:

pre-commit install --install-hooks

From then on, it will lint the files you have modified on every commit attempt.
