Neptune + Kedro Integration

Kedro plugin for experiment tracking and metadata management. It lets you browse, filter, and sort your runs in the Neptune web app, visualize node and pipeline metadata, and compare pipelines.

What will you get with this integration?

  • browse, filter, and sort your model training runs
  • compare nodes and pipelines on metrics, visual node outputs, and more
  • display all pipeline metadata, including learning curves for metrics, plots and images, rich media like video and audio, and interactive visualizations from Plotly, Altair, or Bokeh
  • and do whatever else you would expect from a modern ML metadata store

[Image: Kedro pipeline metadata in a custom dashboard in the Neptune UI]

Note: Kedro-Neptune plugin supports distributed pipeline execution and works in Kedro setups that use orchestrators like Airflow or Kubeflow.

Installation

Before you start, make sure that you have Python and pip installed, and that you have a Neptune account with your Neptune API token at hand.

Install neptune-client, kedro, and kedro-neptune

Open a terminal (or CMD on Windows) and run the command below. All required libraries are available via pip and conda:

pip install neptune-client kedro kedro-neptune

For more, see installing neptune-client.

Quickstart

This quickstart will show you how to:

  • Connect Neptune to your Kedro project
  • Log pipeline and dataset metadata to Neptune
  • Add explicit metadata logging to a node in your pipeline
  • Explore logged metadata in the Neptune UI

Step 1: Create a Kedro project from "pandas-iris" starter

kedro new --starter=pandas-iris
  • Follow the instructions and choose a name for your Kedro project, for example "Great-Kedro-Project"
  • Go to your new Kedro project directory

If everything was set up correctly, you should see the following directory structure:

Great-Kedro-Project # Parent directory of the template
├── conf            # Project configuration files
├── data            # Local project data (not committed to version control)
├── docs            # Project documentation
├── logs            # Project output logs (not committed to version control)
├── notebooks       # Project related Jupyter notebooks (can be used for experimental code before moving the code to src)
├── README.md       # Project README
├── setup.cfg       # Configuration options for `pytest` when doing `kedro test` and for the `isort` utility when doing `kedro lint`
└── src             # Project source code
    └── great_kedro_project
        └── pipelines
            └── data_science
                ├── nodes.py
                ├── pipelines.py
                └── ...

You will use nodes.py and pipelines.py files in this quickstart.

Step 2: Initialize kedro-neptune plugin

  • Go to your Kedro project directory and run
kedro neptune init

The command line will ask for your Neptune API token.

  • Input your Neptune API token:
    • Press Enter if it is stored in the NEPTUNE_API_TOKEN environment variable
    • Pass the name of a different environment variable that holds your Neptune API token, for example MY_SPECIAL_NEPTUNE_TOKEN_VARIABLE
    • Pass your Neptune API token as a string

The command line will ask for your Neptune project name.

  • Input your Neptune project name:
    • Press Enter if it is stored in the NEPTUNE_PROJECT environment variable
    • Pass the name of a different environment variable that holds your Neptune project name, for example MY_SPECIAL_NEPTUNE_PROJECT_VARIABLE
    • Pass your project name as a string in the WORKSPACE/PROJECT format

If everything was set up correctly, you should:

  • see the message: "kedro-neptune plugin successfully configured"
  • see three new files in your Kedro project:
    • Credentials file: YOUR_KEDRO_PROJECT/conf/local/credentials_neptune.yml
    • Config file: YOUR_KEDRO_PROJECT/conf/base/neptune.yml
    • Catalog file: YOUR_KEDRO_PROJECT/conf/base/neptune_catalog.yml

You can always go to those files and change the initial configuration.
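If you rely on the environment variables, you can sanity-check them before running kedro neptune init. The helper below is hypothetical (not part of the plugin), and the WORKSPACE/PROJECT pattern it checks is an assumption based on the format described above:

```python
import os
import re

# Hypothetical helper: checks that a project name follows the
# WORKSPACE/PROJECT format the plugin asks for.
def looks_like_project_name(name: str) -> bool:
    return bool(re.fullmatch(r"[\w\-]+/[\w\-]+", name))

# Quick sanity check of the environment variables the init step can read.
token = os.environ.get("NEPTUNE_API_TOKEN")
project = os.environ.get("NEPTUNE_PROJECT")

if token is None:
    print("NEPTUNE_API_TOKEN is not set - you will be asked for the token.")
if project is not None and not looks_like_project_name(project):
    print(f"NEPTUNE_PROJECT={project!r} does not look like WORKSPACE/PROJECT")
```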

Step 3: Add Neptune logging to a Kedro node

  • Open the pipeline node file src/KEDRO_PROJECT/pipelines/data_science/nodes.py
  • Import the Neptune client at the top of nodes.py
import neptune.new as neptune
  • Add a neptune_run argument of type neptune.handler.Handler to the report_accuracy function
def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame, 
                    neptune_run: neptune.handler.Handler) -> None:
...

You can treat neptune_run like a normal Neptune Run and log any ML metadata to it.
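The slash-separated strings act like nested namespaces in the run structure: assigning to "nodes/report/accuracy" stores the value under nodes, then report, then accuracy. The toy below is a plain-dict stand-in to illustrate that mental model only; it is not the Neptune client:

```python
# Toy stand-in for the Neptune namespace model (NOT the neptune client):
# each '/'-separated path segment becomes one level of nesting.
def log_to_namespace(run: dict, path: str, value) -> None:
    *parents, leaf = path.split("/")
    node = run
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

run = {}
log_to_namespace(run, "nodes/report/accuracy", 96.7)
log_to_namespace(run, "nodes/report/f1", 0.95)

print(run)  # {'nodes': {'report': {'accuracy': 96.7, 'f1': 0.95}}}
```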

Important
You have to use a special string "neptune_run" to use the Neptune Run handler in Kedro pipelines.

  • Log metrics like accuracy to neptune_run
def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame, 
                    neptune_run: neptune.handler.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]
    
    neptune_run['nodes/report/accuracy'] = accuracy * 100

You can log metadata from any node to any Neptune namespace you want.
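As a sanity check, here is the same accuracy logic on a toy example (assuming one-hot test_y and integer-class predictions, as in the pandas-iris starter):

```python
import numpy as np

# One-hot ground truth for 4 samples, 3 classes.
test_y = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
predictions = np.array([0, 1, 2, 2])  # last prediction is wrong

target = np.argmax(test_y, axis=1)  # -> [0, 1, 2, 1]
accuracy = np.sum(predictions == target) / target.shape[0]
print(accuracy * 100)  # 75.0
```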

  • Log images like a confusion matrix to neptune_run
# assuming plot_confusion_matrix from the scikit-plot package
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_confusion_matrix

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.handler.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]

    fig, ax = plt.subplots()
    plot_confusion_matrix(target, predictions, ax=ax)
    neptune_run['nodes/report/confusion_matrix'].upload(fig)

Note
You can log metrics, text, images, video, interactive visualizations, and more.
See a full list of What you can log and display in Neptune.

Step 4: Add Neptune Run handler to the Kedro pipeline

  • Go to a pipeline definition, src/KEDRO_PROJECT/pipelines/data_science/pipelines.py
  • Add neptune_run Run handler as an input to the report node
node(
    report_accuracy,
    ["example_predictions", "example_test_y", "neptune_run"],
    None,
    name="report"),

Step 5: Run Kedro pipeline

Go to your console and execute your Kedro pipeline

kedro run

A link to the Neptune Run associated with the Kedro pipeline execution will be printed to the console.

Step 6: Explore results in the Neptune UI

  • Click on the Neptune Run link in your console or use an example link

https://app.neptune.ai/common/kedro-integration/e/KED-632

Default Kedro namespace in Neptune UI

  • See pipeline and node parameters in kedro/catalog/parameters

Pipeline parameters logged from Kedro to Neptune UI

  • See execution parameters in kedro/run_params

Execution parameters logged from Kedro to Neptune UI

  • See metadata about the datasets in kedro/catalog/datasets/example_iris_data

Dataset metadata logged from Kedro to Neptune UI

  • See the metrics (accuracy) you logged explicitly in kedro/nodes/report/accuracy

Metrics logged from Kedro to Neptune UI

  • See the charts (confusion matrix) you logged explicitly in kedro/nodes/report/confusion_matrix

Confusion matrix logged from Kedro to Neptune UI

Support

If you get stuck or simply want to talk to us, here are your options:

  • Check our FAQ page
  • You can submit bug reports, feature requests, or contributions directly to the repository.
  • Chat! In the Neptune application, click the blue message icon in the bottom-right corner and send a message. A real person will reply ASAP (typically very ASAP).
  • You can just shoot us an email at support@neptune.ai

Download files

Download the file for your platform.

Source Distribution

kedro-neptune-0.1.0.tar.gz (38.2 kB)

File details

Details for the file kedro-neptune-0.1.0.tar.gz.

File metadata

  • Download URL: kedro-neptune-0.1.0.tar.gz
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.13

File hashes

Hashes for kedro-neptune-0.1.0.tar.gz:

  • SHA256: 09c5536a6295104497fc2e72fb3eb42575f120c1656bab140b86b14e15d657b3
  • MD5: 4679b1cf552edfb1ffb738054d698ac5
  • BLAKE2b-256: 1f19fdb7ea4a811682939ac57dc074348387e8fc834ae574e25f24f32842aa3b

