
Neptune.ai integration with Kedro


Kedro-Neptune plugin

Main docs page for Kedro-Neptune plugin

See this example in Neptune

Kedro pipeline metadata in a custom dashboard in the Neptune UI

What will you get with this integration?

Kedro is a popular open-source project that helps standardize ML workflows. It gives you a clean and powerful pipeline abstraction where you put all your ML code logic.

The Kedro-Neptune plugin combines the benefits of a well-organized Kedro pipeline with a user interface built for ML metadata management that lets you:

  • browse, filter, and sort your model training runs
  • compare nodes and pipelines on metrics, visual node outputs, and more
  • display all pipeline metadata, including learning curves for metrics, plots, images, rich media like video and audio, and interactive visualizations from Plotly, Altair, or Bokeh
  • and do whatever else you would expect from a modern ML metadata store

Installation

Before you start, make sure that:

Install neptune-client, kedro, and kedro-neptune

Depending on your operating system, open a terminal or CMD and run the command below. All required libraries are available via pip and conda:

pip install neptune-client kedro kedro-neptune

For more, see installing neptune-client.
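
To confirm that the install worked before moving on, a short check along these lines may help. It is a minimal sketch that uses only the standard library (Python 3.8 or later); the helper name installed_versions is illustrative, and the distribution names are taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as used in the pip command above
    required = ["neptune-client", "kedro", "kedro-neptune"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs to be installed before you continue with the quickstart.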

Quickstart

See code examples on GitHub

See runs logged to Neptune

This quickstart will show you how to:

  • Connect Neptune to your Kedro project
  • Log pipeline and dataset metadata to Neptune
  • Add explicit metadata logging to a node in your pipeline
  • Explore logged metadata in the Neptune UI

Before you start

Step 1: Create a Kedro project from "pandas-iris" starter

kedro new --starter=pandas-iris
  • Follow the instructions and choose a name for your Kedro project, for example "Great-Kedro-Project"
  • Go to your new Kedro project directory

If everything was set up correctly, you should see the following directory structure:

Great-Kedro-Project # Parent directory of the template
├── conf            # Project configuration files
├── data            # Local project data (not committed to version control)
├── docs            # Project documentation
├── logs            # Project output logs (not committed to version control)
├── notebooks       # Project related Jupyter notebooks (can be used for experimental code before moving the code to src)
├── README.md       # Project README
├── setup.cfg       # Configuration options for `pytest` when doing `kedro test` and for the `isort` utility when doing `kedro lint`
└── src             # Project source code
    └── KEDRO_PROJECT
        └── pipelines
            └── data_science
                ├── nodes.py
                ├── pipelines.py
                └── ...

You will use nodes.py and pipelines.py files in this quickstart.

Step 2: Initialize kedro-neptune plugin

  • Go to your Kedro project directory and run
kedro neptune init

The command line will ask for your Neptune API token

  • Input your Neptune API token:
    • Press enter if it was set to the NEPTUNE_API_TOKEN environment variable
    • Pass the name of a different environment variable that holds your Neptune API token, for example MY_SPECIAL_NEPTUNE_TOKEN_VARIABLE
    • Pass your Neptune API token as a string

The command line will ask for your Neptune project name

  • Input your Neptune project name:
    • Press enter if it was set to the NEPTUNE_PROJECT environment variable
    • Pass the name of a different environment variable that holds your Neptune project name, for example MY_SPECIAL_NEPTUNE_PROJECT_VARIABLE
    • Pass your project name as a string in the format WORKSPACE/PROJECT
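
The two prompts above accept either the name of an environment variable or a literal value. The sketch below illustrates one way such a setting could be resolved and how the WORKSPACE/PROJECT format could be checked. The "$NAME" convention for referring to an environment variable and both helper names are assumptions made for illustration, not the plugin's confirmed behavior.

```python
import os
import re


def resolve_setting(value, env=None):
    """Return the value of $VARIABLE from env, or the literal string itself."""
    env = os.environ if env is None else env
    if value.startswith("$"):
        # Assumed convention: a leading "$" marks an environment variable name
        return env.get(value[1:])  # None when the variable is not set
    return value


def is_valid_project_name(name):
    """Check the WORKSPACE/PROJECT shape: two non-empty parts and one slash."""
    return re.fullmatch(r"[^/\s]+/[^/\s]+", name) is not None


if __name__ == "__main__":
    project = resolve_setting("$NEPTUNE_PROJECT")
    if project is None or not is_valid_project_name(project):
        print("NEPTUNE_PROJECT is unset or not in WORKSPACE/PROJECT format")
    else:
        print(f"Using Neptune project {project}")
```

Keeping the token in an environment variable rather than passing it as a string avoids writing the secret into a file in your project.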

If everything was set up correctly, you should:

  • see the message: "kedro-neptune plugin successfully configured"
  • see three new files in your kedro project:
    • Credentials file: YOUR_KEDRO_PROJECT/conf/local/credentials_neptune.yml
    • Config file: YOUR_KEDRO_PROJECT/conf/base/neptune.yml
    • Catalog file: YOUR_KEDRO_PROJECT/conf/base/neptune_catalog.yml

You can always go to those files and change the initial configuration.
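
If you want to verify that the three files were created, a check like the following could be run from anywhere. It is a small sketch using only the standard library; the helper name missing_neptune_files is hypothetical, and the relative paths are the ones listed above.

```python
from pathlib import Path

# Paths relative to the Kedro project root, as listed above
EXPECTED_FILES = (
    "conf/local/credentials_neptune.yml",
    "conf/base/neptune.yml",
    "conf/base/neptune_catalog.yml",
)


def missing_neptune_files(project_root):
    """Return the expected config files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_neptune_files(".")
    print("All Neptune config files found" if not missing else f"Missing: {missing}")
```

An empty result means all three files are in place.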

Step 3: Add Neptune logging to a Kedro node

  • Open the node file src/KEDRO_PROJECT/pipelines/data_science/nodes.py
  • Import the Neptune client near the top of nodes.py
import neptune.new as neptune
  • Add a neptune_run argument of type neptune.run.Handler to the report_accuracy function
def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    ...

You can treat neptune_run like a normal Neptune Run and log any ML metadata to it.

Important
You must use the special string "neptune_run" to access the Neptune Run handler in Kedro pipelines.

  • Log metrics like accuracy to neptune_run
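def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    # Index of the largest value in each row of test_y (the class index for one-hot labels)
    target = np.argmax(test_y.to_numpy(), axis=1)
    # Fraction of predictions that match the true class
    accuracy = np.sum(predictions == target) / target.shape[0]

    # Log the accuracy as a percentage
    neptune_run['nodes/report/accuracy'] = accuracy * 100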

You can log metadata from any node to any Neptune namespace you want.
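
Because the node only assigns to and calls methods on neptune_run, its logging logic can be exercised without a Neptune connection by passing in a stand-in object. The sketch below is one way to do that. FakeRun and FakeField are hypothetical test doubles that cover only item assignment, upload, and log; they are not part of Neptune or the plugin, and the node is simplified to plain Python lists so the example has no dependencies.

```python
class FakeField:
    """Records what a node does with one namespace path."""

    def __init__(self, store, path):
        self._store = store
        self._path = path

    def upload(self, value):
        # Keep the uploaded object so a test can inspect it
        self._store[self._path] = value

    def log(self, value):
        # Append to a series of values under the same path
        self._store.setdefault(self._path, []).append(value)


class FakeRun:
    """Hypothetical in-memory stand-in for the neptune_run handler."""

    def __init__(self):
        self.logged = {}

    def __setitem__(self, path, value):
        self.logged[path] = value

    def __getitem__(self, path):
        return FakeField(self.logged, path)


def report_accuracy(predictions, target, neptune_run):
    """Simplified node: share of matching labels, logged as a percentage."""
    correct = sum(p == t for p, t in zip(predictions, target))
    neptune_run["nodes/report/accuracy"] = correct / len(target) * 100


run = FakeRun()
report_accuracy([0, 1, 2, 1], [0, 1, 2, 2], run)
print(run.logged)  # 3 of 4 predictions match
```

A stand-in like this keeps unit tests for nodes fast and offline; the real handler offers more methods than the three mimicked here.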

  • Log images like a confusion matrix to neptune_run
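import matplotlib.pyplot as plt
# Assumption: plot_confusion_matrix(y_true, y_pred, ax=...) is the scikit-plot function
from scikitplot.metrics import plot_confusion_matrix

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]

    # Draw the confusion matrix on a matplotlib figure and upload it as an image
    fig, ax = plt.subplots()
    plot_confusion_matrix(target, predictions, ax=ax)
    neptune_run['nodes/report/confusion_matrix'].upload(fig)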

Note
You can log metrics, text, images, video, interactive visualizations, and more.
See a full list of What you can log and display in Neptune.

Step 4: Add Neptune Run handler to the Kedro pipeline

  • Go to a pipeline definition, src/KEDRO_PROJECT/pipelines/data_science/pipelines.py
  • Add the neptune_run Run handler as an input to the report node
node(
    report_accuracy,
    ["example_predictions", "example_test_y", "neptune_run"],
    None,
    name="report",
),
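
When a node's inputs are given as a list, the names are matched to the function's arguments by position, which is why "neptune_run" comes third in the list above. The toy runner below mimics only that idea. It is not Kedro's implementation: run_node, the dictionary catalog, and the sample values are all illustrative.

```python
def run_node(func, input_names, catalog):
    """Look up each input name in the catalog and pass the values positionally."""
    args = [catalog[name] for name in input_names]
    return func(*args)


def report_accuracy(predictions, test_y, neptune_run):
    # Third positional argument receives whatever is registered as "neptune_run"
    matches = sum(p == t for p, t in zip(predictions, test_y))
    neptune_run["nodes/report/accuracy"] = 100 * matches / len(test_y)


catalog = {
    "example_predictions": [0, 1, 1],
    "example_test_y": [0, 1, 2],
    "neptune_run": {},  # a plain dict stands in for the Run handler here
}

run_node(
    report_accuracy,
    ["example_predictions", "example_test_y", "neptune_run"],
    catalog,
)
print(catalog["neptune_run"])
```

Reordering the names in the list would pass the values to the wrong arguments, so the order must follow the function signature.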

Step 5: Run Kedro pipeline

Go to your console and execute your Kedro pipeline:

kedro run

A link to the Neptune Run associated with the Kedro pipeline execution will be printed to the console.
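
If you capture the console output, for instance in a CI job, the Run link can be pulled out with a regular expression. This is a minimal sketch; the pattern is an assumption modeled on the shape of the example link in Step 6 (https://app.neptune.ai/WORKSPACE/PROJECT/e/RUN-ID), and find_run_links is a hypothetical helper.

```python
import re

# Assumed link shape: https://app.neptune.ai/WORKSPACE/PROJECT/e/RUN-ID
RUN_LINK = re.compile(r"https://app\.neptune\.ai/[\w.-]+/[\w.-]+/e/[A-Z0-9]+-\d+")


def find_run_links(console_output):
    """Return every Neptune Run URL found in the given text, in order."""
    return RUN_LINK.findall(console_output)


if __name__ == "__main__":
    sample = "INFO - https://app.neptune.ai/common/kedro-integration/e/KED-632\n"
    for link in find_run_links(sample):
        print(link)
```

The sample string here is made up for the demonstration; the exact wording of the real console message may differ.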

Step 6: Explore results in the Neptune UI

  • Click on the Neptune Run link in your console or use an example link

https://app.neptune.ai/common/kedro-integration/e/KED-632

Default Kedro namespace in Neptune UI

  • See pipeline and node parameters in kedro/catalog/parameters

Pipeline parameters logged from Kedro to Neptune UI

  • See execution parameters in kedro/run_params

Execution parameters logged from Kedro to Neptune UI

  • See metadata about the datasets in kedro/catalog/datasets/example_iris_data

Dataset metadata logged from Kedro to Neptune UI

  • See the metrics (accuracy) you logged explicitly in kedro/nodes/report/accuracy

Metrics logged from Kedro to Neptune UI

  • See the charts (confusion matrix) you logged explicitly in kedro/nodes/report/confusion_matrix

Confusion matrix logged from Kedro to Neptune UI
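
Note that the node logged to 'nodes/report/accuracy', while the UI shows the value under kedro/nodes/report/accuracy. Judging from the paths in this quickstart, everything logged through neptune_run appears under a base "kedro" namespace. The helper below sketches that mapping; the function name and the fixed prefix are assumptions inferred from the paths above.

```python
KEDRO_NAMESPACE = "kedro"  # base namespace seen in the UI paths above


def full_path(relative_path, base=KEDRO_NAMESPACE):
    """Join the base namespace and a path logged through neptune_run."""
    return f"{base}/{relative_path.strip('/')}"


for logged in ("nodes/report/accuracy", "nodes/report/confusion_matrix"):
    print(f"{logged} -> {full_path(logged)}")
```

This can be handy when you need to locate a field you logged from a node in the Neptune UI.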


Source Distribution

kedro-neptune-0.0.6.tar.gz (38.5 kB)
