Identify the bottleneck of your Kedro Pipeline quickly

kedro-profile

Identify the bottleneck of your Kedro Pipeline quickly with kedro-profile

Example

Running the plugin on the spaceflights example project prints a summary similar to this:

==========Node Summary==========
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Node Name                     ┃ Loading Time(s) ┃ Node Compute Time(s) ┃ Saving Time(s) ┃ Total Time(s) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ preprocess_shuttles_node      │ 1.65            │ 0.01                 │ 0.01           │ 1.68          │
│ create_model_input_table_node │ 0.01            │ 0.03                 │ 0.02           │ 0.06          │
│ preprocess_companies_node     │ 0.01            │ 0.01                 │ 0.02           │ 0.03          │
└───────────────────────────────┴─────────────────┴──────────────────────┴────────────────┴───────────────┘

==========Dataset Summary==========
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Dataset Name           ┃ Loading Time(s) ┃ Load Count ┃ Saving Time(s) ┃ Save Count ┃ Total Time(s) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ preprocessed_shuttles  │ 0.02            │ 1.0        │ 0.01           │ 1.0        │ 0.03          │
│ preprocessed_companies │ 0.0             │ 1.0        │ 0.02           │ 1.0        │ 0.02          │
│ companies              │ 0.01            │ 1.0        │ nan            │ nan        │ nan           │
│ shuttles               │ 1.65            │ 1.0        │ nan            │ nan        │ nan           │
│ reviews                │ 0.01            │ 1.0        │ nan            │ nan        │ nan           │
│ model_input_table      │ nan             │ nan        │ 0.02           │ 1.0        │ nan           │
└────────────────────────┴─────────────────┴────────────┴────────────────┴────────────┴───────────────┘
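Reading the Node Summary above comes down to two questions: which node has the largest total, and which phase (loading, compute, or saving) dominates it. The following is a minimal sketch of that reasoning in plain Python, using the numbers copied from the table. The dictionary layout and variable names are assumptions made for illustration and are not part of the plugin.

```python
# Times in seconds, copied from the Node Summary table above.
node_times = {
    "preprocess_shuttles_node": {"loading": 1.65, "compute": 0.01, "saving": 0.01},
    "create_model_input_table_node": {"loading": 0.01, "compute": 0.03, "saving": 0.02},
    "preprocess_companies_node": {"loading": 0.01, "compute": 0.01, "saving": 0.02},
}

# The bottleneck node is the one with the largest summed time.
bottleneck_node = max(node_times, key=lambda name: sum(node_times[name].values()))

# Within that node, pick the phase that contributes the most time.
phases = node_times[bottleneck_node]
bottleneck_phase = max(phases, key=phases.get)

print(f"{bottleneck_node} is dominated by {bottleneck_phase}")
```

In this run the slow step is loading the shuttles dataset rather than any computation, which matches the 1.65 s loading time for shuttles in the Dataset Summary.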

Requirements

kedro>=0.18.5 # Minimal version for hook specifications
pandas>=1.0.0

Get Started

If you do not have kedro installed already, install kedro with: pip install kedro

Then create an example project with this command: kedro new --example=yes --tools=none --name kedro-profile-example

This will create a new directory, kedro-profile-example, in your current directory.

If you cloned the repository, the example project is already included.

Enable the Profiling Hook

Find the HOOKS setting in settings.py and update it as follows:

from kedro_profile.hook import ProfileHook

HOOKS: tuple[ProfileHook] = (
    ProfileHook(
        save_file=True,  # Enable CSV file saving
        node_profile_path="data/08_reporting/profiling/node_profile.csv",
        dataset_profile_path="data/08_reporting/profiling/dataset_profile.csv",
    ),
)
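For readers curious about how a hook can measure node time at all, here is a minimal sketch of the general technique: record a timestamp before a node runs and accumulate the elapsed time afterwards. It is not the kedro-profile implementation. The class name TimingHookSketch and the stand-in node object are hypothetical, and the @hook_impl decorator that a real Kedro hook needs is left out so the example runs without Kedro installed.

```python
import time
from collections import defaultdict
from types import SimpleNamespace


class TimingHookSketch:
    """Illustrative only: not the kedro-profile implementation."""

    def __init__(self):
        self._started = {}
        self.compute_time = defaultdict(float)

    # In a Kedro project each method would carry the @hook_impl decorator
    # from kedro.framework.hooks; it is omitted so the sketch runs standalone.
    def before_node_run(self, node):
        # Remember when this node started.
        self._started[node.name] = time.perf_counter()

    def after_node_run(self, node):
        # Add the elapsed time to the node's running total.
        elapsed = time.perf_counter() - self._started.pop(node.name)
        self.compute_time[node.name] += elapsed


# Simulate one node run with a stand-in object that only has a name.
hook = TimingHookSketch()
fake_node = SimpleNamespace(name="preprocess_shuttles_node")
hook.before_node_run(fake_node)
time.sleep(0.01)  # pretend the node is doing work
hook.after_node_run(fake_node)

print(dict(hook.compute_time))
```

Dataset loading and saving times could be collected the same way with the before/after dataset hooks that Kedro provides.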

Configuration Options

  • save_file: Boolean to enable/disable CSV file saving (default: False)
  • node_profile_path: Path for node performance CSV file (default: "node_profile.csv")
  • dataset_profile_path: Path for dataset performance CSV file (default: "dataset_profile.csv")
  • env: Environment filter (default: "local")

Example Configurations

Save to custom directory:

HOOKS: tuple[ProfileHook] = (
    ProfileHook(
        save_file=True,
        node_profile_path="reports/node_performance.csv",
        dataset_profile_path="reports/dataset_performance.csv",
    ),
)

Disable CSV saving (console output only):

HOOKS: tuple[ProfileHook] = (
    ProfileHook(save_file=False),
)

Output

The plugin generates two CSV files when save_file=True:

  1. Node Profile: Contains node execution times and performance metrics
  2. Dataset Profile: Contains dataset loading/saving times and access counts

Both files include:

  • Load/Save counts
  • Loading/Saving times
  • Total time calculations
  • Sorted by total time (descending)
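Once the CSV files exist, they can be post-processed like any other CSV. The sketch below ranks datasets by combined loading and saving time using only the standard library. The column names (dataset_name, loading_time, saving_time) and the helper slowest_datasets are assumptions for illustration; check the header row of your generated file, since the real column names may differ. An in-memory sample stands in for the file so the example is self-contained.

```python
import csv
import io
import math

# Hypothetical CSV contents; the real column names may differ.
sample_csv = """dataset_name,loading_time,saving_time
preprocessed_shuttles,0.02,0.01
shuttles,1.65,nan
model_input_table,nan,0.02
"""


def slowest_datasets(csv_text, top_n=3):
    """Return (name, total_seconds) pairs, slowest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        times = [float(row["loading_time"]), float(row["saving_time"])]
        # Treat a missing measurement (nan) as zero so every dataset gets a total.
        total = sum(t for t in times if not math.isnan(t))
        rows.append((row["dataset_name"], round(total, 2)))
    return sorted(rows, key=lambda item: item[1], reverse=True)[:top_n]


ranking = slowest_datasets(sample_csv)
for name, seconds in ranking:
    print(f"{name}: {seconds}s")
```

Treating nan as zero is a deliberate choice in this sketch: a dataset that is only loaded or only saved still gets a usable total instead of nan.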

Environment Variables

  • KEDRO_PROFILE_DISABLE=1: Disable profiling
  • KEDRO_PROFILE_RICH=0: Disable rich console output
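When a pipeline is launched from a script, these variables can be set on the child process rather than globally. The helper below is a small sketch of that idea. The function name profiling_env and its parameters are hypothetical; only the two variable names and values come from the list above.

```python
import os


def profiling_env(disable=False, rich=True, base_env=None):
    """Return a copy of the environment with the kedro-profile flags set."""
    env = dict(os.environ if base_env is None else base_env)
    if disable:
        env["KEDRO_PROFILE_DISABLE"] = "1"  # turn profiling off
    if not rich:
        env["KEDRO_PROFILE_RICH"] = "0"  # turn rich console output off
    return env


# An empty base environment keeps the demonstration output small.
env = profiling_env(disable=True, rich=False, base_env={})
print(env)
```

The returned dictionary can be passed to a child process, for example subprocess.run(["kedro", "run"], env=profiling_env(disable=True)), so the current shell environment stays unchanged.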
