diagnostic packages for profiling and ML experiment management

google-cloud-mldiagnostics

Overview

The google-cloud-mldiagnostics library is a Python package that helps engineers and researchers monitor and diagnose machine learning training runs using Google Cloud's suite of diagnostic tools. It provides tools for tracking workload progress, collecting metrics, and profiling performance.

Supported frameworks

  • JAX
    • any version
  • Others in progress

How to install

Install the package from PyPI:

pip install google-cloud-mldiagnostics

This package does not install libtpu, jax, or xprof; it expects them to be installed separately.
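
Because these dependencies are installed separately, it can help to check for them before starting a run. The following is a minimal sketch that uses only the Python standard library. The helper name missing_distributions is hypothetical and not part of this package, and the sketch assumes the dependencies are published under the distribution names listed above.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the dependencies are published under these distribution names.
    required = ["libtpu", "jax", "xprof"]
    missing = missing_distributions(required)
    if missing:
        print("Install separately:", ", ".join(missing))
```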

How to use

Monitor training

At the beginning of the training script, create a machine learning run:

from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
)

Monitor with on-demand profiling

from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
  on_demand_xprof=True,
)

Monitor with programmatic profiling

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import xprof

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
)

# Use a distinct variable name so the imported xprof is not shadowed.
profiler = xprof()
profiler.start()
# some code
profiler.stop()
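
If the profiled code raises, stop() is never reached in the snippet above. One way to guard against that is to wrap the start/stop pair in a context manager. This is a sketch, not part of the package: profiled and FakeProfiler are hypothetical names, and the only assumption about the real profiler is that it exposes start() and stop() as shown above.

```python
from contextlib import contextmanager


@contextmanager
def profiled(profiler):
    """Start `profiler` on entry and stop it on exit, even if the body raises."""
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()


class FakeProfiler:
    """Stand-in used only for this sketch; it records calls instead of profiling."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


fake = FakeProfiler()
with profiled(fake):
    pass  # code to profile goes here
```

In a training script, the object returned by xprof() would take the place of FakeProfiler(), assuming it behaves as in the example above.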

Monitor with predefined metrics

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics
from google_cloud_mldiagnostics import metric_types

machinelearning_run(
  gcs_path="gs://<bucket>",
)

metrics.record(metric_types.MetricType.LOSS, <value>)

To pair the metric value with the current step:

metrics.record(metric_types.MetricType.LOSS, <value>, step=<step>)

Monitor with custom metrics

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics

machinelearning_run(
  gcs_path="gs://<bucket>",
)

metrics.record("<my-metric>", <value>)

To pair the metric value with the current step:

metrics.record("<my-metric>", <value>, step=<step>)
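
In a training loop, the step-paired call is typically made every few steps rather than on every iteration. The sketch below illustrates that pattern with a stand-in recorder so it runs without the library installed. run_training, fake_record, and log_every are hypothetical names, and the loss value is a placeholder; the only thing taken from the examples above is the (name, value, step=...) call shape.

```python
def run_training(num_steps, record, log_every=10):
    """Toy loop that reports a placeholder loss every `log_every` steps."""
    for step in range(1, num_steps + 1):
        loss = 1.0 / step  # placeholder standing in for a real training loss
        if step % log_every == 0:
            record("<my-metric>", loss, step=step)


# Stand-in recorder so the sketch runs without the library installed.
recorded = []


def fake_record(name, value, step=None):
    recorded.append((name, value, step))


run_training(30, fake_record)
```

Passing metrics.record in place of fake_record should give the same behavior against a real run, assuming it accepts the arguments shown in the examples above.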
