diagnostic packages for profiling and ML experiment management

google-cloud-mldiagnostics

Overview

The google-cloud-mldiagnostics library is a Python package that helps engineers and researchers monitor and diagnose machine learning training runs using Google Cloud's suite of diagnostic tools. It provides tools for tracking workload progress, collecting metrics, and profiling performance.

Supported frameworks

  • JAX
    • any version
  • Others in progress

How to install

Install the package from PyPI:

pip install google-cloud-mldiagnostics

This package does not install libtpu, jax, or xprof; it expects them to be installed separately.
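
Because these dependencies are left to you, it can help to confirm they are present before launching a run. The following is a minimal sketch using only the Python standard library. It assumes the installed distribution names are `jax`, `libtpu`, and `xprof`, as mentioned above; `check_dependencies` is a hypothetical helper and not part of google-cloud-mldiagnostics.

```python
from importlib import metadata


def check_dependencies(names=("jax", "libtpu", "xprof")):
    """Return a dict mapping each distribution name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in check_dependencies().items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```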

How to use

Monitor training

At the beginning of the training script, create a machine learning run:

from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
)

Monitor with on-demand profiling

from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
  on_demand_xprof=True,
)

Monitor with programmatic profiling

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import xprof

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
)

# Bind the instance to a new name so the imported xprof is not shadowed.
profiler = xprof()
profiler.start()
# some code
profiler.stop()
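
If the profiled code raises, the stop call above is never reached. One way to guard against that is a small context manager, sketched below. It assumes only that the profiler object exposes the `start()` and `stop()` methods shown above; `profiled` and `FakeProfiler` are hypothetical names used for a dry run, not part of the library.

```python
from contextlib import contextmanager


@contextmanager
def profiled(profiler):
    """Start the profiler on entry and stop it on exit, even if the body raises."""
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()


class FakeProfiler:
    """Hypothetical stand-in exposing the same start()/stop() methods."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


fake = FakeProfiler()
with profiled(fake):
    pass  # the code to profile goes here
```

With the real library, `profiled(xprof())` would take the place of `profiled(fake)`, assuming the object returned by `xprof()` needs nothing beyond `start()` and `stop()`.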

Monitor with predefined metrics

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics
from google_cloud_mldiagnostics import metric_types

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
)

metrics.record(metric_types.MetricType.LOSS, <value>)

To pair the metric value with the current step:

metrics.record(metric_types.MetricType.LOSS, <value>, step=<step>)

Monitor with custom metrics

from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics

machinelearning_run(
  name="<run-name>",
  gcs_path="gs://<bucket>",
)

metrics.record("<my-metric>", <value>)

To pair the metric value with the current step:

metrics.record("<my-metric>", <value>, step=<step>)
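
In a training loop, step-paired recording usually happens every few steps rather than on every iteration. The sketch below shows one possible shape for that. The recorder is passed in as a callable so the loop can be tried without the library installed; `run_training`, `fake_record`, the metric name `train_loss`, and the decaying placeholder loss are all hypothetical.

```python
def run_training(record, num_steps, log_every=10):
    """Run a toy loop and call record(name, value, step=step) every log_every steps."""
    loss = 1.0
    for step in range(1, num_steps + 1):
        loss *= 0.9  # placeholder for a real training step
        if step % log_every == 0:
            record("train_loss", loss, step=step)
    return loss


# Hypothetical recorder for a dry run; it only collects the calls it receives.
recorded = []


def fake_record(name, value, step=None):
    recorded.append((name, value, step))


final_loss = run_training(fake_record, num_steps=30, log_every=10)
```

Passing `metrics.record` instead of `fake_record` would send the same values to the library, assuming it accepts a positional name and value plus the `step` keyword as in the calls above.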
