Diagnostic packages for profiling and ML experiment management
google-cloud-mldiagnostics
Overview
The google-cloud-mldiagnostics library is a Python package that helps
engineers and researchers monitor and diagnose machine learning training runs
with the GCP suite of diagnostic tools.
It provides tools for tracking workload progress, collecting metrics, and
profiling performance.
Supported frameworks
- JAX (any version)
- Other frameworks: in progress
How to install
Install the package from PyPI:
```shell
pip install google-cloud-mldiagnostics
```
This package does not install libtpu, jax, or xprof. Install them separately.
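Because libtpu, jax, and xprof must be installed separately, it can help to check for them before starting a run. The following is a minimal sketch. It assumes the import names match the PyPI package names (`jax`, `libtpu`, `xprof`), and `missing_dependencies` is a hypothetical helper, not part of the library.

```python
import importlib.util

# Assumption: the import names of the separately installed dependencies
# match their PyPI package names.
REQUIRED_MODULES = ("jax", "libtpu", "xprof")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    # It locates the module without executing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Install these before using google-cloud-mldiagnostics:", ", ".join(missing))
    else:
        print("All required dependencies were found.")
```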
How to use
Monitor training
At the beginning of the training script, create a machine learning run:
```python
from google_cloud_mldiagnostics import machinelearning_run

# Replace the placeholders with your run name and Cloud Storage bucket.
machinelearning_run(
    name="<run-name>",
    gcs_path="gs://<bucket>",
)
```
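If you launch many runs, you may want a distinct name for each one. The sketch below builds the `name` and `gcs_path` keyword arguments for `machinelearning_run`. `run_config` is a hypothetical helper. It assumes a UTC timestamp suffix is an acceptable way to keep names unique, because naming rules for runs are not documented here.

```python
from datetime import datetime, timezone


def run_config(prefix, bucket, now=None):
    """Return keyword arguments for machinelearning_run (hypothetical helper).

    Assumes a timestamp suffix is an acceptable way to make run names unique.
    `bucket` may be given with or without the gs:// scheme.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    gcs_path = bucket if bucket.startswith("gs://") else f"gs://{bucket}"
    # Example name: "<prefix>-20240102-030405"
    return {"name": f"{prefix}-{now:%Y%m%d-%H%M%S}", "gcs_path": gcs_path}
```

With the library installed, you can unpack the result into the call, e.g. `machinelearning_run(**run_config("my-run", "my-bucket"))`.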
Monitor with on-demand profiling
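To enable on-demand profiling, pass on_demand_xprof=True when creating the run:
```python
from google_cloud_mldiagnostics import machinelearning_run

machinelearning_run(
    name="<run-name>",
    gcs_path="gs://<bucket>",
    on_demand_xprof=True,  # enable on-demand profiling
)
```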
Monitor with programmatic profiling
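To profile a specific section of code, wrap it between start() and stop() calls:
```python
from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import xprof

machinelearning_run(
    name="<run-name>",
    gcs_path="gs://<bucket>",
)

# Use a different variable name so the imported xprof is not shadowed.
prof = xprof()
prof.start()
# ... code to profile, e.g. a few training steps ...
prof.stop()
```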
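If the profiled code raises an exception, a bare `start()`/`stop()` pair never reaches `stop()`. A minimal sketch of a context manager that always calls `stop()` is shown below. `profiled` is a hypothetical helper that works with any object exposing `start()` and `stop()`, such as the object returned by `xprof()`. It does not use any other library API.

```python
import contextlib


@contextlib.contextmanager
def profiled(profiler):
    """Run the enclosed block between profiler.start() and profiler.stop().

    `profiler` is any object with start() and stop() methods. stop() runs
    even if the enclosed code raises, so profiling is not left running.
    """
    profiler.start()
    try:
        yield profiler
    finally:
        profiler.stop()
```

With the library installed, usage would look like `with profiled(xprof()): ...`, with the training steps to profile in the body of the `with` block.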
Monitor with predefined metrics
```python
from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics
from google_cloud_mldiagnostics import metric_types

machinelearning_run(
    name="<run-name>",
    gcs_path="gs://<bucket>",
)

# loss_value: the loss computed by your training step.
metrics.record(metric_types.MetricType.LOSS, loss_value)
```
To pair the metric value with the current step:
```python
metrics.record(metric_types.MetricType.LOSS, loss_value, step=step)
```
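In a training loop, the step argument is typically the loop index. The sketch below records the loss every `log_every` steps. `run_training` and `train_step` are hypothetical names. `record` and `loss_metric` are passed in so the loop can stand in for `metrics.record` and `metric_types.MetricType.LOSS` without the library installed.

```python
def run_training(num_steps, train_step, record, loss_metric, log_every=1):
    """Run training steps and record the loss, paired with its step number.

    Hypothetical names: `train_step(step)` returns the loss for one step;
    `record(metric, value, step=...)` stands in for metrics.record.
    """
    last_loss = None
    for step in range(num_steps):
        last_loss = train_step(step)
        # Record every `log_every` steps; step= ties each value to its step.
        if step % log_every == 0:
            # float() is a precaution: accepted value types are an assumption.
            record(loss_metric, float(last_loss), step=step)
    return last_loss
```

With the library installed, you would call this as `run_training(1000, my_train_step, metrics.record, metric_types.MetricType.LOSS)`.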
Monitor with custom metrics
```python
from google_cloud_mldiagnostics import machinelearning_run
from google_cloud_mldiagnostics import metrics

machinelearning_run(
    name="<run-name>",
    gcs_path="gs://<bucket>",
)

# "<my-metric>" is the name of your custom metric; value is its current value.
metrics.record("<my-metric>", value)
```
To pair the metric value with the current step:
```python
metrics.record("<my-metric>", value, step=step)
```
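A custom metric can be any value your script computes. As an example, the sketch below measures training throughput and records it under a hypothetical custom metric name, `"examples_per_second"`. The name `timed_step` is also hypothetical, and `record` stands in for `metrics.record`. Because JAX dispatches work asynchronously, this sketch assumes `train_step` blocks on its outputs (for example with `jax.block_until_ready`) so that the measured time reflects the actual computation.

```python
import time


def timed_step(step, batch_size, train_step, record, clock=time.perf_counter):
    """Run one training step and record its throughput as a custom metric.

    Assumes `train_step` blocks until its results are ready; otherwise
    JAX's asynchronous dispatch makes the elapsed time too small.
    """
    start = clock()
    train_step(step)
    elapsed = clock() - start
    if elapsed > 0:
        record("examples_per_second", batch_size / elapsed, step=step)
    return elapsed
```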
Download files
File details
Details for the file google_cloud_mldiagnostics-0.5.1.tar.gz.
File metadata
- Download URL: google_cloud_mldiagnostics-0.5.1.tar.gz
- Upload date:
- Size: 29.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f990a7cf6f5119aaa218baecd0c4d26189daaace4bfa0d0cc65d54a8e0e023e5 |
| MD5 | 2257903f3dcd9b2c3ad8daa887b9e71f |
| BLAKE2b-256 | f43e1477174f00f2ce5bb01a0bd4ab5e35f6a396f2c65c46966775ae6ce49ec4 |
File details
Details for the file google_cloud_mldiagnostics-0.5.1-py3-none-any.whl.
File metadata
- Download URL: google_cloud_mldiagnostics-0.5.1-py3-none-any.whl
- Upload date:
- Size: 46.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4a5762faf9a5c81d45dfc580411752ebdfdc75a684e7834ab4900832849058a |
| MD5 | 1b537b522746f7d993421eb5a33dd58b |
| BLAKE2b-256 | 2cd759017928710095a5149450eb111ffc4cea9d2a6bf9951f3ca2c76c264171 |
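You can check a downloaded distribution file against its published SHA256 digest using only the standard library. The following is a minimal sketch; `sha256_of_file` and `verify_download` are hypothetical helper names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the source distribution, this would be `verify_download("google_cloud_mldiagnostics-0.5.1.tar.gz", "f990a7cf6f5119aaa218baecd0c4d26189daaace4bfa0d0cc65d54a8e0e023e5")`. pip's hash-checking mode (`--require-hashes`) performs the same kind of check at install time.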