A Dask Scheduler Plugin to monitor GPU memory usage for tasks

Dask Memory Usage Plugin for GPUs

If you are familiar with the dask-memusage plugin, this is an alternative version that profiles GPU memory usage by running the nvidia-smi command and reading its XML output.
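
To illustrate the idea of reading GPU memory from the nvidia-smi XML report, here is a minimal sketch. It is not the plugin's actual implementation: the function names parse_gpu_memory and query_gpu_memory and the SAMPLE document are hypothetical, and the sketch assumes the usual layout of the `nvidia-smi -q -x` report, where each gpu element carries an fb_memory_usage block with values such as "1024 MiB".

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_gpu_memory(xml_text):
    """Return a list of (gpu_id, used_mib, total_mib) from an nvidia-smi XML report."""
    root = ET.fromstring(xml_text)
    usage = []
    # Each GPU is a direct child of the root element.
    for gpu in root.findall("gpu"):
        mem = gpu.find("fb_memory_usage")
        # Values look like "1024 MiB"; keep only the number.
        used = int(mem.findtext("used").split()[0])
        total = int(mem.findtext("total").split()[0])
        usage.append((gpu.get("id"), used, total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi and parse its XML report (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        ["nvidia-smi", "-q", "-x"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


# Hand-written sample in the assumed shape of the report (hypothetical values),
# so the parser can be tried on a machine without a GPU.
SAMPLE = """<nvidia_smi_log>
  <gpu id="00000000:01:00.0">
    <fb_memory_usage>
      <total>16384 MiB</total>
      <used>1024 MiB</used>
      <free>15360 MiB</free>
    </fb_memory_usage>
  </gpu>
</nvidia_smi_log>"""

print(parse_gpu_memory(SAMPLE))
```

On a machine with an NVIDIA driver, calling query_gpu_memory() would return the same kind of list for the real devices.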

This plugin is a low-impact memory tracker. If you need something more advanced, check Scalene and other profilers. For the pros and cons of this plugin, see the FAQ file.

Code Example

Import the blob dataset generator and the KMeans model from cuML. Also import the Dask Client to connect to the scheduler.

import argparse
import cupy as cp

from cuml.dask.datasets import make_blobs as make_blobs_MGPU
from cuml.dask.cluster import KMeans as KMeans_MGPU

from dask.distributed import Client

Now we can write the main client. Remember that Dask code should be executed inside the main section (the if __name__ == '__main__' block). This example uses RAPIDS AI, so check that its libraries are properly installed. We strongly recommend using a prebuilt container.

def main():
    parser = argparse.ArgumentParser(description="Test KMeans experiment.")

    parser.add_argument('--scheduler-file', type=str, required=True,
                        help='Location of the scheduler file.')

    parser.add_argument('--nodes', type=int, required=True,
                        help='Number of worker nodes.')

    args = parser.parse_args()

    client = Client(scheduler_file=args.scheduler_file)

    client.wait_for_workers(n_workers=args.nodes)

    n_samples = 500000000
    n_clusters = 3

    # One center per cluster; the samples are generated around these points.
    centers = cp.asarray([(-6, -6), (0, 0), (9, 1)])
    X, y = make_blobs_MGPU(n_samples=n_samples, centers=centers, shuffle=False, random_state=42, client=client)

    model = KMeans_MGPU(n_clusters=n_clusters, random_state=42, max_iter=100)

    model.fit(X=X)

    pred = model.predict(X=X)

    pred.compute()

    client.close()


if __name__ == '__main__':
    main()

Run this code by passing the required --scheduler-file and --nodes parameters on the command line.
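
As a quick way to check the command-line parameters without a cluster, the sketch below rebuilds the same parser and feeds it an argument list directly. The helper name build_parser, the file name scheduler.json and the node count 2 are hypothetical values chosen for illustration.

```python
import argparse


def build_parser():
    """Build the same parser used by main() in the example above."""
    parser = argparse.ArgumentParser(description="Test KMeans experiment.")
    parser.add_argument('--scheduler-file', type=str, required=True,
                        help='Location of the scheduler file.')
    parser.add_argument('--nodes', type=int, required=True,
                        help='Number of worker nodes.')
    return parser


# Same effect as passing "--scheduler-file scheduler.json --nodes 2"
# on the command line (both values are placeholders).
args = build_parser().parse_args(
    ['--scheduler-file', 'scheduler.json', '--nodes', '2']
)

# argparse turns the dash in --scheduler-file into an underscore.
print(args.scheduler_file, args.nodes)
```

Both options are declared with required=True, so omitting either one makes argparse print a usage message and exit.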

Scheduler CLI usage

To monitor GPU memory usage, start the scheduler with the plugin module preloaded, as in the example below.

$ dask scheduler --preload dask_memusage_gpus_plugin --memusage-gpus-path memusage-gpus.csv --memusage-gpus-record-type csv --memusage-gpus-max

The plugin also supports other record formats, such as Parquet and Excel. Workers and threads are not a concern here, because a Dask CUDA worker runs only one thread per GPU.

The results of this execution, with the plugin enabled in the cluster, are shown below.

[Figure: kmeans]

Limitations and Useful Content

For further information and hints about this plugin, see the FAQ document.

Authors

  • Julio Faracco
