Resource-Aware Data systems Tracker (radT) for automatically tracking and training machine learning software

Project description

radT

radT (Resource Aware Data science Tracker) is an extension to MLFlow that simplifies the collection and exploration of hardware metrics for machine learning and deep learning applications. Collecting and processing all the required metrics for these workloads is usually a hassle. In contrast, RADT is easy to deploy and use, with minimal overhead in both runtime performance and developer time. The RADT codebase is documented and easy to extend.

This work has been published at the SIGMOD workshop DEEM 2023: https://itu-dasyalab.github.io/RAD/publication/papers/DEEM2023.pdf

pip install radt

Features

  • Wide configuration support including collocation
  • Track hardware and software metrics
  • Handle continuous streams of data
  • Support multiple visualization use-cases
  • Filter large amounts of inconsequential data
  • Minimal code impact

Sample usage & getting started

Replace python with radt when launching your training script, e.g.:

>>> radt train.py --batch-size 256

or, when using virtual environments/conda:

>>> python -m radt train.py --batch-size 256

For a complete getting started guide and examples please visit the Examples.

Easy to use via automated tracking

radT automatically tracks hardware metrics for your application: the listeners start collecting metrics as soon as your application is invoked.

As radT extends MLFlow, you can either use the advanced tracking or use MLFlow to track software metrics (e.g. loss).
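
For example, software metrics can be logged with the standard MLflow API while radT handles the hardware side. A minimal sketch, assuming the script is launched via radt; the parameter and metric values are illustrative:

import mlflow

# Software metrics go through the regular MLflow API; radT collects
# hardware metrics alongside them. Values below are placeholders.
mlflow.log_param("batch_size", 256)
for epoch in range(10):
    loss = 1.0 / (epoch + 1)  # placeholder for the real training loss
    mlflow.log_metric("loss", loss, step=epoch)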

Advanced tracking options via context

If you want to have more control over what is logged, you can encapsulate your training loop in the RADT context:

from radt.run import RADT

with RADT() as run:
    # training loop
    ...
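
A minimal sketch of what the encapsulated loop could look like; the import path follows the snippet above, and the loop body and metric values are illustrative:

from radt.run import RADT
import mlflow

with RADT() as run:
    for epoch in range(5):
        loss = 1.0 / (epoch + 1)  # placeholder for the real training loss
        mlflow.log_metric("loss", loss, step=epoch)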

CSV syntax for larger experiments

RADT takes the hassle out of large experiments by training multiple models in succession. Models can even be trained at the same time on different GPUs, or collocated on the same GPU using a range of collocation schemes, as shown in the schedule below.

Experiment,Workload,Status,Run,Devices,Collocation,File,Listeners,Params
2,21,,,0,-,../pytorch/cifar10.py,smi+top+dcgmi,batch-size=128
2,21,,,1,-,../pytorch/cifar10.py,smi+top+dcgmi,batch-size=128
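
Since the schedule is a plain CSV file, it can also be generated programmatically. A minimal sketch using Python's csv module, assuming the column order shown above; the file paths and parameters are illustrative:

import csv

# Columns as in the example schedule above; row values are illustrative.
columns = ["Experiment", "Workload", "Status", "Run", "Devices",
           "Collocation", "File", "Listeners", "Params"]
rows = [
    [2, 21, "", "", 0, "-", "../pytorch/cifar10.py", "smi+top+dcgmi", "batch-size=128"],
    [2, 21, "", "", 1, "-", "../pytorch/cifar10.py", "smi+top+dcgmi", "batch-size=128"],
]

with open("experiment.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)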

If interrupted for any reason, a CSV experiment can be rescheduled to continue from where it left off.

Supported platforms

  • Linux

Contributors

Thank You!

Contributions are welcome. (Please add yourself to the list)

Download files

Source Distribution

radt-0.1.3.tar.gz (10.1 MB)

Built Distribution

radt-0.1.3-py2.py3-none-any.whl (8.7 kB)
