watchmen for GPU scheduling

watchmen

A simple and easy-to-use toolkit for GPU scheduling.

Dependencies

  • Python >= 3.6
    • requests >= 2.24.0
    • pydantic >= 1.7.1
    • gpustat >= 0.6.0
    • flask >= 1.1.2
    • apscheduler >= 3.6.3

Installation

  1. Install dependencies.
$ pip install -r requirements.txt
  2. Install watchmen.

Install from source code:

$ pip install -e .

Alternatively, install the stable release from PyPI:

$ pip install gpu-watchmen -i https://pypi.org/simple

Quick Start

  1. Start the server

The default port of the server is 62333

$ python -m watchmen.server

To keep the server running in the background, try:

$ nohup python -m watchmen.server &

The server accepts the following options:

usage: server.py [-h] [--host HOST] [--port PORT]
                 [--queue_timeout QUEUE_TIMEOUT]
                 [--request_interval REQUEST_INTERVAL]
                 [--status_queue_keep_time STATUS_QUEUE_KEEP_TIME]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           host address for api server
  --port PORT           port for api server
  --queue_timeout QUEUE_TIMEOUT
                        timeout for queue waiting (seconds)
  --request_interval REQUEST_INTERVAL
                        interval for gpu status requesting (seconds)
  --status_queue_keep_time STATUS_QUEUE_KEEP_TIME
                        hours for keeping the client status
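
The options above can also be assembled programmatically, for instance when the server is launched from a setup script. The following is a minimal sketch: build_server_command and start_server are hypothetical helpers, not part of watchmen. Only the flag names come from the usage text above, and any option left as None is omitted so the server falls back to its own default.

```python
import subprocess
import sys


def build_server_command(host=None, port=None, queue_timeout=None,
                         request_interval=None, status_queue_keep_time=None):
    """Build the argv list for `python -m watchmen.server`.

    Hypothetical helper: only the flag names are taken from the usage text.
    Options left as None are omitted, so the server uses its own defaults.
    """
    options = {
        "--host": host,
        "--port": port,
        "--queue_timeout": queue_timeout,
        "--request_interval": request_interval,
        "--status_queue_keep_time": status_queue_keep_time,
    }
    # Use the current interpreter so the server runs in the same environment.
    command = [sys.executable, "-m", "watchmen.server"]
    for flag, value in options.items():
        if value is not None:
            command += [flag, str(value)]
    return command


def start_server(**kwargs):
    """Start the server as a child process and return the Popen handle."""
    return subprocess.Popen(build_server_command(**kwargs))
```

Calling start_server(port=62333) would launch the server without blocking the calling script; the returned handle can later be used to terminate it.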
  2. Modify the source code in your project:
from watchmen import Client

client = Client(id="short description of this running", gpus=[1],
                server_host="127.0.0.1", server_port=62333)
client.wait()

Calling client.wait() places the program in the queue; execution continues past this call once the requested GPUs become available. See the scripts in example/ for complete examples.

$ cd example && python single_card_mnist.py --id="single" --cuda=0 --wait
# queue mode
$ cd example && python multi_card_mnist.py --id="multi" --cuda=2,3 --wait
# schedule mode
$ cd example && python multi_card_mnist.py --id='multi card scheduling wait' --cuda=1,0,3 --req_gpu_num=2 --wait=schedule
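
The example commands above pass the run description and GPU ids on the command line. A minimal sketch of how a script might wire such flags to the Client call shown earlier follows. The argument parsing here is an assumption for illustration, not the code in example/, and it covers only queue mode (a bare --wait), since the client-side call for schedule mode is not shown in this document.

```python
import argparse


def parse_gpu_ids(text):
    """Turn a string such as "2,3" into [2, 3]."""
    return [int(part) for part in text.split(",") if part.strip()]


def parse_args(argv=None):
    """Parse the (hypothetical) flags used by this sketch."""
    parser = argparse.ArgumentParser(description="training script with queueing")
    parser.add_argument("--id", default="demo run",
                        help="short description of this run")
    parser.add_argument("--cuda", type=parse_gpu_ids, default=[0],
                        help="comma-separated GPU ids")
    parser.add_argument("--wait", action="store_true",
                        help="wait in the watchmen queue before training")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.wait:
        # Imported lazily so the script still runs without watchmen
        # when --wait is not given.
        from watchmen import Client

        client = Client(id=args.id, gpus=args.cuda,
                        server_host="127.0.0.1", server_port=62333)
        client.wait()  # same call as in the snippet above
    # ... training code goes here ...
    return args
```

Keeping the import inside the --wait branch lets the same script run on machines where no watchmen server is available.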
  3. Check the queue in your browser.

Open the following link in your browser: http://<server ip address>:<server port>, for example: http://192.168.126.143:62333.

The page looks like the demo below. It does not update dynamically, so refresh it manually to see the latest status.

New Demo (scheduling mode supported)

Demo

Old Demo (queue mode supported)

Old Demo
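
Because the page does not update by itself, one option is to poll it from a script. The sketch below uses only the Python standard library. The helper names status_url, fetch_status_page and poll_status_page are hypothetical; the only detail taken from the text above is the address format http://<server ip address>:<server port>. It returns the raw HTML and does not attempt to parse it.

```python
import time
import urllib.request


def status_url(host, port=62333):
    """Return the address of the status page."""
    return f"http://{host}:{port}"


def fetch_status_page(host, port=62333, timeout=5.0):
    """Download the status page once and return its HTML as text."""
    with urllib.request.urlopen(status_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def poll_status_page(host, port=62333, interval=30.0, rounds=3):
    """Fetch the page `rounds` times, sleeping `interval` seconds in between."""
    pages = []
    for i in range(rounds):
        pages.append(fetch_status_page(host, port))
        if i < rounds - 1:
            time.sleep(interval)
    return pages
```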

  4. Send a reminder when the program finishes.

watchmen also supports email and other kinds of reminders. For example, you can send yourself an email when the program finishes.

from watchmen.reminder import send_email

... # your code here

send_email(
    host="smtp.163.com", # email host to login, like `smtp.163.com`
    port=25, # email port to login, like `25`
    user="***@163.com", # user email address for login, like `***@163.com`
    password="***", # password or auth code for login
    receiver="***@outlook.com", # receiver email address
    html_message="<h1>Your program is finished!</h1>", # content, html format supported
    subject="Program Finished Notice" # email subject
)

For other kinds of reminders, see watchmen/reminder.py.
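
A common pattern is to send one message on success and a different one on failure. The sketch below shows one way to do this under stated assumptions: run_with_notification is a hypothetical helper, not part of watchmen, and notify is any callable that accepts subject and html_message keyword arguments, such as send_email with its connection arguments already bound.

```python
import traceback
from html import escape


def run_with_notification(job, notify):
    """Run `job()` and report the outcome through `notify`.

    Hypothetical helper. `notify` must accept the keyword arguments
    `subject` and `html_message`.
    """
    try:
        result = job()
    except Exception:
        # Include the escaped traceback so the message shows what went wrong.
        details = escape(traceback.format_exc())
        notify(subject="Program Failed Notice",
               html_message=f"<h1>Your program failed!</h1><pre>{details}</pre>")
        raise
    notify(subject="Program Finished Notice",
           html_message="<h1>Your program is finished!</h1>")
    return result


# Possible wiring with watchmen (left as a comment; it needs real credentials):
#
#   from functools import partial
#   from watchmen.reminder import send_email
#
#   notify = partial(send_email, host="smtp.163.com", port=25,
#                    user="***@163.com", password="***",
#                    receiver="***@outlook.com")
#   run_with_notification(train, notify)   # `train` is your own function
```

The exception is re-raised after the failure message is sent, so the program still exits with an error as it would without the wrapper.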

UPDATE

  • v0.3.0: support gpu scheduling, fix blank input output, fix check_gpus_existence
  • v0.2.2: fix html package data, add multi-card example

TODO

  • GPU usage stats for each user and process
  • add a schedule feature, so clients only have to request a number and range of GPUs, and the server assigns the GPU numbers to clients
  • add reminders
  • add webui html support
  • add examples
