GPUSitter

Watches for idle GPUs and runs your jobs on them: it launches each job in a tmux session, keeps logs and status, and sends start/finish emails.

Features

  • Real-time GPU usage monitoring
  • Command-line interface, easy to integrate into workflows
  • Email notifications
  • Scheduled automatic job running

Dependencies

  • tmux

Installation

pip install gpusitter

Usage

Set up the job environment (in particular the Python environment) before running gpust. There are two common ways to do this:

  1. Activate the environment before running gpust, e.g. conda activate xxx or source .venv/bin/activate. If you manage environments with uv, you can run uv run gpust instead.
  2. Specify the Python interpreter path directly in the job command, e.g. gpust --job="~/myproject/.venv/bin/python train.py"

# One job with 1 gpu
gpust --job="python train.py"

# One job with 4 gpus
gpust --job="python train.py:4"

# Two jobs with 1 gpu and 4 gpus respectively
gpust --job="python train.py" --job="python train.py --epoch=12 --lr=0.001:4"

# With CUDA_VISIBLE_DEVICES env
CUDA_VISIBLE_DEVICES=2 gpust --job="python train.py"

# With different python envs
gpust --job="~/job1/.venv/bin/python train1.py" --job="~/job2/.venv/bin/python train2.py"
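
In the examples above, a trailing :N on a job sets how many GPUs it requests, and a job without a suffix gets one GPU. The sketch below shows one way such a spec could be split into a command and a GPU count. parse_job_spec is a hypothetical helper written for illustration; GPUSitter's own parsing may differ.

```python
def parse_job_spec(spec: str, default_gpus: int = 1) -> tuple[str, int]:
    """Split 'command[:N]' into (command, gpu_count).

    Hypothetical helper for illustration, not GPUSitter's actual parser.
    """
    command, sep, suffix = spec.rpartition(":")
    # Treat the suffix as a GPU count only when it is a positive integer;
    # otherwise the colon is assumed to belong to the command itself.
    if sep and suffix.isdecimal() and int(suffix) > 0:
        return command.strip(), int(suffix)
    return spec.strip(), default_gpus


if __name__ == "__main__":
    for spec in ("python train.py", "python train.py --epoch=12 --lr=0.001:4"):
        command, gpus = parse_job_spec(spec)
        print(f"{gpus} GPU(s): {command}")
```

One consequence of this rule is that a command that itself ends in a colon followed by digits (for example a host:port argument in last position) would be read as a GPU count, so such arguments are best placed earlier in the command.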

After starting your job, you can monitor its progress using tmux.

# List all running tmux sessions
tmux ls

# Attach to your job session (replace GPUSitter_xxx_xx with your session name)
tmux a -t GPUSitter_xxx_xx
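
When several jobs are running, it can help to list only the GPUSitter sessions. The following is a minimal sketch that assumes session names start with GPUSitter_, as in the placeholder above; the helper names are made up for this example and are not part of the gpust CLI.

```python
import subprocess

# Assumed prefix, taken from the GPUSitter_xxx_xx placeholder shown above.
SESSION_PREFIX = "GPUSitter_"


def filter_sessions(names: list[str], prefix: str = SESSION_PREFIX) -> list[str]:
    """Keep only the session names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


def list_gpusitter_sessions() -> list[str]:
    """Return names of running tmux sessions that look like GPUSitter jobs."""
    try:
        result = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return []  # tmux is not installed
    if result.returncode != 0:
        return []  # tmux exits non-zero when no server is running
    return filter_sessions(result.stdout.splitlines())


if __name__ == "__main__":
    # Print a ready-to-paste attach command for each matching session.
    for name in list_gpusitter_sessions():
        print(f"tmux a -t {name}")
```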

Parameter description:

class ConfigData:
    """Configuration data for GPU Snatcher."""

    gpu_free_memory_ratio_threshold: float
    friendly_min: float
    email_host: str
    email_user: str
    email_pwd: str
    email_sender: str
    email_receivers: list[str]
  • gpu_free_memory_ratio_threshold: Minimum ratio of free to total GPU memory for a GPU to count as available. Only GPUs whose free-memory ratio is above this threshold are used.
  • friendly_min: Waiting time (in seconds) before GPUs are allocated. Helps prevent out-of-memory (OOM) errors caused by previous jobs.
  • email_host: Email (SMTP) server, e.g. smtp.qq.com
  • email_user: Email address
  • email_pwd: SMTP authorization code
  • email_sender: Sender
  • email_receivers: Recipients
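
To make the parameters concrete, here is a rough sketch of how they might be applied. It is not GPUSitter's implementation: ExampleConfig is a stand-in with the same field names as ConfigData, idle_gpus assumes the input comes from nvidia-smi --query-gpu=index,memory.free,memory.total --format=csv,noheader,nounits, and the way email_user and email_sender are used, as well as port 465, are assumptions.

```python
import csv
import io
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class ExampleConfig:
    """Stand-in with the same fields as ConfigData (illustration only)."""

    gpu_free_memory_ratio_threshold: float
    friendly_min: float
    email_host: str
    email_user: str
    email_pwd: str
    email_sender: str
    email_receivers: list[str]


def idle_gpus(smi_csv: str, threshold: float) -> list[int]:
    """Return indices of GPUs whose free/total memory ratio exceeds threshold.

    Expects rows of 'index, memory.free, memory.total' without header or units.
    """
    idle = []
    for row in csv.reader(io.StringIO(smi_csv), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        index, free_mib, total_mib = int(row[0]), float(row[1]), float(row[2])
        if total_mib > 0 and free_mib / total_mib > threshold:
            idle.append(index)
    return idle


def build_notification(cfg: ExampleConfig, subject: str, body: str) -> EmailMessage:
    """Build a start/finish email from the email_* fields."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = cfg.email_sender
    msg["To"] = ", ".join(cfg.email_receivers)
    msg.set_content(body)
    return msg


def send_notification(cfg: ExampleConfig, msg: EmailMessage, port: int = 465) -> None:
    """Send the message over implicit TLS; the port is an assumption."""
    with smtplib.SMTP_SSL(cfg.email_host, port) as server:
        server.login(cfg.email_user, cfg.email_pwd)
        server.send_message(msg)
```

With a threshold of 0.9, a GPU reporting 23800 MiB free out of 24564 MiB (about 0.97) would count as available, while one reporting 1200 MiB free would not.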

Contribution

Issues and pull requests are welcome. Please follow the project's code style guidelines.

License

This project is licensed under the MIT License.

Download files

Source Distribution

gpusitter-2.1.0.tar.gz (9.1 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17
  • SHA256: d93ad7721c7ad2fe1f63570aeeb5e309b74093c45a3142745d155f7eeaa22d4f
  • MD5: 00ff835cea4904efa9a9077aee34c062
  • BLAKE2b-256: 13b3aef064e3dc11525df50b0af2edbc2d9f92e745806ee214f4d0f70a02a614

Built Distribution

gpusitter-2.1.0-py3-none-any.whl (10.9 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17
  • SHA256: 7fc01a8dd9a1a6af335a6c4e7f17e60d09f3ca4e873c1a8335949d7512d6938a
  • MD5: f27266f0f00b8b71d7a2888333841c0e
  • BLAKE2b-256: 8e21cc0327f992e856bd1d0ce20a55a0183265fcd7ba38d96645e9d2c087edb1
