GPUSitter

Watch for idle GPUs and run your jobs: GPUSitter launches jobs in tmux, keeps logs and status, and sends start/finish emails.

Features

  • Real-time GPU usage monitoring
  • Command-line interface, easy to integrate into workflows
  • Email notifications
  • Scheduled automatic job running

Dependencies

  • tmux

Installation

pip install gpusitter

Usage

# One job on 1 GPU (no suffix needed)
gpust --job="python train.py"

# One job on 4 GPUs: append ":<count>" to the command
gpust --job="python train.py:4"

# Two jobs, using 1 GPU and 4 GPUs respectively
gpust --job="python train.py" --job="python train.py --epoch=12 --lr=0.001:4"
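The `command:N` form used above can be read as "split on the last colon, and treat a trailing integer as the GPU count". The sketch below only illustrates that reading. `parse_job_spec` is a hypothetical helper, not part of GPUSitter's API, and the tool's real parsing may differ.

```python
def parse_job_spec(spec: str) -> tuple[str, int]:
    """Split 'command:N' into (command, N); fall back to 1 GPU without a ':N' suffix.

    Hypothetical helper for illustration only, not GPUSitter's actual parser.
    """
    command, sep, suffix = spec.rpartition(":")
    # Treat the suffix as a GPU count only if it is a positive integer.
    if sep and suffix.isdigit() and int(suffix) > 0:
        return command, int(suffix)
    return spec, 1


print(parse_job_spec("python train.py"))
print(parse_job_spec("python train.py:4"))
```

Splitting on the last colon rather than the first keeps colons inside the command itself (for example in a path or URL argument) intact.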

After starting your job, you can monitor its progress using tmux.

# List all running tmux sessions
tmux ls

# Attach to your job session (replace GPUSitter_xxx_xx with your session name)
tmux a -t GPUSitter_xxx_xx
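If you would rather find the job sessions from a script than by reading `tmux ls` output, something like the following may work. It assumes session names start with `GPUSitter_`, as in the example above. The function names are illustrative and not part of GPUSitter.

```python
import subprocess


def gpusitter_sessions(session_names: str, prefix: str = "GPUSitter_") -> list[str]:
    """Return the session names (one per input line) that start with the prefix."""
    return [name for name in session_names.splitlines() if name.startswith(prefix)]


def list_tmux_session_names() -> str:
    """Ask tmux for session names only; return '' if tmux is missing or has no sessions."""
    try:
        result = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return ""
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    for name in gpusitter_sessions(list_tmux_session_names()):
        print(name)
```

Each printed name can then be passed to `tmux a -t <name>` to attach to that job.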

Parameter description:

class ConfigData:
    """Configuration data for GPUSitter."""

    gpu_free_memory_ratio_threshold: float
    friendly_min: float
    email_host: str
    email_user: str
    email_pwd: str
    email_sender: str
    email_receivers: list[str]

  • gpu_free_memory_ratio_threshold: Minimum ratio of free to total GPU memory for a GPU to count as available. Only GPUs whose free-memory ratio is above this threshold are used.
  • friendly_min: Waiting time (in seconds) before allocating GPUs. Helps prevent out-of-memory (OOM) errors caused by previous jobs.
  • email_host: SMTP server host, e.g., smtp.qq.com
  • email_user: Email address used to log in to the SMTP server
  • email_pwd: SMTP authorization code
  • email_sender: Sender address
  • email_receivers: List of recipient addresses
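To make the effect of `gpu_free_memory_ratio_threshold` concrete, here is a minimal sketch of the selection rule described above: a GPU counts as available when free memory divided by total memory is above the threshold. The helper names and the sample numbers are assumptions for illustration. How GPUSitter itself queries GPU memory is not documented here.

```python
def parse_memory_csv(text: str) -> dict[int, float]:
    """Map GPU index -> free-memory ratio from 'index, free, total' CSV lines."""
    ratios = {}
    for line in text.strip().splitlines():
        index, free_mib, total_mib = (field.strip() for field in line.split(","))
        ratios[int(index)] = float(free_mib) / float(total_mib)
    return ratios


def available_gpus(ratios: dict[int, float], threshold: float) -> list[int]:
    """Return indices of GPUs whose free-memory ratio is strictly above the threshold."""
    return sorted(index for index, ratio in ratios.items() if ratio > threshold)


# Made-up sample in the shape printed by:
#   nvidia-smi --query-gpu=index,memory.free,memory.total --format=csv,noheader,nounits
sample = """\
0, 80000, 81920
1, 1024, 81920
2, 40960, 81920
"""
print(available_gpus(parse_memory_csv(sample), threshold=0.9))  # → [0]
```

With this sample, lowering the threshold to 0.4 would also admit GPU 2, which is half free.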

Contribution

Issues and pull requests are welcome. Please follow the project's code style guidelines.

License

This project is licensed under the MIT License.
