A fast, asynchronous GPU monitoring tool for multiple machines through SSH

Development Status
- 4 - Beta
Environment
- Console
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Topic
- System :: Monitoring
- System :: Systems Administration

Project description

SSH GPU Monitor 🖥️

A fast, asynchronous GPU monitoring tool that provides real-time status of NVIDIA GPUs across multiple machines through SSH, with support for jump hosts and per-machine credentials.

✨ Features

- Real-time Monitoring: Live updates of GPU status across multiple machines
- Asynchronous Operation: Fast, non-blocking checks using asyncio and asyncssh
- Jump Host Support: Access machines behind a bastion/jump host
- Rich Display: Beautiful terminal UI using the rich library
- Flexible Configuration:
  - YAML-based configuration
  - Per-machine SSH credentials
  - Pattern-based target generation
- Robust Error Handling: Graceful handling of network issues and timeouts
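
To illustrate the asynchronous checks and timeout handling listed above, here is a minimal sketch using only asyncio. It is not the project's actual code: fake_check is a hypothetical stand-in for the real SSH query (which the tool performs with asyncssh), and all function names are assumptions made for this example.

```python
import asyncio

async def fake_check(host: str) -> str:
    """Hypothetical stand-in for an SSH query; 'slow-host' simulates a hang."""
    await asyncio.sleep(5 if host == "slow-host" else 0.01)
    return f"{host}: ok"

async def check_with_timeout(host: str, timeout: float) -> str:
    """Run one check and report a timeout as a status string instead of raising."""
    try:
        return await asyncio.wait_for(fake_check(host), timeout=timeout)
    except asyncio.TimeoutError:
        return f"{host}: timed out"

async def check_all(hosts, timeout: float = 0.2):
    """Query every host concurrently; results keep the order of the input."""
    return await asyncio.gather(*(check_with_timeout(h, timeout) for h in hosts))

def run_checks(hosts, timeout: float = 0.2):
    """Synchronous entry point for the sketch."""
    return asyncio.run(check_all(hosts, timeout))
```

Because every host is awaited concurrently, one unresponsive machine delays the whole round by at most the timeout rather than blocking the remaining checks.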

🚀 Installation & Usage

Install from PyPI

pip install ssh-gpu-monitor

Run the Monitor

After installation, you can run the monitor in several ways:

# Run using the command-line tool
ssh-gpu-monitor

# Or run as a Python module
python -m ssh_gpu_monitor

# Use a custom config file
ssh-gpu-monitor --config /path/to/your/config.yaml

# Get the default config path
ssh-gpu-monitor --get_config_path

Configuration

Get the default config path:

ssh-gpu-monitor --get_config_path

Either:
- Copy the default config to your preferred location and use --config to specify it
- Modify the default config directly

Example config file:

ssh:
  username: "your_username"
  key_path: "~/.ssh/id_rsa"
  jump_host: "jump.example.com"
  timeout: 10

targets:
  individual:
    - "gpu-server1"
    - "gpu-server2"

display:
  refresh_rate: 5

📖 Configuration

Basic Structure

ssh:
  username: "default_user"  # Default username
  key_path: "~/.ssh/id_rsa"  # Default SSH key
  jump_host: "jump.example.com"
  timeout: 10  # seconds

targets:
  # Individual machines
  individual:
    - host: "gpu-server1"
      username: "different_user"  # Optional override
      key_path: "~/.ssh/special_key"  # Optional override
    - "gpu-server2"  # Uses default credentials
  
  # Pattern-based groups
  patterns:
    - prefix: "gpu"
      start: 1
      end: 30
      format: "{prefix}{number:02}"  # Results in gpu01, gpu02, etc.
      username: "gpu_user"  # Optional override
      key_path: "~/.ssh/gpu_key"  # Optional override

display:
  refresh_rate: 5  # seconds

debug:
  enabled: false
  log_dir: "logs"
  log_file: "gpu_checker.log"
  log_max_size: 1048576  # 1MB
  log_backup_count: 3

Command Line Options

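Override any configuration option via command line. The examples below invoke main.py from a source checkout; with the PyPI install, ssh-gpu-monitor is the entry point shown earlier: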

# Enable debug logging
python main.py --debug.enabled

# Override SSH settings
python main.py --ssh.username=other_user --ssh.key_path=~/.ssh/other_key

# Check specific targets
python main.py --targets gpu01 gpu02 special-server
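
The dotted option names above mirror the nesting of the YAML file. As a rough sketch of how such overrides could be folded into a loaded config, assuming a plain nested dict, the following may help. The helper names apply_override and parse_overrides are hypothetical, and the tool's real argument parsing may differ (for example, it may convert value types, which this sketch does not).

```python
import copy

def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of config with the nested key named by 'a.b' set to value."""
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})  # create missing sections on the way down
    node[leaf] = value
    return result

def parse_overrides(config: dict, args: list) -> dict:
    """Apply '--section.key=value' and bare '--section.flag' (treated as True)."""
    for arg in args:
        if not arg.startswith("--"):
            continue  # positional values are ignored in this sketch
        key, sep, raw = arg[2:].partition("=")
        if "." not in key:
            continue  # non-dotted options such as --targets are out of scope here
        config = apply_override(config, key, raw if sep else True)
    return config
```

Values given after "=" stay as strings here; a fuller version would need to convert them to the types used in the YAML file.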

🔧 Advanced Usage

Custom Target Patterns

Generate targets using patterns:

patterns:
  - prefix: "compute"
    start: 1
    end: 100
    format: "{prefix}-{number:03d}"  # compute-001, compute-002, etc.
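
The format field reads like a Python format string, so the expansion can be sketched with str.format. This is an illustration under two assumptions that the README does not state: that end is inclusive, and that the tool passes the fields as prefix and number. The function name expand_pattern is hypothetical.

```python
def expand_pattern(pattern: dict) -> list:
    """Expand one pattern entry into host names (assumes 'end' is inclusive)."""
    return [
        pattern["format"].format(prefix=pattern["prefix"], number=n)
        for n in range(pattern["start"], pattern["end"] + 1)
    ]

hosts = expand_pattern(
    {"prefix": "compute", "start": 1, "end": 100, "format": "{prefix}-{number:03d}"}
)
print(hosts[0], hosts[-1], len(hosts))  # compute-001 compute-100 100
```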

Per-Machine Credentials

Specify different credentials for specific machines:

individual:
  - host: "special-gpu"
    username: "admin"
    key_path: "~/.ssh/admin_key"
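
Entries under individual can be either a bare host name or a mapping with overrides. One way to picture how the defaults from the ssh section and the per-machine values combine is the sketch below. resolve_target is a hypothetical helper, not a function from the package.

```python
def resolve_target(entry, defaults: dict) -> dict:
    """Merge one 'individual' entry with the default SSH settings."""
    if isinstance(entry, str):
        entry = {"host": entry}  # bare host name: no overrides
    resolved = {
        "username": defaults["username"],
        "key_path": defaults["key_path"],
    }
    resolved.update(entry)  # per-machine keys win over the defaults
    return resolved

defaults = {"username": "default_user", "key_path": "~/.ssh/id_rsa"}
special = resolve_target(
    {"host": "special-gpu", "username": "admin", "key_path": "~/.ssh/admin_key"},
    defaults,
)
plain = resolve_target("gpu-server2", defaults)
```

Here special keeps its own username and key, while plain falls back to the defaults.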

Debug Logging

Enable detailed logging for troubleshooting:

debug:
  enabled: true
  log_dir: "logs"
  log_file: "debug.log"
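
The log_max_size and log_backup_count options from the Basic Structure section map naturally onto Python's logging.handlers.RotatingFileHandler. The sketch below shows one way a debug section could be turned into a logger. It is an assumption about the design, not the package's actual implementation, and build_logger is a hypothetical name.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def build_logger(debug_cfg: dict, name: str = "gpu_monitor") -> logging.Logger:
    """Create a logger from a 'debug' config section (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    if not debug_cfg.get("enabled", False):
        logger.addHandler(logging.NullHandler())  # logging disabled: discard records
        return logger
    os.makedirs(debug_cfg["log_dir"], exist_ok=True)
    handler = RotatingFileHandler(
        os.path.join(debug_cfg["log_dir"], debug_cfg["log_file"]),
        maxBytes=debug_cfg.get("log_max_size", 1048576),   # rotate after about 1 MB
        backupCount=debug_cfg.get("log_backup_count", 3),  # keep three old files
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```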

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Original Contributors

Originally created as "some awful, brittle code to check GPU status of multiple machines at a given host address through an SSH jumpnode."

Special thanks to:

- @harrygcoppock and @minut1bc for their PRs on v1
- gpuobserver for earlier code concepts
- Stack Overflow answer for SSH connection handling insights

Libraries

- Rich for the beautiful terminal interface
- asyncssh for async SSH support
- PyYAML for configuration management

⚠️ Known Issues

- SSH connections may time out on very slow networks
- Some older NVIDIA drivers may return incompatible XML formats
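
The XML issue suggests parsing defensively, so that a missing field degrades to a placeholder instead of an exception. The sketch below does this with xml.etree.ElementTree. The tag names follow the layout of nvidia-smi -q -x output as best understood here, the sample document is invented for illustration, and parse_gpus is a hypothetical helper rather than the package's parser.

```python
import xml.etree.ElementTree as ET

# Invented sample: the second GPU lacks memory and utilization fields.
SAMPLE_XML = """<nvidia_smi_log>
  <gpu id="00000000:01:00.0">
    <product_name>Example GPU</product_name>
    <fb_memory_usage><total>24576 MiB</total><used>1024 MiB</used></fb_memory_usage>
    <utilization><gpu_util>37 %</gpu_util></utilization>
  </gpu>
  <gpu id="00000000:02:00.0">
    <product_name>Older GPU</product_name>
  </gpu>
</nvidia_smi_log>"""

def parse_gpus(xml_text: str) -> list:
    """Extract per-GPU fields, substituting placeholders for missing tags."""
    root = ET.fromstring(xml_text)
    gpus = []
    for gpu in root.iter("gpu"):
        gpus.append({
            "name": gpu.findtext("product_name", default="unknown"),
            "memory_used": gpu.findtext("fb_memory_usage/used", default="N/A"),
            "memory_total": gpu.findtext("fb_memory_usage/total", default="N/A"),
            "utilization": gpu.findtext("utilization/gpu_util", default="N/A"),
        })
    return gpus

gpus = parse_gpus(SAMPLE_XML)
```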

📊 Roadmap

- Add support for AMD GPUs
- Implement process name filtering
- Add web interface
- Support for custom SSH config files

Release history

- 1.0.2 (this version): Oct 28, 2024
- 1.0.1: Oct 28, 2024
- 1.0.0: Oct 28, 2024

Download files

Source Distribution

ssh_gpu_monitor-1.0.2.tar.gz (15.2 kB), uploaded Oct 28, 2024

Built Distribution

ssh_gpu_monitor-1.0.2-py3-none-any.whl (14.3 kB), uploaded Oct 28, 2024, Python 3

File details

Details for the file ssh_gpu_monitor-1.0.2.tar.gz.

File metadata

Download URL: ssh_gpu_monitor-1.0.2.tar.gz
Upload date: Oct 28, 2024
Size: 15.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.12

File hashes

Hashes for ssh_gpu_monitor-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`c6145b11e8ef3da6cec497afd8b79e56e785236865739ee821e32ce09d01ca6d`
MD5	`c96e6ebc1c4c6c66a269278a6888d4b0`
BLAKE2b-256	`378823bb0ec566d2fd6918a015eabca21165b660b96132f92aba4f0c20606001`

File details

Details for the file ssh_gpu_monitor-1.0.2-py3-none-any.whl.

File metadata

Download URL: ssh_gpu_monitor-1.0.2-py3-none-any.whl
Upload date: Oct 28, 2024
Size: 14.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.12

File hashes

Hashes for ssh_gpu_monitor-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`de4b91a1055fd9a6fb286b0128627e05f5403bba02728f44fb4f0c806f54dd7d`
MD5	`b22e212ccbc404f6d71f4106c53a066a`
BLAKE2b-256	`93444375f4bf297e59a7119208b4c28f2f419dd854eae667db735515e11e9d83`
