A fast, asynchronous GPU monitoring tool for multiple machines through SSH
SSH GPU Monitor 🖥️
A fast, asynchronous GPU monitoring tool that provides real-time status of NVIDIA GPUs across multiple machines through SSH, with support for jump hosts and per-machine credentials.
✨ Features
- Real-time Monitoring: Live updates of GPU status across multiple machines
- Asynchronous Operation: Fast, non-blocking checks using asyncio and asyncssh
- Jump Host Support: Access machines behind a bastion/jump host
- Rich Display: Beautiful terminal UI using the rich library
- Flexible Configuration:
  - YAML-based configuration
  - Per-machine SSH credentials
  - Pattern-based target generation
- Robust Error Handling: Graceful handling of network issues and timeouts
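The non-blocking checks and timeout handling listed above can be illustrated with a small standard-library sketch. It is not the project's implementation: `check_host` is a hypothetical stand-in that sleeps instead of opening an SSH session, and the host names and delays are made up for the demo.

```python
import asyncio


async def check_host(host: str, delay: float) -> str:
    # Hypothetical stand-in for an SSH round trip; a real check would
    # connect to the host and run a remote command instead of sleeping.
    await asyncio.sleep(delay)
    return f"{host}: ok"


async def check_all(hosts: dict[str, float], timeout: float) -> dict[str, str]:
    async def guarded(host: str, delay: float) -> str:
        # A slow host is reported as timed out instead of stalling the others.
        try:
            return await asyncio.wait_for(check_host(host, delay), timeout)
        except asyncio.TimeoutError:
            return f"{host}: timed out"

    # All checks run concurrently; gather preserves the input order.
    results = await asyncio.gather(*(guarded(h, d) for h, d in hosts.items()))
    return dict(zip(hosts, results))


def run_demo() -> dict[str, str]:
    # One fast host and one host slower than the timeout.
    return asyncio.run(check_all({"gpu-server1": 0.01, "gpu-server2": 0.5}, timeout=0.1))
```

Because each check is wrapped individually, one unreachable machine costs at most one timeout period rather than delaying the whole refresh.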
🚀 Installation & Usage
Install from PyPI
pip install ssh-gpu-monitor
Run the Monitor
After installation, you can run the monitor in several ways:
# Run using the command-line tool
ssh-gpu-monitor
# Or run as a Python module
python -m ssh_gpu_monitor
# Use a custom config file
ssh-gpu-monitor --config /path/to/your/config.yaml
# Get the default config path
ssh-gpu-monitor --get_config_path
Configuration
- Get the default config path:
ssh-gpu-monitor --get_config_path
- Either:
  - Copy the default config to your preferred location and use --config to specify it
  - Modify the default config directly
Example config file:
ssh:
  username: "your_username"
  key_path: "~/.ssh/id_rsa"
  jump_host: "jump.example.com"
  timeout: 10
targets:
  individual:
    - "gpu-server1"
    - "gpu-server2"
display:
  refresh_rate: 5
📖 Configuration
Basic Structure
ssh:
  username: "default_user"  # Default username
  key_path: "~/.ssh/id_rsa"  # Default SSH key
  jump_host: "jump.example.com"
  timeout: 10  # seconds
targets:
  # Individual machines
  individual:
    - host: "gpu-server1"
      username: "different_user"  # Optional override
      key_path: "~/.ssh/special_key"  # Optional override
    - "gpu-server2"  # Uses default credentials
  # Pattern-based groups
  patterns:
    - prefix: "gpu"
      start: 1
      end: 30
      format: "{prefix}{number:02}"  # Results in gpu01, gpu02, etc.
      username: "gpu_user"  # Optional override
      key_path: "~/.ssh/gpu_key"  # Optional override
display:
  refresh_rate: 5  # seconds
debug:
  enabled: false
  log_dir: "logs"
  log_file: "gpu_checker.log"
  log_max_size: 1048576  # 1MB
  log_backup_count: 3
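To make the "optional override" behaviour concrete, here is a minimal sketch of how entries under individual could be resolved against the ssh defaults. The function name `resolve_individual` and the fallback logic are assumptions for illustration; the tool's actual resolution code may differ.

```python
def resolve_individual(config: dict) -> list[dict]:
    """Resolve each individual target to a host plus effective credentials.

    Assumption: an entry is either a bare host name (uses the ssh defaults)
    or a mapping with "host" and optional "username"/"key_path" overrides.
    """
    defaults = config.get("ssh", {})
    resolved = []
    for entry in config.get("targets", {}).get("individual", []):
        if isinstance(entry, str):
            entry = {"host": entry}  # bare string: no overrides
        resolved.append({
            "host": entry["host"],
            "username": entry.get("username", defaults.get("username")),
            "key_path": entry.get("key_path", defaults.get("key_path")),
        })
    return resolved


# The same structure as the YAML above, written as a Python dict.
EXAMPLE_CONFIG = {
    "ssh": {"username": "default_user", "key_path": "~/.ssh/id_rsa"},
    "targets": {
        "individual": [
            {"host": "gpu-server1", "username": "different_user",
             "key_path": "~/.ssh/special_key"},
            "gpu-server2",
        ]
    },
}
```

Calling `resolve_individual(EXAMPLE_CONFIG)` yields one record per machine, with gpu-server2 falling back to the default username and key.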
Command Line Options
Override any configuration option via command line:
# Enable debug logging
python main.py --debug.enabled
# Override SSH settings
python main.py --ssh.username=other_user --ssh.key_path=~/.ssh/other_key
# Check specific targets
python main.py --targets gpu01 gpu02 special-server
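The dotted option names (such as --ssh.username) suggest that each flag maps onto a path in the nested configuration. A minimal sketch of that mapping follows; `apply_override` is a hypothetical helper, not a function from the project, and argument parsing itself is left out.

```python
import copy


def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of config with the value at dotted_key replaced.

    Hypothetical helper: "ssh.username" addresses config["ssh"]["username"].
    Missing intermediate sections are created as empty dicts.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated
```

For example, `apply_override(cfg, "ssh.username", "other_user")` mirrors what --ssh.username=other_user would be expected to do to the loaded config.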
🔧 Advanced Usage
Custom Target Patterns
Generate targets using patterns:
patterns:
  - prefix: "compute"
    start: 1
    end: 100
    format: "{prefix}-{number:03d}"  # compute-001, compute-002, etc.
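The format field uses Python's str.format syntax, so the expansion can be sketched in a few lines. `expand_pattern` is a hypothetical name, and treating end as inclusive is an assumption; the configuration examples do not state it explicitly.

```python
def expand_pattern(prefix: str, start: int, end: int, fmt: str) -> list[str]:
    """Expand one pattern entry into a list of host names.

    Assumption: `end` is inclusive, so start=1, end=100 gives 100 hosts.
    """
    return [fmt.format(prefix=prefix, number=n) for n in range(start, end + 1)]
```

With the pattern above, `expand_pattern("compute", 1, 100, "{prefix}-{number:03d}")` starts at compute-001 and, under the inclusive assumption, ends at compute-100.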
Per-Machine Credentials
Specify different credentials for specific machines:
individual:
  - host: "special-gpu"
    username: "admin"
    key_path: "~/.ssh/admin_key"
Debug Logging
Enable detailed logging for troubleshooting:
debug:
  enabled: true
  log_dir: "logs"
  log_file: "debug.log"
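The debug options line up naturally with the standard library's rotating file handler. The sketch below shows one way the settings could be wired up; it is an assumption about the mapping, not the project's logging code, and the logger name and fallback defaults are illustrative.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_debug_logger(debug_cfg: dict) -> logging.Logger:
    """Build a logger from a debug config section (illustrative mapping)."""
    logger = logging.getLogger("ssh_gpu_monitor_sketch")  # hypothetical name
    logger.handlers.clear()
    if not debug_cfg.get("enabled", False):
        # Debugging disabled: swallow records instead of writing a file.
        logger.addHandler(logging.NullHandler())
        return logger

    log_dir = Path(debug_cfg.get("log_dir", "logs"))
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_dir / debug_cfg.get("log_file", "gpu_checker.log"),
        maxBytes=debug_cfg.get("log_max_size", 1048576),   # rotate at this size
        backupCount=debug_cfg.get("log_backup_count", 3),  # rotated files kept
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger
```

With this mapping, log_max_size caps the size of the active file and log_backup_count limits how many rotated files are retained.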
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
Original Contributors
Originally created as "some awful, brittle code to check GPU status of multiple machines at a given host address through an SSH jumpnode."
Special thanks to:
- @harrygcoppock and @minut1bc for their PRs on v1
- gpuobserver for earlier code concepts
- Stack Overflow answer for SSH connection handling insights
Libraries
- Rich for the beautiful terminal interface
- asyncssh for async SSH support
- PyYAML for configuration management
🔍 Similar Projects
⚠️ Known Issues
- SSH connections might time out on very slow networks
- Some older NVIDIA drivers might return incompatible XML formats
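One way to tolerate XML that differs between driver versions is to read each field with a fallback value rather than assuming it exists. The sketch below assumes the XML resembles the output of nvidia-smi -q -x; the sample document, element names, and the `parse_gpus` helper are illustrative assumptions, not taken from the project.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: the second GPU omits fields, as an older
# driver's output might.
SAMPLE_XML = """<nvidia_smi_log>
  <gpu id="00000000:01:00.0">
    <product_name>Example GPU</product_name>
    <utilization><gpu_util>42 %</gpu_util></utilization>
    <fb_memory_usage><total>24576 MiB</total><used>1024 MiB</used></fb_memory_usage>
  </gpu>
  <gpu id="00000000:02:00.0">
    <product_name>Older GPU</product_name>
  </gpu>
</nvidia_smi_log>"""


def parse_gpus(xml_text: str) -> list[dict]:
    """Extract a few per-GPU fields, substituting "N/A" for missing ones."""
    root = ET.fromstring(xml_text)
    gpus = []
    for gpu in root.iter("gpu"):
        gpus.append({
            "name": gpu.findtext("product_name", default="unknown"),
            "util": gpu.findtext("utilization/gpu_util", default="N/A"),
            "mem_used": gpu.findtext("fb_memory_usage/used", default="N/A"),
            "mem_total": gpu.findtext("fb_memory_usage/total", default="N/A"),
        })
    return gpus
```

A missing element then shows up as "N/A" in the display instead of raising an exception for the whole machine.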
📊 Roadmap
- Add support for AMD GPUs
- Implement process name filtering
- Add web interface
- Support for custom SSH config files