MeinSweeper
MeinSweeper is a lightweight framework for running experiments on arbitrary compute nodes, with built-in support for GPU management and job distribution.
- This project is still in alpha and was written for research.
- In other words, expect bugs and smelly code!
Installation
Use the package manager pip to install MeinSweeper:
pip install meinsweeper
Features
- Asynchronous job execution
- Support for multiple node types (SSH and Local)
- Automatic GPU management and allocation
- Retry mechanism for failed jobs and unavailable nodes
- Configurable via environment variables
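As a rough illustration of how asynchronous execution, a per-job timeout, and retries can fit together, here is a minimal sketch using only the standard library. It is not MeinSweeper's implementation: `run_with_retries` and `flaky_job` are hypothetical names, and treating "retries" as extra attempts after the first one is an assumption.

```python
import asyncio

async def run_with_retries(job, max_retries=3, timeout=1200):
    """Await job() until it succeeds, allowing up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            # Each attempt is bounded by the timeout (in seconds).
            return await asyncio.wait_for(job(), timeout=timeout)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error

# Hypothetical job that fails twice before succeeding.
attempts = {"count": 0}

async def flaky_job():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated failure")
    return "done"

result = asyncio.run(run_with_retries(flaky_job, max_retries=3, timeout=5))
```

The defaults mirror the documented values of MAX_RETRIES and RUN_TIMEOUT, but the retry policy shown here is only one plausible reading of those settings.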
Usage
Basic Usage
import meinsweeper

# Each target maps a name to a node type and that node's parameters
targets = {
    'local_gpu': {'type': 'local_async', 'params': {'gpus': ['0', '1']}},
    'remote_server': {'type': 'ssh', 'params': {'address': 'example.com', 'username': 'user', 'key_path': '/path/to/key'}}
}

# Each command is a (shell command, label) pair
commands = [
    ("python script1.py", "job1"),
    ("python script2.py", "job2"),
    # ... more commands
]

meinsweeper.run_sweep(commands, targets)
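For larger sweeps it can be convenient to generate the list of (command, label) pairs instead of writing it by hand. The sketch below shows one way to do that with `itertools.product`. The `build_commands` helper, the `train.py` script, and its `--lr` and `--seed` flags are placeholders for illustration and are not part of MeinSweeper.

```python
from itertools import product

# Hypothetical hyperparameter grid; the flag names are placeholders.
grid = {
    "lr": [0.001, 0.01],
    "seed": [0, 1],
}

def build_commands(script, grid):
    """Return one (command, label) tuple per combination of grid values."""
    names = list(grid)
    commands = []
    for values in product(*(grid[name] for name in names)):
        pairs = list(zip(names, values))
        flags = " ".join(f"--{name} {value}" for name, value in pairs)
        label = "_".join(f"{name}{value}" for name, value in pairs)
        commands.append((f"python {script} {flags}", label))
    return commands

commands = build_commands("train.py", grid)
```

The resulting list has the same shape as the hand-written `commands` list, so it could be passed to `meinsweeper.run_sweep(commands, targets)` in the same way.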
Node Types
- Local Async Node: Executes jobs on the local machine, managing GPU allocation.
- SSH Node: Connects to remote machines via SSH, manages GPU allocation, and executes jobs.
Both node types handle GPU checking, allocation, and release automatically.
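To make the allocate-and-release cycle concrete, here is a minimal sketch of a GPU pool built on `asyncio.Queue`. It only illustrates the general pattern and is not how MeinSweeper's nodes are implemented. `run_on_gpu` and `main` are hypothetical names, and the job itself is simulated with a short sleep.

```python
import asyncio

async def run_on_gpu(command, label, free_gpus, results):
    """Claim a GPU id, 'run' the command on it, then hand the id back."""
    gpu = await free_gpus.get()  # waits until some GPU is free
    try:
        # Placeholder for real work: a real node would launch `command`
        # with access restricted to `gpu`.
        await asyncio.sleep(0.01)
        results.append((label, gpu))
    finally:
        free_gpus.put_nowait(gpu)  # release the GPU even if the job failed

async def main(commands, gpu_ids):
    free_gpus = asyncio.Queue()
    for gpu in gpu_ids:
        free_gpus.put_nowait(gpu)
    results = []
    # All jobs start concurrently; the queue limits how many run at once.
    await asyncio.gather(
        *(run_on_gpu(cmd, label, free_gpus, results) for cmd, label in commands)
    )
    return results

results = asyncio.run(main(
    [("python script1.py", "job1"),
     ("python script2.py", "job2"),
     ("python script3.py", "job3")],
    gpu_ids=["0", "1"],
))
```

With three jobs and two GPU ids, the third job waits until one of the first two releases its GPU.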
Configuration
MeinSweeper can be configured using environment variables:
- MINIMUM_VRAM: Minimum free VRAM required for a GPU to be considered available (in GB, default: 8)
- USAGE_CRITERION: Maximum GPU utilization for a GPU to be considered available (0-1, default: 0.8)
- MAX_PROCESSES: Maximum number of concurrent processes (-1 for no limit, default: -1)
- RUN_TIMEOUT: Timeout for each job execution (in seconds, default: 1200)
- MAX_RETRIES: Maximum number of retries for failed jobs (default: 3)
- MEINSWEEPER_RETRY_INTERVAL: Interval between retrying unavailable nodes (in seconds, default: 450)
- MEINSWEEPER_DEBUG: Enable debug logging (set to 'True' for verbose output)
Example:
export MINIMUM_VRAM=10
export USAGE_CRITERION=0.5
export MEINSWEEPER_RETRY_INTERVAL=300
python your_script.py
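The following sketch shows how settings like these might be read and applied when deciding whether a GPU is available. The variable names and defaults come from the list above. Everything else is an assumption: `load_settings` and `gpu_is_available` are hypothetical helpers, the numeric types are guesses, and whether MeinSweeper treats the thresholds as inclusive is not stated.

```python
import os

def load_settings(env=os.environ):
    """Read the documented variables, falling back to the documented defaults."""
    return {
        "minimum_vram_gb": float(env.get("MINIMUM_VRAM", 8)),
        "usage_criterion": float(env.get("USAGE_CRITERION", 0.8)),
        "max_processes": int(env.get("MAX_PROCESSES", -1)),
        "run_timeout_s": int(env.get("RUN_TIMEOUT", 1200)),
        "max_retries": int(env.get("MAX_RETRIES", 3)),
        "retry_interval_s": int(env.get("MEINSWEEPER_RETRY_INTERVAL", 450)),
        "debug": env.get("MEINSWEEPER_DEBUG", "False") == "True",
    }

def gpu_is_available(free_vram_gb, utilization, settings):
    """Assumed rule: enough free VRAM and utilization at or below the limit."""
    return (free_vram_gb >= settings["minimum_vram_gb"]
            and utilization <= settings["usage_criterion"])

# Same values as the shell example above, passed as a plain dict for clarity.
settings = load_settings({
    "MINIMUM_VRAM": "10",
    "USAGE_CRITERION": "0.5",
    "MEINSWEEPER_RETRY_INTERVAL": "300",
})
```

Under these settings a GPU with 12 GB free at 30% utilization would pass the check, while one with 9 GB free would not.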
Advanced Usage
Custom Node Types
You can create custom node types by subclassing the ComputeNode abstract base class:
from meinsweeper.modules.nodes.abstract import ComputeNode

class MyCustomNode(ComputeNode):
    async def open_connection(self):
        # Implementation
        ...

    async def run(self, command, label):
        # Implementation
        ...

# Usage
targets = {
    'custom_node': {'type': 'my_custom_node', 'params': {...}}
}
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License