MeinSweeper
MeinSweeper is a lightweight framework for running experiments on arbitrary compute nodes, with built-in support for GPU management and job distribution.
- This project is still in alpha and was written for research use.
- Expect bugs and smelly code!
Installation
Use the package manager pip to install MeinSweeper:
pip install meinsweeper
Features
- Asynchronous job execution
- Support for multiple node types (SSH and Local)
- Automatic GPU management and allocation
- Retry mechanism for failed jobs and unavailable nodes
- Configurable via environment variables
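To illustrate the retry and timeout features listed above, here is a minimal sketch of how an asynchronous job could be retried with a per-attempt timeout. It is not MeinSweeper's implementation: `run_with_retries` and `flaky_job` are hypothetical names, and treating `max_retries` as the number of retries after the first attempt is an assumption.

```python
import asyncio


async def run_with_retries(job, max_retries=3, timeout=1200.0):
    """Await job() until it succeeds, allowing max_retries retries.

    Hypothetical helper: each attempt is bounded by `timeout` seconds.
    """
    last_error = None
    for _ in range(max_retries + 1):  # first attempt plus the retries
        try:
            return await asyncio.wait_for(job(), timeout=timeout)
        except Exception as exc:  # a timeout is also reported as an exception
            last_error = exc
    raise RuntimeError(f"job failed after {max_retries + 1} attempts") from last_error


# Demo: a job that fails twice before succeeding
attempts = {"count": 0}


async def flaky_job():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated failure")
    return "done"


result = asyncio.run(run_with_retries(flaky_job, max_retries=3, timeout=5.0))
```

In this demo the third attempt succeeds, so the helper returns without using its last retry.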
Usage
Basic Usage
import meinsweeper

# Compute targets: each name maps to a node type and its parameters
targets = {
    'local_gpu': {'type': 'local_async', 'params': {'gpus': ['0', '1']}},
    'remote_server': {
        'type': 'ssh',
        'params': {'address': 'example.com', 'username': 'user', 'key_path': '/path/to/key'},
    },
}

# Jobs to run, given as (command, label) pairs
commands = [
    ("python script1.py", "job1"),
    ("python script2.py", "job2"),
    # ... more commands
]

meinsweeper.run_sweep(commands, targets)
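For a parameter sweep, the list of (command, label) pairs can be generated rather than written by hand. The sketch below assumes a hypothetical training script `train.py` with `--lr` and `--seed` flags; only the tuple format comes from the example above.

```python
from itertools import product

# Hypothetical hyperparameter grid
learning_rates = [1e-3, 1e-4]
seeds = [0, 1, 2]

# One (command, label) pair per combination, with a unique label for each job
commands = [
    (f"python train.py --lr {lr} --seed {seed}", f"lr{lr}_seed{seed}")
    for lr, seed in product(learning_rates, seeds)
]

# The list can then be passed on as in the basic example:
# meinsweeper.run_sweep(commands, targets)
```

Encoding the parameters in the label keeps each job identifiable when several run at once.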
Node Types
- Local Async Node: Executes jobs on the local machine, managing GPU allocation.
- SSH Node: Connects to remote machines via SSH, manages GPU allocation, and executes jobs.
Both node types handle GPU checking, allocation, and release automatically.
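As a rough illustration of GPU checking, the sketch below filters GPUs using the two thresholds described under Configuration (minimum free VRAM and maximum utilization). `GpuStatus` and `available_gpus` are hypothetical names, and whether MeinSweeper's own comparisons are inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GpuStatus:
    """Hypothetical snapshot of one GPU's state."""
    index: str
    free_vram_gb: float
    utilization: float  # fraction in the range 0-1


def available_gpus(statuses, minimum_vram=8.0, usage_criterion=0.8):
    """Return the indices of GPUs with enough free VRAM and low enough utilization."""
    return [
        gpu.index
        for gpu in statuses
        if gpu.free_vram_gb >= minimum_vram and gpu.utilization <= usage_criterion
    ]


statuses = [
    GpuStatus("0", free_vram_gb=20.0, utilization=0.1),   # free and idle
    GpuStatus("1", free_vram_gb=4.0, utilization=0.0),    # too little free VRAM
    GpuStatus("2", free_vram_gb=16.0, utilization=0.95),  # too busy
]
free = available_gpus(statuses)
```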
Configuration
MeinSweeper can be configured using environment variables:
- MINIMUM_VRAM: Minimum free VRAM required for a GPU to be considered available (in GB, default: 8)
- USAGE_CRITERION: Maximum GPU utilization for a GPU to be considered available (0-1, default: 0.8)
- MAX_PROCESSES: Maximum number of concurrent processes (-1 for no limit, default: -1)
- RUN_TIMEOUT: Timeout for each job execution (in seconds, default: 1200)
- MAX_RETRIES: Maximum number of retries for failed jobs (default: 3)
- MEINSWEEPER_RETRY_INTERVAL: Interval between retrying unavailable nodes (in seconds, default: 450)
- MEINSWEEPER_DEBUG: Enable debug logging (set to 'True' for verbose output)
Example:
export MINIMUM_VRAM=10
export USAGE_CRITERION=0.5
export MEINSWEEPER_RETRY_INTERVAL=300
python your_script.py
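The sketch below shows one way environment variables like these could be parsed with the documented defaults. It is an illustration only, not MeinSweeper's own loading code: `load_settings` and the dictionary keys are hypothetical.

```python
import os


def load_settings(env=os.environ):
    """Parse the documented variables from a mapping, falling back to the defaults."""
    return {
        "minimum_vram": float(env.get("MINIMUM_VRAM", 8)),           # GB
        "usage_criterion": float(env.get("USAGE_CRITERION", 0.8)),   # fraction 0-1
        "max_processes": int(env.get("MAX_PROCESSES", -1)),          # -1 means no limit
        "run_timeout": float(env.get("RUN_TIMEOUT", 1200)),          # seconds
        "max_retries": int(env.get("MAX_RETRIES", 3)),
        "retry_interval": float(env.get("MEINSWEEPER_RETRY_INTERVAL", 450)),  # seconds
        "debug": env.get("MEINSWEEPER_DEBUG", "False") == "True",
    }


# Same values as the shell example; a plain dict stands in for the real environment
settings = load_settings({
    "MINIMUM_VRAM": "10",
    "USAGE_CRITERION": "0.5",
    "MEINSWEEPER_RETRY_INTERVAL": "300",
})
```

Variables that are not set keep their defaults, so only the three exported values differ here.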
Advanced Usage
Custom Node Types
You can create custom node types by subclassing the ComputeNode abstract base class:
from meinsweeper.modules.nodes.abstract import ComputeNode

class MyCustomNode(ComputeNode):
    async def open_connection(self):
        # Implementation
        ...

    async def run(self, command, label):
        # Implementation
        ...

# Usage
targets = {
    'custom_node': {'type': 'my_custom_node', 'params': {...}}
}
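To make the two methods more concrete, here is a self-contained sketch of a node that runs commands in a local shell. It does not import MeinSweeper: `ComputeNodeSketch` is a hypothetical stand-in for the real `ComputeNode` that mirrors only the two methods shown above, and the return values are assumptions rather than the interface the framework expects.

```python
import asyncio
from abc import ABC, abstractmethod


class ComputeNodeSketch(ABC):
    """Hypothetical stand-in for ComputeNode with the two methods shown above."""

    @abstractmethod
    async def open_connection(self):
        ...

    @abstractmethod
    async def run(self, command, label):
        ...


class LocalShellNode(ComputeNodeSketch):
    async def open_connection(self):
        # Nothing to connect to for a local shell; just record readiness
        self.connected = True
        return True

    async def run(self, command, label):
        # Run the command in a subprocess without blocking the event loop
        proc = await asyncio.create_subprocess_shell(
            command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
        return label, proc.returncode, stdout.decode().strip()


async def main():
    node = LocalShellNode()
    await node.open_connection()
    return await node.run("echo hello", "demo_job")


label, returncode, output = asyncio.run(main())
```

A real subclass would inherit from `ComputeNode` and follow whatever return conventions that base class defines.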
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
File details
Details for the file meinsweeper-0.3.5.tar.gz.
File metadata
- Download URL: meinsweeper-0.3.5.tar.gz
- Upload date:
- Size: 42.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0dbf06c0a114b4bbf77fc5643545008dc865e082bdda77e9783fba0c18302fce |
| MD5 | f84ca99c7cce28a64358e46fc20b21fb |
| BLAKE2b-256 | e554ac8cf0c86d4df76df63a2eac238015ba8a0a9a6f0f47aa3276a3a1cf1c5c |
File details
Details for the file meinsweeper-0.3.5-py3-none-any.whl.
File metadata
- Download URL: meinsweeper-0.3.5-py3-none-any.whl
- Upload date:
- Size: 49.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14bf2e74b15ca96a7515ec0858271fbedf9ee509ff812ccdee7b35c2ea78793b |
| MD5 | 26e71fc680dc1022d4146165d5f47474 |
| BLAKE2b-256 | 8fe13a8ea709fe183ea7fb0ec905d1789732675191edf0fa3f44e20eab8699fd |