MML LSF plugin
This plugin provides LSF cluster support tailored to the specific setup of the DKFZ GPU cluster.
Install
pip install mml-lsf
If you want to use the submission features of mml_lsf.runner.LSFJobRunner, you need to set up the following
(this is not required by any of the other features of this plugin):
- install sshpass, which supplies the password to ssh non-interactively:
sudo apt install sshpass
- set the following variables in your mml.env (alternatively, provide them manually to
mml_lsf.runner.LSFJobRunner):
export MML_AD_USER=...
export MML_CLUSTER_HOST=...  # an LSF submission host
export MML_CLUSTER_WORKER=...  # a cluster worker node
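Before creating the runner, it can help to verify that these variables actually reached the environment (for example after calling load_env()). The following is a minimal sketch; the helper name missing_cluster_vars is hypothetical and not part of mml_lsf, only the variable names come from the list above.

```python
import os

# Variable names taken from the setup list above; the check itself is a
# hypothetical helper, not part of mml_lsf.
REQUIRED_VARS = ("MML_AD_USER", "MML_CLUSTER_HOST", "MML_CLUSTER_WORKER")


def missing_cluster_vars(env=None):
    """Return the names of required cluster variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_cluster_vars()
    if missing:
        print("Missing in mml.env:", ", ".join(missing))
    else:
        print("All cluster variables are set.")
```

Passing an explicit mapping instead of os.environ makes the check easy to test without touching the real environment.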
Usage
First and foremost, it automatically ensures that the number of workers used by MML conforms to
the node on which the job is executed (see mml_lsf.workers). In addition, it provides a suitable implementation
for job planning on the LSF cluster, taking care of all necessary prefixes to the CLI call (see
mml_lsf.requirements). Finally, it offers the LSFJobRunner to submit jobs automatically. Alternatively, jobs can
be submitted by pre-rendering them into a local file and piping that file to the submission host via ssh.
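The exact logic of mml_lsf.workers is not shown here. As a rough illustration of the idea of matching the worker count to the allocated node, the sketch below assumes that LSF exports LSB_DJOB_NUMPROC (the number of slots allocated to the job) and falls back to os.cpu_count() outside of LSF. The function lsf_worker_count is hypothetical.

```python
import os


def lsf_worker_count(env=None, default=None):
    """Hypothetical helper: number of workers a job may use on its LSF node.

    Assumes LSF exports LSB_DJOB_NUMPROC (slots allocated to the job). If it is
    missing or unusable, falls back to `default`, then to os.cpu_count().
    """
    env = os.environ if env is None else env
    raw = env.get("LSB_DJOB_NUMPROC")
    if raw is not None:
        try:
            slots = int(raw)
            if slots > 0:
                return slots
        except ValueError:
            pass  # ignore malformed values and use the fallback below
    fallback = default if default is not None else os.cpu_count()
    # os.cpu_count() may return None, so never go below one worker
    return max(1, fallback or 1)
```

Reading the allocation rather than the physical core count matters on shared nodes, where a job typically owns only a fraction of the machine's cores.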
Usage with `sshpass`
from mml.core.scripts.utils import load_env
from mml_lsf.requirements import LSFSubmissionRequirements
from mml_lsf.runner import LSFJobRunner
from mml.interactive.planning import MMLJobDescription
# make sure to load mml.env variables
load_env() # if within a jupyter notebook, instead invoke mml.interactive.init()
# set up job requirements
reqs = LSFSubmissionRequirements(
num_gpus=1,
vram_per_gpu=11.0,
queue='gpu-lowprio',
mail='something@dkfz-heidelberg.de', # optional
script_name='mml.sh', # name of your runner script that loads CUDA, the conda env, etc., and finally invokes mml
job_group='/USERNAME/JOB_GROUP_NAME', # optional, used e.g. to limit max number of jobs
interactive=True # optional; if True, real-time updates are printed to the terminal
)
# set up the runner; this prompts for the password once
runner = LSFJobRunner()
job = MMLJobDescription(prefix_req=reqs, mode='info', config_options={}) # simple job "mml info"
job.run(runner=runner) # will submit job (no password prompt)
job2 = MMLJobDescription(prefix_req=reqs, mode='train', config_options={}) # another job "mml train"
job2.run(runner=runner) # will submit job (no password prompt)
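How LSFJobRunner invokes sshpass internally is not documented here. The sketch below only illustrates the general mechanism this section relies on: reusing one password for several ssh calls without typing it again. It uses `sshpass -e`, which reads the password from the SSHPASS environment variable. The helper names are hypothetical, and AD_USER / SUBMISSION_HOST are the same placeholders used further below.

```python
import os
import shlex
import subprocess


def build_ssh_submit_argv(user, host, remote_cmd):
    """Hypothetical helper: argv that runs `remote_cmd` on `host` via sshpass + ssh.

    `sshpass -e` takes the password from the SSHPASS environment variable,
    so it never appears on the command line.
    """
    return ["sshpass", "-e", "ssh", f"{user}@{host}", remote_cmd]


def submit(user, host, remote_cmd, password):
    """Run one remote command; the password is passed only via the child's environment."""
    env = dict(os.environ, SSHPASS=password)
    argv = build_ssh_submit_argv(user, host, remote_cmd)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Only print the command; nothing is executed here.
    print(shlex.join(build_ssh_submit_argv("AD_USER", "SUBMISSION_HOST", "bjobs")))
```

Keeping the password in an environment variable of the child process, rather than as an argument, avoids exposing it in the process list.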
Usage with file tunneling
from mml_lsf.requirements import LSFSubmissionRequirements
from mml.interactive.planning import MMLJobDescription, write_out_commands
# set up job requirements
reqs = LSFSubmissionRequirements(
num_gpus=1,
vram_per_gpu=11.0,
queue='gpu-lowprio',
mail='something@dkfz-heidelberg.de', # optional
script_name='mml.sh', # name of your runner script that loads CUDA, the conda env, etc., and finally invokes mml
job_group='/USERNAME/JOB_GROUP_NAME', # optional, used e.g. to limit max number of jobs
interactive=False # setting True is not recommended for batched submission
)
# create a batch of commands
cmds = []
# cmd 1: some dummy task
cmds.append(MMLJobDescription(prefix_req=reqs, mode='train', config_options={'tasks': 'fake', 'proj': 'dummy'}))
# cmd 2: another dummy task
cmds.append(MMLJobDescription(prefix_req=reqs, mode='train', config_options={'tasks': 'fake', 'proj': 'dummy'}))
# write all commands to a file
write_out_commands(cmd_list=cmds, name='exp1')
# this creates 'exp1.txt' in the current working directory
Now submit these jobs via:
ssh AD_USER@SUBMISSION_HOST 'bash -s' < /path/to/exp1.txt
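Here `bash -s` makes the remote shell read its commands from standard input, which is why redirecting the rendered file into ssh executes every line on the submission host. If you prefer to trigger this from Python, a minimal sketch with the standard subprocess module could look as follows. The helper names are hypothetical, and ssh authentication works as usual (key or interactive prompt).

```python
import subprocess
from pathlib import Path


def build_batch_submit_argv(user, host):
    """Argv equivalent to: ssh USER@HOST 'bash -s' (commands are read from stdin)."""
    return ["ssh", f"{user}@{host}", "bash -s"]


def submit_batch_file(user, host, cmd_file):
    """Hypothetical helper: stream a file written by write_out_commands to a remote bash."""
    argv = build_batch_submit_argv(user, host)
    with Path(cmd_file).open("rb") as stdin:
        # Equivalent to the shell redirect `< cmd_file`.
        return subprocess.run(argv, stdin=stdin, check=True)


if __name__ == "__main__":
    # Only print the command; call submit_batch_file(...) to actually submit.
    print(build_batch_submit_argv("AD_USER", "SUBMISSION_HOST"))
```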
Download files
File details
Details for the file mml_lsf-0.5.1.tar.gz.
File metadata
- Download URL: mml_lsf-0.5.1.tar.gz
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a723787799e1e781bf10a7bb3a7acf3dfadaf5b46e546e672993c09f6e28566 |
| MD5 | 3ddce45f17649f372a3f4853dd0430b9 |
| BLAKE2b-256 | 00e5d9c71ce2c211009f8e08ceaa13549785f059f7774117e4a5eb0880666dd6 |
Provenance
The following attestation bundles were made for mml_lsf-0.5.1.tar.gz:
Publisher: publish.yml on IMSY-DKFZ/mml
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mml_lsf-0.5.1.tar.gz
- Subject digest: 2a723787799e1e781bf10a7bb3a7acf3dfadaf5b46e546e672993c09f6e28566
- Sigstore transparency entry: 158552092
- Permalink: IMSY-DKFZ/mml@30ab567b6df433e09cd24eae79d3c78a43a287a0
- Branch / Tag: refs/tags/1.0.0
- Owner: https://github.com/IMSY-DKFZ
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@30ab567b6df433e09cd24eae79d3c78a43a287a0
- Trigger Event: push
File details
Details for the file mml_lsf-0.5.1-py3-none-any.whl.
File metadata
- Download URL: mml_lsf-0.5.1-py3-none-any.whl
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5ef60e5799a1207282ef0f3e1832be5dfa341f633e1bb14b3d79c6d24055d36 |
| MD5 | 4cab72ce7cba89055798263b6274c7dc |
| BLAKE2b-256 | b0aedd12c2941555250bcdb94def4d4b997df772248a93a4b5dcce6622c00a14 |
Provenance
The following attestation bundles were made for mml_lsf-0.5.1-py3-none-any.whl:
Publisher: publish.yml on IMSY-DKFZ/mml
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mml_lsf-0.5.1-py3-none-any.whl
- Subject digest: d5ef60e5799a1207282ef0f3e1832be5dfa341f633e1bb14b3d79c6d24055d36
- Sigstore transparency entry: 158552093
- Permalink: IMSY-DKFZ/mml@30ab567b6df433e09cd24eae79d3c78a43a287a0
- Branch / Tag: refs/tags/1.0.0
- Owner: https://github.com/IMSY-DKFZ
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@30ab567b6df433e09cd24eae79d3c78a43a287a0
- Trigger Event: push