MML LSF plugin

This plugin provides LSF cluster support tailored to the specific setup of the DKFZ GPU cluster.

Install

pip install mml-lsf

To use the submission features of mml_lsf.runner.LSFJobRunner, complete the following setup (none of the plugin's other features require it):

  • install sshpass, which supplies the password to ssh non-interactively
    sudo apt install sshpass
    
  • set the following variables in your mml.env (alternatively, provide them manually to mml_lsf.runner.LSFJobRunner)
    • export MML_AD_USER=...
    • export MML_CLUSTER_HOST=... (an LSF submission host)
    • export MML_CLUSTER_WORKER=... (a cluster worker node)
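
Before constructing the runner, it can help to confirm that the three variables are actually visible to the Python process. The following is a minimal sketch and not part of mml-lsf: the helper name `missing_cluster_vars` is hypothetical, and only the variable names are taken from the list above.

```python
import os
from typing import List, Mapping, Optional

# Variable names as listed above; everything else in this sketch is illustrative.
REQUIRED_VARS = ("MML_AD_USER", "MML_CLUSTER_HOST", "MML_CLUSTER_WORKER")


def missing_cluster_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_cluster_vars()
    if missing:
        print("Missing in environment:", ", ".join(missing))
    else:
        print("All cluster variables are set.")
```

Running such a check after loading mml.env gives an early, readable hint about an incomplete setup, before any submission is attempted.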

Usage

First and foremost, the plugin automatically ensures that the number of workers used by MML matches the node on which the job is executed (see mml_lsf.workers). In addition, it provides a suitable implementation of job planning for the LSF cluster that takes care of all necessary CLI prefixes (see mml_lsf.requirements). Finally, it offers the LSFJobRunner to submit jobs automatically. Alternatively, jobs can be pre-rendered into a local file and submitted by piping that file through ssh.
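
The worker adjustment can be pictured as capping the requested number of workers at the number of CPU slots LSF granted to the job. The sketch below is an assumption about the general idea, not the code in mml_lsf.workers: the function name `cap_workers` is hypothetical, and it relies on the `LSB_DJOB_NUMPROC` variable that LSF sets inside a running job.

```python
import os
from typing import Mapping, Optional


def cap_workers(requested: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Cap `requested` workers at the slots LSF allocated to the current job."""
    env = os.environ if env is None else env
    raw = env.get("LSB_DJOB_NUMPROC", "")
    if raw.isdigit() and int(raw) > 0:
        available = int(raw)
    else:
        # Not inside an LSF job (or value unusable): fall back to the local CPU count.
        available = os.cpu_count() or 1
    return max(0, min(requested, available))
```

Capping rather than overriding keeps a deliberately small request (for example, zero workers while debugging) intact, and only reduces requests that exceed the allocation.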

Usage with sshpass

from mml.core.scripts.utils import load_env
from mml_lsf.requirements import LSFSubmissionRequirements
from mml_lsf.runner import LSFJobRunner
from mml.interactive.planning import MMLJobDescription

# make sure to load mml.env variables
load_env()  # if within a jupyter notebook, instead invoke mml.interactive.init()
# setup job requirements
reqs = LSFSubmissionRequirements(
    num_gpus=1, 
    vram_per_gpu=11.0, 
    queue='gpu-lowprio',
    mail='something@dkfz-heidelberg.de',  # optional
    script_name='mml.sh',  # name of my runner script to load CUDA, conda env, etc and finally invoke mml
    job_group='/USERNAME/JOB_GROUP_NAME',   # optional, used e.g. to limit max number of jobs 
    interactive=True  # optional, if True realtime updates are printed to terminal
    )
# setup runner, will prompt for password once
runner = LSFJobRunner() 
job = MMLJobDescription(prefix_req=reqs, mode='info', config_options={})  # simple job "mml info"
job.run(runner=runner)  # will submit job (no password prompt)
job2 = MMLJobDescription(prefix_req=reqs, mode='train', config_options={})  # another job "mml train"
job2.run(runner=runner) # will submit job (no password prompt)
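
To give a feel for what the requirements object contributes, the sketch below assembles a `bsub` command line from the same fields used above. This is an assumption for illustration only: the function `bsub_prefix` is hypothetical, the `./` script location is a guess, and the exact options emitted by LSFSubmissionRequirements may differ. Only standard `bsub` options (`-q`, `-gpu`, `-u`, `-g`, `-I`) are used.

```python
from typing import List, Optional


def bsub_prefix(
    num_gpus: int,
    vram_per_gpu: float,
    queue: str,
    script_name: str,
    mail: Optional[str] = None,
    job_group: Optional[str] = None,
    interactive: bool = False,
) -> List[str]:
    """Build an illustrative bsub argument list (hypothetical helper)."""
    args = ["bsub", "-q", queue, "-gpu", f"num={num_gpus}:gmem={vram_per_gpu}G"]
    if mail:
        args += ["-u", mail]       # address for job notifications
    if job_group:
        args += ["-g", job_group]  # LSF job group, e.g. to limit concurrent jobs
    if interactive:
        args.append("-I")          # interactive job: output is shown in the terminal
    args.append(f"./{script_name}")  # assumed script location; mml arguments would follow
    return args


print(" ".join(bsub_prefix(1, 11.0, "gpu-lowprio", "mml.sh", interactive=True)))
```

Keeping the prefix as a list of arguments, rather than one formatted string, avoids quoting problems when values such as the job group contain special characters.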

Usage with file tunneling

from mml_lsf.requirements import LSFSubmissionRequirements
from mml.interactive.planning import MMLJobDescription, write_out_commands

# setup job requirements
reqs = LSFSubmissionRequirements(
    num_gpus=1, 
    vram_per_gpu=11.0, 
    queue='gpu-lowprio',
    mail='something@dkfz-heidelberg.de',  # optional
    script_name='mml.sh',  # name of my runner script to load CUDA, conda env, etc and finally invoke mml
    job_group='/USERNAME/JOB_GROUP_NAME',   # optional, used e.g. to limit max number of jobs 
    interactive=False  # setting True is not recommended for batched submission
    )

# create batch of cmds
cmds = list()
# cmd 1 some dummy task
cmds.append(MMLJobDescription(prefix_req=reqs, mode='train', config_options={'tasks': 'fake', 'proj': 'dummy'}))
# cmd 2 another dummy task
cmds.append(MMLJobDescription(prefix_req=reqs, mode='train', config_options={'tasks': 'fake', 'proj': 'dummy'}))
# now write
write_out_commands(cmd_list=cmds, name='exp1')
# this creates an 'exp1.txt' file in the current working directory

Now submit these jobs via:

ssh AD_USER@SUBMISSION_HOST 'bash -s' < /path/to/exp1.txt
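
The same submission can be scripted from Python. The sketch below is an assumption rather than part of the plugin: the helper names `ssh_command` and `submit_file` are hypothetical, and the code simply reproduces the shell command above with the standard `subprocess` module, feeding the rendered file to `bash -s` on the submission host via standard input.

```python
import subprocess
from pathlib import Path
from typing import List


def ssh_command(user: str, host: str) -> List[str]:
    """Argument list equivalent to: ssh USER@HOST 'bash -s'."""
    return ["ssh", f"{user}@{host}", "bash -s"]


def submit_file(path: Path, user: str, host: str) -> int:
    """Pipe the rendered command file into a remote bash and return the exit code."""
    with path.open("rb") as handle:
        result = subprocess.run(ssh_command(user, host), stdin=handle, check=False)
    return result.returncode
```

A call such as `submit_file(Path('/path/to/exp1.txt'), 'AD_USER', 'SUBMISSION_HOST')` would then behave like the command line above, with ssh authenticating as it normally does on your machine.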
