hydra_submitit_extension

A more flexible Hydra Submitit Launcher plugin

This launcher is an extension of the Hydra Submitit Launcher plugin.

Features:

  • Iterative scheduling of jobs:
    • Configurable maximum number of active and pending jobs
    • Jobs are scheduled automatically, and the launcher waits whenever a limit is reached
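
The iterative scheduling described above can be pictured as a loop that submits jobs only while the number of active jobs is below a limit, and otherwise waits. The following is a minimal, self-contained simulation of that idea. It is not the plugin's actual implementation: `run_iteratively`, `max_active`, and the abstract "ticks" that stand in for waiting intervals are hypothetical names chosen for illustration.

```python
from collections import deque


def run_iteratively(job_durations, max_active):
    """Simulate limit-aware scheduling (illustrative only).

    job_durations: how many ticks each job runs (made-up data).
    max_active: maximum number of jobs allowed to be active at once.
    Returns the tick at which each job was submitted.
    """
    pending = deque(enumerate(job_durations))
    active = {}        # job index -> remaining ticks
    submitted_at = {}  # job index -> tick of submission
    tick = 0
    while pending or active:
        # Submit as many jobs as the limit currently allows.
        while pending and len(active) < max_active:
            index, duration = pending.popleft()
            active[index] = duration
            submitted_at[index] = tick
        # "Wait" one interval, then retire the jobs that have finished.
        tick += 1
        for index in list(active):
            active[index] -= 1
            if active[index] <= 0:
                del active[index]
    return [submitted_at[i] for i in range(len(job_durations))]


print(run_iteratively([2, 2, 1, 1], max_active=2))  # prints [0, 0, 2, 2]
```

With a limit of two, the third and fourth jobs are held back until the first two have finished, which is the behavior the feature list describes.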

Drawbacks:

  • Single jobs are submitted instead of job arrays
  • The Hydra process must keep running for jobs to be scheduled
  • No naming scheme for Slurm jobs, which makes manual canceling difficult
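
One possible workaround for manual canceling, sketched under assumptions: with the default `submitit_folder` of `${hydra.sweep.dir}/.submitit/%j`, submitit substitutes the Slurm job ID for `%j`, so the subdirectories of `.submitit` should be named after job IDs. The helpers below (`collect_job_ids` and `build_scancel_command`, both hypothetical names) gather those IDs and build, but do not run, an `scancel` command. The demo uses a temporary directory with made-up IDs instead of a real sweep.

```python
import tempfile
from pathlib import Path


def collect_job_ids(submitit_root):
    """Return the purely numeric directory names under submitit_root, sorted."""
    root = Path(submitit_root)
    ids = [p.name for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(ids, key=int)


def build_scancel_command(job_ids):
    """Build an scancel argument list for the given job IDs (not executed)."""
    return ["scancel", *job_ids]


# Demo with a fake .submitit folder; "notes" is ignored because it is not numeric.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["1002", "1001", "notes"]:
        (Path(tmp) / name).mkdir()
    command = build_scancel_command(collect_job_ids(tmp))

print(command)  # prints ['scancel', '1001', '1002']
```

The command is only built and printed, so it can be reviewed before anything is canceled. Directories of already finished jobs are included as well, so the list may contain IDs that are no longer in the queue.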

Installation

pip install hydra-submitit-extension

Quickstart

python main.py hydra/launcher=submitit_slurm_extended -m

All parameters:

# @package hydra.launcher

# Unchanged from the Submitit Launcher plugin
submitit_folder: ${hydra.sweep.dir}/.submitit/%j
timeout_min: 30
cpus_per_task: 1
gpus_per_node: null
tasks_per_node: 1
mem_gb: 1
nodes: 1
name: ${hydra.job.name}
partition: "dev_single"
qos: null
comment: null
constraint: null
exclude: null
#gres: "gpu:1"
cpus_per_gpu: null
gpus_per_task: null
mem_per_gpu: null
mem_per_cpu: null
account: null
signal_delay_s: 120
max_num_timeout: 0
additional_parameters: {}
array_parallelism: 256
setup: null

# Hydra Submitit Extension

_target_: hydra_plugins.hydra_submitit_extension.submitit_launcher.ExtendedSlurmLauncher

# Time between rescheduling attempts, in seconds.
# The minimum is 60 s.
reschedule_interval: 60

# Maximum total number of active jobs in the Slurm account.
max_jobs_in_total: 5

# Maximum number of active jobs in the current partition (e.g. dev_single).
max_jobs_in_partition: 4

# Maximum number of active jobs in current sweep.
max_jobs_in_sweep: 3
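
To make the interplay of the three limits concrete, here is a small sketch of one way they could combine: a new job may be submitted only if every limit still has room, so the number of free slots is the smallest remaining headroom. This is an assumption about the semantics, not the plugin's confirmed logic. `free_slots` and `effective_interval` are hypothetical helpers, and clamping `reschedule_interval` to 60 s is likewise an assumed reading of the "minimum is 60 s" comment.

```python
def free_slots(limits, active):
    """Return how many more jobs fit under all limits at once.

    limits/active: dicts keyed by 'total', 'partition', and 'sweep'.
    """
    headroom = min(limits[key] - active[key] for key in limits)
    return max(0, headroom)


def effective_interval(requested_s, minimum_s=60):
    """Clamp the reschedule interval to the documented 60 s minimum (assumed)."""
    return max(requested_s, minimum_s)


# Limits taken from the example configuration above.
limits = {"total": 5, "partition": 4, "sweep": 3}

# Made-up snapshot: 3 jobs in the account, 1 in the partition, 1 in this sweep.
active = {"total": 3, "partition": 1, "sweep": 1}

print(free_slots(limits, active))  # prints 2
print(effective_interval(10))      # prints 60
```

In this snapshot both the account-wide limit (5 - 3) and the sweep limit (3 - 1) leave room for two more jobs, so two jobs could be submitted before the launcher has to wait.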

Roadmap

  • Iterative Scheduling
  • Greedy Partition Selection


