Skip to main content

Utilities for expanding dask-jobqueue with appropriate settings for NCAR's clusters

Project description

ncar-jobqueue

ncar-jobqueue provides utilities for configuring dask-jobqueue with appropriate default settings for NCAR's clusters.

The following compute servers are supported:

  • Cheyenne (cheyenne.ucar.edu)
  • Casper (DAV) (casper.ucar.edu)
  • Hobart (hobart.cgd.ucar.edu)
  • Izumi (izumi.unified.ucar.edu)

Badges

CI GitHub Workflow Status GitHub Workflow Status Code Coverage Status
Package Conda PyPI
License License

Installation

NCAR-jobqueue can be installed from PyPI with pip:

python -m pip install ncar-jobqueue

NCAR-jobqueue is also available from conda-forge for conda installations:

conda install -c conda-forge ncar-jobqueue

Configuration

ncar-jobqueue provides a custom configuration file with appropriate default settings for different clusters. This configuration file resides in ~/.config/dask/ncar-jobqueue.yaml:

ncar-jobqueue.yaml
cheyenne:
  pbs:
    #project: XXXXXXXX
    name: dask-worker-cheyenne
    cores: 18 # Total number of cores per job
    memory: '109GB' # Total amount of memory per job
    processes: 18 # Number of Python processes per job
    interface: ib0 # Network interface to use like eth0 or ib0
    queue: regular
    walltime: '01:00:00'
    resource-spec: select=1:ncpus=36:mem=109GB
    log-directory: '/glade/scratch/${USER}/dask/cheyenne/logs'
    local-directory: '/glade/scratch/${USER}/dask/cheyenne/local-dir'
    job-extra: []
    env-extra: []
    death-timeout: 60

casper-dav:
  pbs:
    #project: XXXXXXXX
    name: dask-worker-casper-dav
    cores: 2 # Total number of cores per job
    memory: '25GB' # Total amount of memory per job
    processes: 1 # Number of Python processes per job
    interface: ib0
    walltime: '01:00:00'
    resource-spec: select=1:ncpus=1:mem=25GB
    queue: casper
    log-directory: '/glade/scratch/${USER}/dask/casper-dav/logs'
    local-directory: '/glade/scratch/${USER}/dask/casper-dav/local-dir'
    job-extra: []
    env-extra: []
    death-timeout: 60

hobart:
  pbs:
    name: dask-worker-hobart
    cores: 10 # Total number of cores per job
    memory: '96GB' # Total amount of memory per job
    processes: 10 # Number of Python processes per job
    # interface: null              # ib0 doesn't seem to be working on Hobart
    queue: medium
    walltime: '08:00:00'
    resource-spec: nodes=1:ppn=48
    log-directory: '/scratch/cluster/${USER}/dask/hobart/logs'
    local-directory: '/scratch/cluster/${USER}/dask/hobart/local-dir'
    job-extra: ['-r n']
    env-extra: []
    death-timeout: 60

izumi:
  pbs:
    name: dask-worker-izumi
    cores: 10 # Total number of cores per job
    memory: '96GB' # Total amount of memory per job
    processes: 10 # Number of Python processes per job
    # interface: null              # ib0 doesn't seem to be working on Hobart
    queue: medium
    walltime: '08:00:00'
    resource-spec: nodes=1:ppn=48
    log-directory: '/scratch/cluster/${USER}/dask/izumi/logs'
    local-directory: '/scratch/cluster/${USER}/dask/izumi/local-dir'
    job-extra: ['-r n']
    env-extra: []
    death-timeout: 60

Note:

  • To configure a default project account that is used by dask-jobqueue when submitting batch jobs, uncomment the project key/line in ~/.config/dask/ncar-jobqueue.yaml and set it to an appropriate value.

Usage

Note:

⚠️ Online documentation for dask-jobqueue is available here. ⚠️

Casper

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Cheyenne

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Hobart

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Izumi

>>> from ncar_jobqueue import NCARCluster
>>> from dask.distributed import Client
>>> cluster = NCARCluster(project='XXXXXXXX')
>>> cluster
PBSCluster(0f23b4bf, 'tcp://xx.xxx.x.x:xxxx', workers=0, threads=0, memory=0 B)
>>> cluster.scale(jobs=2)
>>> client = Client(cluster)

Non-NCAR machines

On non-NCAR machines, ncar-jobqueue will warn the user, and it will use distributed.LocalCluster:

>>> from ncar_jobqueue import NCARCluster
.../ncar_jobqueue/cluster.py:17: UserWarning: Unable to determine which NCAR cluster you are running on... Returning a `distributed.LocalCluster` class.
warn(message)
>>> from dask.distributed import Client
>>> cluster = NCARCluster()
>>> cluster
LocalCluster(3a7dd0f6, 'tcp://127.0.0.1:64184', workers=4, threads=8, memory=17.18 GB)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ncar-jobqueue-2021.4.14.tar.gz (16.8 kB view hashes)

Uploaded source

Built Distribution

ncar_jobqueue-2021.4.14-py3-none-any.whl (12.0 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page