
DTU HPC

DTU HPC is a collection of scripts and tools for running jobs on the DTU HPC cluster. It is meant to help you get started with running jobs on the cluster and to make day-to-day cluster work easier.

Installation

To install, run:

pip install dtuhpc

Getting started

To get started, first run:

dtuhpc auth

It will prompt you for your DTU username and password, and then for an encryption password. The encryption password is used to encrypt your DTU password so that it can be stored on your computer. You will need to remember it, as it is used to decrypt your DTU password whenever you run commands.

Afterwards, create a configuration file for your project. It should be named .dtuhpc.toml and placed in the root of your project. You can use the following template:

[ssh]
user = "<username>"
host = "<login-host>"
default_cwd = "<default working directory>"
key_filename = "<path to ssh key>"

[github]
access_token = "<github access token>"

[project]
name = "<project name>"
path = "<path to project on cluster>"
default_deploy_branch = "master"

The [ssh] section configures the SSH connection to the cluster. The GitHub access token is a personal access token, which can be generated from your GitHub account's developer settings.
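
For illustration, a filled-in configuration might look like the following. All values are placeholders (the username, host, and paths are assumptions, not defaults shipped with the tool); replace them with your own details:

[ssh]
user = "s123456"
host = "login1.hpc.dtu.dk"
default_cwd = "/zhome/ab/c/123456"
key_filename = "~/.ssh/id_rsa"

[github]
access_token = "ghp_xxxxxxxxxxxxxxxxxxxx"

[project]
name = "my-project"
path = "/zhome/ab/c/123456/projects/my-project"
default_deploy_branch = "master"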

Setup project

To set up a project, you can run:

dtuhpc init [--poetry] [--custom-job=<path to job script>]

This will dispatch a job to the cluster that clones your project, creates a virtual environment, and installs the dependencies. You can choose between poetry, pip, or a custom job script. How to define jobs is explained in the next section.
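
For example, an init job using poetry, or one driven by a custom job script (the path here is only a placeholder), would be started like this; presumably, running dtuhpc init without flags falls back to pip:

dtuhpc init --poetry
dtuhpc init --custom-job=jobs/my_init.toml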

Writing jobs

Jobs are defined as TOML files. A job file supports the following options:

name = "<name of job>"
queue = "<queue name>"
single_host = <true/false>
walltime = { hours = <hours>, minutes = <minutes> }
standard_output = "<path to standard output file>"
error_output = "<path to error output file>"
memory = <memory to allocate>
memory_kill_limit = <memory kill limit>
cores = <number of cores to allocate>
email = "<email address>"
notification_start = <true/false>
notification_end = <true/false>
core_block_size = <core block size>
core_p_tile_size = <core p tile size>
use_gpu = { num_of_gpus = <number of gpus>, per_task = <true/false> }

commands = [
    "<bash command 1>",
    "<bash command 2>",
    ...
]

An example of a script can be seen here:

queue = "hpc"
name = "init_${{ project_name }}"
walltime = { hours = 0, minutes = 15 }
single_host = true
cores = 2
memory = 4
standard_output = "init_${{ project_name }}.out"
error_output = "init_${{ project_name }}.err"

commands = [
    "git clone ${{ git_url }} ${{ project_path }}",
    "module load python3/3.10.7",
    "cd ${{ project_path }}",
    "python3 -m venv ${{ project_path }}/venv",
    "source ${{ project_path }}/venv/bin/activate",
    "pip3 install 'poetry==1.3.2'",
    "poetry install",
]

In this script, template variables such as ${{ project_name }}, ${{ git_url }}, and ${{ project_path }} are used. These are default variables that are only available for the init job.
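
As a further sketch, a job requesting a GPU could combine the options listed above roughly like this. The queue name, module version, paths, and email are assumptions made for the example, not values prescribed by the tool; check the cluster documentation for the queues and modules available to you:

queue = "gpuv100"
name = "train_model"
walltime = { hours = 4, minutes = 0 }
single_host = true
cores = 4
memory = 8
use_gpu = { num_of_gpus = 1, per_task = false }
standard_output = "train_model.out"
error_output = "train_model.err"
email = "s123456@dtu.dk"
notification_end = true

commands = [
    "module load python3/3.10.7",
    "cd ~/projects/my-project",
    "source venv/bin/activate",
    "python3 train.py",
]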

Deploying jobs

To deploy a job, run the following command:

dtuhpc deploy <job_path>

It will ask you to pick a branch or pull request, and then dispatch the job to the cluster.
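
For example, with a job definition at a hypothetical path jobs/train_model.toml:

dtuhpc deploy jobs/train_model.toml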

Other commands

Some other commands:

Exec commands on cluster

To execute commands on the cluster, you can run:

dtuhpc exec '<command to run>'

The command runs in the default working directory (default_cwd) defined in the configuration file.
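
For example, the following illustrative invocations would check the queue status and list the files in that directory:

dtuhpc exec 'bstat'
dtuhpc exec 'ls -la'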

SSH into cluster

To ssh into the cluster, you can run:

dtuhpc ssh

It will then open an ssh connection to the cluster. From here you can run commands as you would normally.

Predefined subcommands

There are also some predefined subcommands, which are thin wrappers around the cluster's own commands. They are all invoked as dtuhpc c <command_name>. To get the full documentation for a command, you can run:

dtuhpc c <command_name> --help

bkill

Kill a job on the cluster.

dtuhpc c bkill <job_id>

bqueues

List all queues on the cluster.

dtuhpc c bqueues

bstat

Get the status of a job on the cluster.

dtuhpc c bstat <optional job_id>

bsub

Submit a job to the cluster.

dtuhpc c bsub <path to job script>

nodestat

Get the status of the nodes on the cluster.

dtuhpc c nodestat

showstart

Show the start time of a job on the cluster.

dtuhpc c showstart <job_id>
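
Putting these wrappers together, a session might look roughly like this. The job file path and job id are made up for illustration, and whether bsub here takes a TOML job file or a raw LSF script is not specified above:

dtuhpc c bsub jobs/train_model.toml
dtuhpc c bstat
dtuhpc c showstart 1234567
dtuhpc c bkill 1234567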

