DTU HPC CLI
CLI for working with the High Performance Computing (HPC) cluster at the Technical University of Denmark (DTU). This CLI is a wrapper around the tools provided by the HPC, making it easier to run and manage jobs. See the HPC documentation for more information.
Requirements
- Python v3.10+
- git v1.7.0+
- rsync
git is required because we assume that you use git for branching. The CLI can use this to get your active branch, so your submitted jobs will run from that branch.
rsync is needed for synchronizing your local code to the HPC.
Installation
The CLI can be installed using pip:
pip install dtu-hpc-cli
You will also need to create a configuration in your project. See Configuration.
Usage
You can call it using the `dtu` command, which has these subcommands:
- `get-command`: Gets the command used to submit a previous job.
- `get-options`: Prints the options from a previously submitted job.
- `history`: Shows a list of the jobs that you have submitted and the options/commands that you used.
- `install`: Calls the installation commands in your configuration. NB. this command installs your project on the HPC, not on your local machine.
- `list`: Shows a list of running and pending jobs. It calls `bstat` on the HPC.
- `queues`: Lists all queues or shows job statistics for a single queue. It calls `bqueues` or `classtat` on the HPC.
- `remove`: Removes (kills) one or more running or pending jobs. It calls `bkill` on the HPC.
- `resubmit`: Submits a job with the same options/commands as a previous job. Each option/command can optionally be overridden.
- `run`: Runs one or more commands on the HPC. Uses the configured remote path as the working directory.
- `stats`: Shows statistics about a queue. It calls `nodestat` on the HPC.
- `submit`: Submits a job to the HPC. It calls `bsub` on the HPC. NB. this command automatically splits a job into multiple jobs that run after each other when the walltime exceeds 24 hours, because the HPC limits GPU jobs to this duration. You can use the `--split-every` option to change the duration at which jobs are split (see the example below).
- `sync`: Synchronizes your local project with the project on the HPC. Requires the `rsync` command. NB. it ignores everything in `.gitignore`.
All commands work out of the box on the HPC (except for `sync`). However, a big advantage of this tool is that you can call it from your local machine as well. You will need to configure SSH for this to work.
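For example, a long-running job can be split into shorter chunks by lowering the split duration. The invocation below is illustrative (the script name is a placeholder), assuming the same duration syntax as in the configuration:

> dtu submit --walltime 3d --split-every 12h "python train.py"

This would be submitted as a chain of 12-hour jobs, each running after the previous one.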
Example
A typical workflow looks like this:

- Synchronize your local project with the HPC.

  > dtu sync
  ⠹ Syncing
  Finished synchronizing

- Install the project on the HPC. (Install commands are `["poetry install --sync"]` in this example.)

  > dtu install
  ⠇ Installing
  Finished installation. Here are the outputs:
  > poetry install --sync
  Installing dependencies from lock file

  Package operations: 0 installs, 0 updates, 1 removal

  - Removing setuptools (69.5.1)

- Submit a job. Use `dtu submit --help` to see all available options.

  > dtu submit --name test --cores 2 --memory 2gb --walltime 1h "echo foo" "echo bar"
  Job script:

  #!/bin/sh
  ### General options
  #BSUB -J test
  #BSUB -q hpc
  #BSUB -n 2
  #BSUB -R rusage[mem=2GB]
  #BSUB -R span[hosts=1]
  #BSUB -W 01:00
  # -- end of LSF options --

  # Commands
  git switch main && echo foo
  git switch main && echo bar

  Submit job (enter to submit)? [Y/n]: y
  Submitting job...
  Submitted job <22862148>

- Check that the job is queued.

  > dtu list
  JOBID     USER   QUEUE  JOB_NAME  NALLOC  STAT  START_TIME  ELAPSED
  22862150  [user] hpc    test      0       PEND  -           0:00:00
Configuration
You will need to configure the CLI for each project, so that it knows what to install and how to connect to the HPC. You do this by creating `.dtu_hpc.json` in the root of your project. (We suggest that you add this file to `.gitignore`, since the SSH configuration is specific to each user.)
All options in the configuration are optional, which means it can be as simple as this:
{}
However, we highly recommend configuring at least SSH so that you can manage jobs from your local machine.
See all options in the complete example at the end.
SSH
The SSH configuration requires that you at least add a `user` and an `identityfile`. You may also optionally specify a `hostname`; it defaults to `login1.hpc.dtu.dk` when omitted.
{
"ssh": {
"user": "your_dtu_username",
"identityfile": "/your/local/path/to/private/key"
}
}
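If you do not yet have a key pair for the HPC, you can create and register one with standard OpenSSH tools (this is ordinary SSH setup, not part of this CLI; adjust the key path and username to your own):

> ssh-keygen -t ed25519 -f ~/.ssh/dtu_hpc
> ssh-copy-id -i ~/.ssh/dtu_hpc your_dtu_username@login1.hpc.dtu.dk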
Modules
Your code may need to load specific modules to work. You can specify these modules here, and they will automatically be loaded when using `install` and `submit`.
{
"modules": [
"python3/3.11.8"
]
}
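On the HPC, this corresponds to loading the listed modules before your commands run, i.e. each generated script would in effect start with something like the following (illustrative):

module load python3/3.11.8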
Install
The `install` command requires that you provide a set of commands to run; these are given as a list under the `commands` key. You may optionally set `sync` to `true` or `false`, which determines whether your project is automatically synchronized before the install commands run. It defaults to `true`.
{
"install": {
"commands": [
"pip install -r requirements.txt"
]
}
}
History
The history of job submissions is saved to `.dtu_hpc_history.json` in the root of your project by default. You can override this location using `history_path`:
{
"history_path": "path/to/history.json"
}
Remote Location
The tool needs to know the location of your project on the HPC. The location defaults to `~/[name]-[hash]`, where `[name]` is the name of the project directory on your local machine and `[hash]` is generated from the local path to that directory. For example, a local project directory called `my_project` might map to something like `~/my_project-1a2b3c` (the hash here is purely illustrative). You can override this using `remote_path`:
{
"remote_path": "path/to/project/on/hpc"
}
Submit
The submit command has many options, and you may want to provide sensible defaults for your specific application. Call `dtu submit --help` to see the existing defaults.
Any of the options can be given a custom default, so both of the examples below are valid configurations for submit.

Override just a single option to use the V100 GPU queue as the default queue:
{
"submit": {
"queue": "gpuv100"
}
}
Provide your own default settings for any of the submit options:
{
"submit": {
"branch": "main",
"commands": [
"python my_script.py"
],
"cores": 4,
"feature": [
"gpu32gb"
],
"error": "path/to/error_dir",
"gpus": 1,
"hosts": 1,
"memory": "5GB",
"model": "XeonGold6230",
"name": "my_job",
"output": "path/to/output_dir",
"preamble": [],
"queue": "hpc",
"split_every": "1d",
"start_after": "12345678",
"sync": true,
"walltime": "1d"
}
}
NB. `error` and `output` are directory locations on the HPC. The file path will be `[directory]/[name]_[jobId].out` for output and `[directory]/[name]_[jobId].err` for error.
NB. `branch` defaults to the special value `[[active_branch]]`, which means that the currently active git branch is used.
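Values configured here act as defaults, so you can still override any of them on the command line for a single submission, e.g.:

> dtu submit --cores 8 --walltime 4h "python my_script.py"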
Profiles
Use profiles to easily switch between different configurations in the same project. For example, you may want to use different resources for a CPU job and a GPU job. This can be accomplished by defining two profiles as below and submitting with `dtu --profile cpu submit` and `dtu --profile gpu submit`. Profiles can override any setting and can be used with any command.
{
"profiles": {
"cpu": {
"queue": "hpc",
"cores": 4,
"memory": "5GB"
},
"gpu": {
"queue": "gpuv100",
"cores": 8,
"memory": "10GB"
}
}
}
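With the profiles above, switching between the CPU and GPU settings is a matter of a single flag (the script name below is a placeholder):

> dtu --profile cpu submit "python my_script.py"
> dtu --profile gpu submit "python my_script.py"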
Complete Configuration
Here is a complete example for a configuration that customizes everything:
{
"history_path": "path/to/history.json",
"install": {
"commands": [
"pip install -r requirements.txt"
],
"sync": true,
},
"modules": [
"python3/3.11.8"
],
"remote_path": "path/to/project/on/hpc",
"ssh": {
"user": "your_dtu_username",
"identityfile": "/your/local/path/to/private/key",
"hostname": "login1.hpc.dtu.dk"
},
"submit": {
"branch": "main",
"commands": [
"python my_script.py"
],
"cores": 4,
"feature": [
"gpu32gb"
],
"error": "path/to/error_dir",
"gpus": 1,
"hosts": 1,
"memory": "5GB",
"model": "XeonGold6230",
"name": "my_job",
"output": "path/to/output_dir_",
"preamble": [],
"queue": "hpc",
"split_every": "1d",
"start_after": "12345678",
"sync": true,
"walltime": "1d"
},
"profiles": {
"cpu": {
"queue": "hpc",
"cores": 4,
"memory": "5GB"
},
"gpu": {
"queue": "gpuv100",
"cores": 8,
"memory": "10GB"
}
}
}