A command called sbatch for the cloudmesh shell
Documentation
Sample YAML File
This command requires a YAML file that is configured for the host and GPU. The YAML file also points to the desired Slurm template.
slurm_template: 'slurm_template.slurm'

sbatch_setup:
  <hostname>-<gpu>:
    - card_name: "a100"
    - time: "05:00:00"
    - num_cpus: 6
    - num_gpus: 1
  rivanna-v100:
    - card_name: "v100"
    - time: "06:00:00"
    - num_cpus: 6
    - num_gpus: 1
Example:
cms sbatch slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment=\"epoch=[1-3] x=[1,4] y=[10,11]\"
sbatch slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment="epoch=[1-3] x=[1,4] y=[10,11]"
# ERROR: Importing python not yet implemented
epoch=1 x=1 y=10 sbatch example/slurm.sh
epoch=1 x=1 y=11 sbatch example/slurm.sh
epoch=1 x=4 y=10 sbatch example/slurm.sh
epoch=1 x=4 y=11 sbatch example/slurm.sh
epoch=2 x=1 y=10 sbatch example/slurm.sh
epoch=2 x=1 y=11 sbatch example/slurm.sh
epoch=2 x=4 y=10 sbatch example/slurm.sh
epoch=2 x=4 y=11 sbatch example/slurm.sh
epoch=3 x=1 y=10 sbatch example/slurm.sh
epoch=3 x=1 y=11 sbatch example/slurm.sh
epoch=3 x=4 y=10 sbatch example/slurm.sh
epoch=3 x=4 y=11 sbatch example/slurm.sh
Timer: 0.0022s Load: 0.0013s sbatch slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment="epoch=[1-3] x=[1,4] y=[10,11]"
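
Each generated line above sets the experiment parameters as environment variables in front of the sbatch call, and sbatch passes the submitting environment on to the job by default. A template consumed this way might look like the following minimal sketch; the payload command and the concrete #SBATCH settings are illustrative assumptions, not part of the package.

#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# epoch, x, and y arrive as environment variables set by the generated
# "epoch=... x=... y=... sbatch example/slurm.sh" lines above
echo "running with epoch=$epoch x=$x y=$y"
# hypothetical payload; replace with the real experiment command
python train.py --epochs "$epoch" --x "$x" --y "$y"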
Slurm on a single computer (Ubuntu 20.04)
Install
This example assumes a machine with 32 processors (threads) and 128 GB of main memory.
sudo apt update -y
sudo apt install slurmd slurmctld -y
# temporarily open the config directory so the redirection below can write to it
sudo chmod 777 /etc/slurm-llnl
# make sure to use the HOSTNAME; the unquoted EOF lets $HOSTNAME expand below
# (sudo would not apply to the redirection, which is why the directory was opened)
cat << EOF > /etc/slurm-llnl/slurm.conf
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=localcluster
SlurmctldHost=$HOSTNAME
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES # This machine has 128GB main memory
NodeName=$HOSTNAME CPUs=32 RealMemory=128762 State=UNKNOWN
PartitionName=local Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
sudo chmod 755 /etc/slurm-llnl/
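
The CPUs and RealMemory values in the NodeName line must match the actual hardware, otherwise Slurm will refuse to bring the node up. slurmd can print the detected values in slurm.conf syntax:

# print this node's hardware in slurm.conf format; compare against the
# CPUs= and RealMemory= values written above
slurmd -C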
Start
sudo systemctl start slurmctld
sudo systemctl start slurmd
# the node name must match the NodeName entry in slurm.conf
sudo scontrol update nodename=$HOSTNAME state=idle
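
Before submitting jobs it is worth confirming that both daemons actually came up:

# verify that both daemons are active
systemctl status slurmctld --no-pager
systemctl status slurmd --no-pager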
Stop
sudo systemctl stop slurmd
sudo systemctl stop slurmctld
Info
sinfo
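
With the configuration above, healthy output should look roughly like this (the hostname will differ; the * marks the default partition):

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
local*       up   infinite      1   idle <hostname>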
Job
Save the following into gregor.slurm:
#!/bin/bash
#SBATCH --job-name=gregors_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=laszewski@gmail.com # Where to send mail
#SBATCH --ntasks=1 # Run on a single CPU
##SBATCH --mem=1gb # Job memory request (intentionally disabled)
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=gregors_test_%j.log # Standard output and error log
pwd; hostname; date
echo "Gregors Test"
date
sleep 30
date
Run with
sbatch gregor.slurm
watch -n 1 squeue
BUG
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 LocalQ gregors_ green PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
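
When a node reports DOWN or DRAINED, scontrol can usually return it to service; a typical recovery sequence looks like this:

# show why the node was taken out of service
sinfo -R
# clear the DOWN/DRAINED state so pending jobs can start
sudo scontrol update nodename=$HOSTNAME state=RESUME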
sbatch Slurm management commands for localhost
Start the Slurm daemons:
cms sbatch slurm start
Stop the Slurm daemons:
cms sbatch slurm stop
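
These commands are assumed to wrap the systemctl calls from the install section; a rough manual equivalent would be:

sudo systemctl start slurmctld slurmd    # cms sbatch slurm start
sudo systemctl stop slurmd slurmctld     # cms sbatch slurm stop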
BUG:
srun gregor.slurm
srun: Required node not available (down, drained or reserved)
srun: job 7 queued and waiting for resources
sudo scontrol update nodename=$HOSTNAME state=POWER_UP
Valid states are: NoResp DRAIN FAIL FUTURE RESUME POWER_DOWN POWER_UP UNDRAIN