PNGC Slurm Cluster: Useful Slurm Commands

Welcome to the PNGC Slurm Cluster.

To view Cluster Resource details:

$ sinfo                                    //show the default view of available cluster resources
$ mysinfo                                  //show more cluster resource details
$ mysinfo -n <node-hostname>               //show more resource details for a specific compute node
$ squeue                                   //show the default view of the cluster queue
$ mysqueue                                 //show more cluster queue details
$ mysqueue -u <user-id>                    //show more queue details for a specific user
$ mysqueue -w <node-hostname>              //show more queue details for a specific compute node
$ scontrol show nodes                      //show details for all compute nodes
$ scontrol show node <node-hostname>       //show details for a specific compute node
$ jobinfo <job-id>                         //show details for a running job
$ seff <job-id>                            //show information for a finished job
$ srun --partition=<your-q> --pty bash           //start an interactive job
$ srun --partition=<your-q> -c ## --pty bash     //start an interactive job with ## CPUs
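
For example, a typical way to drill down is to check overall resources first, then a specific node, then one of the jobs running on it (the hostname and job ID below are placeholders):

$ mysinfo                        //which partitions and nodes have free resources?
$ mysqueue -w <node-hostname>    //what is running on that node?
$ jobinfo <job-id>               //details of one of those jobs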

Please send your issues & questions to help@penn-ngc.org


Scientific applications can be found in /applications. Each application's directory contains the installed version(s) of that software.

Resource-intensive scientific commands should not be run on the cluster head node. To run a resource-intensive command interactively on a compute node, see below.

Each user can use a maximum of 48 cores at a time, depending on their QoS.

Useful Slurm Commands:

srun

This can be used to run a command interactively on one of the cluster nodes.
To directly run a script or program using srun:
[username@pngc-head01 ~]$ srun /home/username/slurm_tests/slurm_test3.sh
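
For reference, slurm_test3.sh is just a placeholder for your own script; its contents are not shown here, but a minimal test script along these lines would work (this sketch is an assumption, not the actual file):

#!/bin/bash
# minimal test: report which node we ran on, then pause briefly
hostname
date
sleep 30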

To be allocated a shell on a cluster node to run resource intensive programs interactively:
[username@pngc-head01 ~]$ srun --pty bash

The -c flag can be used to allocate more than one CPU to a job. This allocates 4 CPUs to the slurm_test3 script:
[username@pngc-head01 ~]$ srun -c 4 /home/username/slurm_tests/slurm_test3.sh

For example, with a partition and QoS specified:
# srun --partition=najq --qos=normal-naj --pty bash          //use 1 CPU for the interactive job
# srun --partition=najq --qos=normal-naj -c 2 --pty bash     //use 2 CPUs for the interactive job
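
When you are finished with an interactive session, exit the shell to release the allocated CPUs back to the cluster:

# exit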

sbatch

This is used to submit a batch job script to the scheduler. Slurm supports many options that you can put in your batch scripts to control how jobs are run; these are documented in the sbatch man page.
[cschrader@pngc-head01 slurm_tests]$ sbatch slurm_test3.sh
Submitted batch job 315516.
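
You can then check on the submitted job with the commands from the top of this page, using the job ID that sbatch printed:

$ mysqueue -u <user-id>     //is the job pending or running?
$ jobinfo 315516            //details while the job is running
$ seff 315516               //efficiency summary once the job has finished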

squeue

This command is used to monitor the queue of running and pending jobs. The fifth column (ST) shows the state of each job; most frequently you will see PD (pending) and R (running).
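
For reference, the default squeue output has these columns:

JOBID  PARTITION  NAME  USER  ST  TIME  NODES  NODELIST(REASON)

so ST is the fifth column, and for a pending job the last column shows the reason it is still waiting.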

scancel

This is used to cancel a running or pending job. You will need the job ID for this.
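
For example, to cancel the batch job submitted above, or all of your own jobs at once:

# scancel 315516
# scancel -u <user-id>      //cancel every job that belongs to you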

Further Resources:

Slurm Quick Reference Chart:
https://slurm.schedmd.com/pdfs/summary.pdf

Slurm Quickstart:
https://slurm.schedmd.com/quickstart.html

Slurm documentation page:
https://slurm.schedmd.com

Gridengine to Slurm Conversion:
https://srcc.stanford.edu/sge-slurm-conversion

Gridengine to Slurm command conversion chart:
https://slurm.schedmd.com/rosetta.pdf

Slurm-devel Mailing List:
https://groups.google.com/forum/#!forum/slurm-devel

Sample bash Job

Please don't run your bash jobs on the head node.
In general, run your bash jobs inside an interactive job.
To start an interactive job on an auto-assigned compute node:
# srun --pty bash
To start an interactive job on a specific compute node:
# srun --nodelist=compute## --pty bash
To start an interactive job on an auto-assigned GPU compute node:
# srun --partition=pngcq --pty bash
Then go to your data/software:
# cd /sbuckets/project-bucket/path/data
# cd /sbuckets/project-bucket/path/software

An interactive job lets you run any script that you would otherwise want to run on the head node.
Simply put, you can use a compute node as if it were the head node, except that you should not call srun or sbatch from inside it.
So:
# srun --pty bash
Once you are on a compute node:
<compute node> # cd /project-share/
<compute node> # vi script
<compute node> # R
<compute node> # R script /path/raw-data /path/result-data
<compute node> # conda
<compute node> # conda script /path/raw-data /path/result-data

Note: Interactive jobs are limited to 14 days (2 weeks). Please exit when you are done to free the resources; you can always create a new interactive job afterwards.
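
If you know roughly what you need, you can also request explicit resources and a time limit when starting the interactive job, so that the allocation is released automatically when the time runs out (the values below are only illustrative):

# srun --partition=<your-q> -c 4 --mem=8G --time=1-00:00:00 --pty bash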

Slurm Script example:
#!/bin/bash
#
#SBATCH --job-name=adsp-cadre-ch01-download
#SBATCH -o output/out-adsp-cadre-ch01-download-%j.txt
#SBATCH -e output/error-adsp-cadre-ch01-download-%j.txt
#SBATCH --ntasks=4
#
#SBATCH --partition=cadreq
#SBATCH --nodelist=cadre-node01
##SBATCH --nodelist=pngc-node01
#SBATCH --qos=normal

# Parallel job to use a total of 12 CPUs
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=3g
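
With multi-task specifications like the ones above and below, the script body normally starts your program through srun so that Slurm launches one copy per task (my_program is a hypothetical placeholder for your own executable):

srun ./my_program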

# Parallel job to use a total of 36 CPUs across 3 nodes
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=3
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=3G
#SBATCH --array=1-20%10
Using sbatch you can specify, for example, --array=1-20%10, which sets the job array task limit to 10 so that no more than 10 array tasks run at once.
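
Putting this together, a minimal array-job script could look like the following (the program name my_program and the sample_<N>.txt input naming are assumptions for illustration; %A and %a in the file names stand for the array job ID and the array task ID):

#!/bin/bash
#
#SBATCH --job-name=array-example
#SBATCH -o output/out-array-example-%A_%a.txt
#SBATCH -e output/error-array-example-%A_%a.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --partition=cadreq
#SBATCH --qos=normal
#SBATCH --array=1-20%10
#
# each array task gets its own SLURM_ARRAY_TASK_ID (1..20) and processes one input file
echo "Task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
./my_program /path/raw-data/sample_${SLURM_ARRAY_TASK_ID}.txt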

To run a slurm job:

Save the script below as testA.sh.

Example A:

#!/bin/bash
#
#SBATCH --job-name=cadre-node01-test
#SBATCH -o out-cadre-node01-test.txt
#SBATCH -e error-cadre-node01-test.txt
#SBATCH --ntasks=1
#SBATCH --time=01:00
#SBATCH -N 1
#SBATCH --partition=cadreq
#SBATCH --nodelist=cadre-node01
#SBATCH --qos=normal
#SBATCH --mem=1g
#
#
srun hostname
srun uname -a
srun sleep 60

Then submit it from the prompt as a Slurm job:
# sbatch testA.sh

To view your testA job in the queue:
# squeue
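
Once the job has finished, you can check how well it used its allocation with seff, using the job ID that sbatch printed:

# seff <job-id>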

Example B (save the script below as testB.sh):

#!/bin/bash
#
#SBATCH --job-name=adsp-cadre-ch01-download
#SBATCH -o out-adsp-cadre-ch01-download-%j.txt
#SBATCH -e error-adsp-cadre-ch01-download-%j.txt
#SBATCH --ntasks=4
#
#SBATCH --partition=cadreq
#SBATCH --nodelist=cadre-node01
#SBATCH --qos=normal
#
aws s3 cp s3://source-bucket-name/path/to/file-name s3://cadre-psom-s3-bucket-01/public-rawdata/file-name

Then submit it from the prompt as a Slurm job:
# sbatch testB.sh

To view your testB job in the queue:
# squeue
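
While the job is running, you can follow the download in its output file; the %j in the file names is replaced by the job ID:

# tail -f out-adsp-cadre-ch01-download-<job-id>.txt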
