How to get information about your jobs
Suppose that you have submitted a job and you would like some information about it. There are several ways to access job status information.
Task 1: List all jobs
Before getting information about a specific job, it helps to see what is currently running and what has run previously.
Show every running job along with the JOBID, user who submitted, etc
Code: squeue
Sample output:
Show a history of your jobs: completed, running, and failed
Code: sacct
Sample output:
The JOBID is particularly useful. We will use it whenever we need to access information for a specific job. In this tutorial, whenever a command asks for <JOBID>, replace it with your actual JOBID as reported by squeue or sacct.
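When the queue is long, picking your own JOBIDs out by eye gets tedious. The following is a minimal sketch of one way to do it, assuming the default squeue column order shown in this tutorial (JOBID, PARTITION, NAME, USER, ...). The function name jobids_for_user is hypothetical and is not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): read squeue-style text on stdin
# and print the JOBID of every row whose USER column matches the given name.
# Assumes the default column order: JOBID PARTITION NAME USER ST TIME ...
jobids_for_user() {
    local user="$1"
    # NR > 1 skips the header row; $4 is the USER column, $1 is the JOBID.
    awk -v user="$user" 'NR > 1 && $4 == user { print $1 }'
}

# Typical use on a cluster:
#   squeue | jobids_for_user "$USER"
```

This relies on whitespace-separated columns, so a job name containing spaces would shift the fields. squeue can also do the filtering itself with its -u option (for example, squeue -u followed by your user name), which avoids parsing the output at all.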
Graphically display all the jobs, with a bit of info
This command will only work if your SSH access is set up so that graphical applications may be displayed over the network.
Code: sview
Sample output:
Task 2: Display job information
Get detailed information about a running job with sstat
The sstat command gives a wealth of job status information. However, it only reports on jobs that were launched with the srun command. The following Slurm script shows a C program that is executed by mpirun, which is in turn launched through srun. The C program would have run just fine without srun, but if we leave out the srun portion of the command, sstat will not provide any information.
Sample slurm script:
Code:
#!/bin/bash
#SBATCH --job-name=C_hello
#SBATCH --output=slurm_c.out
#SBATCH --error=slurm_c.err
#SBATCH --partition=normal
#SBATCH -N 1
#SBATCH -t 04:30:00
#SBATCH -n 4
##SBATCH --cpus-per-task 4
srun mpirun ./hello
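Because sstat reports nothing for a job that was not launched through srun, it can be worth checking a script before submitting it. The following is a rough sketch of such a check. The function name script_uses_srun is hypothetical, and the test is deliberately simple: it only looks for a line whose first word is srun.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the batch script contains at least one line
# whose first token is srun, and non-zero otherwise.
# Comment lines (including #SBATCH directives) start with '#', so they never match.
script_uses_srun() {
    local script="$1"
    grep -Eq '^[[:space:]]*srun([[:space:]]|$)' "$script"
}

# Typical use:
#   script_uses_srun hello_c.slurm || echo "warning: no srun line found; sstat may show nothing"
```

This will miss srun calls that are not at the start of a line (for example, inside a pipeline or after another command), so treat a negative result as a hint to look at the script rather than as proof.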
Now, do the following:
Step 1: run the script with sbatch to schedule the job:
Code: sbatch hello_c.slurm
Step 2: find the JOBID:
Code: squeue
Step 3: use that JOBID to get info with sstat:
Code: sstat <JOBID>
Sample Run:
[ekrell@crest-login ekrell]$ sbatch hello_c.slurm
Submitted batch job 5198
[ekrell@hpcm ekrell]$ squeue
JOBID  PARTITION  NAME      USER      ST  TIME        NODES  NODELIST(REASON)
5107   cbirdq     full      cbird     R   7-14:27:17  1      hpcc26
5166   cbirdq     MaNib     cbird     R   1-01:04:41  1      hpcc25
5142   normal     BiNib     cbird     R   3-03:52:47  1      hpcc01
5168   normal     OaNib     cbird     R   1-00:56:18  1      hpcc02
5178   normal     nullRows  sterba    R   1:55:49     9      hpcc[03-11]
5198   normal     C_hello   ekrell    R   0:07        1      hpcc13
5185   serial     test1.sh  tmerrick  R   58:35       1      hpcc12
[ekrell@hpcm ekrell]$ sstat 5198
JobID MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS MaxRSSNode MaxRSSTask AveRSS MaxPages MaxPagesNode MaxPagesTask AvePages MinCPU MinCPUNode MinCPUTask AveCPU NTasks AveCPUFreq ReqCPUFreq ConsumedEnergy MaxDiskRead MaxDiskReadNode MaxDiskReadTask AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ---------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------
sstat: WARNING: We will use a much slower algorithm with proctrack/pgid, use Proctracktype=proctrack/linuxproc or some other proctrack when using jobacct_gather/linux
5198.0 1171496K hpcc13 0 1171496K 46688K hpcc13 3 46678K 26K hpcc13 3 23K 00:00.000 hpcc13 0 00:00.000 4 2.80G 0 1M hpcc13 0 1M 0.13M hpcc13 0 0.13M
[ekrell@hpcm ekrell]$
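The three steps above can also be chained in a script, and the very wide sstat table can be narrowed to the columns you care about. The following is a sketch under two assumptions: that sbatch prints a confirmation of the form "Submitted batch job <JOBID>" as in the sample run, and that sstat is asked for pipe-delimited output with its --parsable2 option. The helper names jobid_from_sbatch and sstat_column are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pull the JOBID out of sbatch's confirmation message,
# assumed to look like "Submitted batch job 5198" (as in the sample run).
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Hypothetical helper: given pipe-delimited text with a header row on stdin
# (the layout requested with sstat --parsable2), print one named column.
sstat_column() {
    local name="$1"
    awk -F'|' -v name="$name" '
        # Header row: remember which field number carries the requested name.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Data rows: print that field; lines too short to contain it are skipped.
        col && NF >= col { print $col }
    '
}

# Typical use on a cluster:
#   jobid=$(sbatch hello_c.slurm | jobid_from_sbatch)
#   sstat -j "$jobid" --parsable2 --format=JobID,MaxRSS,AveCPU | sstat_column MaxRSS
```

The --format option limits sstat to the listed fields, which is usually easier to read than the full table shown in the sample run. If the requested column name is not in the header, sstat_column prints nothing. sbatch also has a --parsable option that prints the job ID without the surrounding text, which may make jobid_from_sbatch unnecessary on your system.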