Estimating job resources
When trying to estimate how many resources to request for a job, you can use the interactive srun command.
In a terminal connected to a login node (or in a new terminal), launch a 5-core interactive job via:
srun --cpus-per-task=5 --pty /bin/bash
conda activate discovery_class
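Once the interactive shell starts, you can confirm the allocation from inside the job; SLURM_CPUS_PER_TASK is an environment variable Slurm sets when --cpus-per-task is given:
echo $SLURM_CPUS_PER_TASK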
To get started, please copy a small zip archive containing some Python code and a sample submit script.
cp /dartfs-hpc/admin/Class_Examples.zip . && unzip Class_Examples.zip
This copies the archive from /dartfs-hpc/admin. The . instructs the copy to place the file in your current working directory, and the && instructs the shell to run the next command, which unzips the contents of the archive into the directory you are in.
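If the one-liner is unclear, it is nearly equivalent to running the two steps separately (the && only runs the second command if the first succeeds):
cp /dartfs-hpc/admin/Class_Examples.zip .
unzip Class_Examples.zip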
When estimating resource utilization, you can use a program like top to monitor current usage.
For this tutorial, let's open two terminals side by side. In one terminal we will launch our Python code from the folder we unzipped; in the other we will run the command top -u <username> to look at resource utilization.
In your first terminal, take note of the host your srun job landed on. You can see this from the change in the prompt:
[john@t04 ~]$
In my second terminal, I'll ssh to t04 directly.
ssh t04
Now we are set up with two terminals side by side on the same host (one terminal running a job, the other a direct ssh session). Within the Class_Examples folder is a basic Python script we will use for estimating resources, called invert_matrix.py. Let's run the script from our first terminal, the one we created the interactive job in, and see what it does.
cd Class_Examples
time python3 invert_matrix.py
Once your python command is executing, use the second terminal you opened to run the top command. My username is john, but $(whoami) expands to whoever runs it, so the same command works for everyone:
top -u $(whoami)
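If you prefer a one-time snapshot over the interactive display, top also has a batch mode; a minimal sketch, where -b runs non-interactively and -n 1 prints a single iteration:
top -b -n 1 -u $(whoami) | head -n 20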
You can get out of top by simply hitting the letter q.
The top screen displays information about the state of the system, including the number of CPUs, the amount of system memory, and other useful details. In this case we are looking at two fields in particular: %CPU and RES, short for resident memory (the physical memory the process is actually using).
In the top output, the %CPU column shows 97-99%, which is equivalent to one fully used CPU (a process running flat out on four cores would show roughly 400%). With this information I know to submit my job for 1 CPU in order for it to run efficiently.
In the other column, RES, we can see that we are not quite using a full GB of memory. We know from this output that requesting the minimum for a job, 8GB (or even lower), will be sufficient.
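As a cross-check, ps can report the same two numbers without an interactive display; note that %cpu here is a lifetime average rather than an instantaneous reading, and rss is resident memory in kilobytes:
ps -u $(whoami) -o pid,%cpu,rss,comm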
The next resource you should consider estimating before submitting your job is walltime. Walltime determines how long your job is allowed to run, and estimating it accurately is good scheduler etiquette.
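When the timed run from earlier finishes, time prints a summary along these lines (the numbers below are illustrative, not from a real run):
real	4m32.089s
user	4m28.540s
sys	0m1.320s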
From the output above, you will want to look at the real field. This is the elapsed wall-clock time between pressing the enter key and the termination of the program. At this point, we know that we should submit for at least 5 minutes of walltime. That should allow enough time for the job to run to completion.
Note
Determining walltime can be tricky. To avoid losing a job to its time limit, it is suggested to request 15-20% more walltime than the job typically needs; this ensures the job has enough time to complete. So if your job takes 8 minutes to complete, submit for 10.
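In the #SBATCH time format used below (HH:MM:SS), those ten minutes would be written as:
#SBATCH --time=00:10:00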
Now that we have all of this information about the job, we are ready to build our first submit script for batch submission to the scheduler.
#!/bin/bash
# Request 1 CPU for the job
#SBATCH --cpus-per-task=1
# Request 8GB of memory for the job
#SBATCH --mem=8GB
# Walltime (job duration)
#SBATCH --time=00:05:00
# Then finally, the code we want to execute.
time python3 invert_matrix.py
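As a preview of the next portion (actual submission is covered there), a script like this, saved to a file such as submit.sh (a name chosen here just for illustration), would be handed to the scheduler with sbatch and monitored with squeue:
sbatch submit.sh
squeue -u $(whoami)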
Before we move on to submitting the job via sbatch, let's adjust the script to use 5 cores instead of 1. So that you do not have to open an editor, an updated version of the script is already in the folder. Go ahead and take a look at the file invert_matrix_5_threads.py:
cat invert_matrix_5_threads.py
#!/usr/bin/python3
import os
# Set the number of threads to 5 to limit CPU usage to 5 cores
os.environ["OPENBLAS_NUM_THREADS"] = "5"  # For systems using OpenBLAS
# Now import NumPy after setting environment variables
import numpy as np
import sys
# Main computation loop
for i in range(2, 1501):
    x = np.random.rand(i, i)
    y = np.linalg.inv(x)
    z = np.dot(x, y)
    e = np.eye(i)
    r = z - e
    m = r.mean()
    if i % 50 == 0:
        print("i,mean", i, m)
        sys.stdout.flush()
os.environ["OPENBLAS_NUM_THREADS"] = "5"
Let's go ahead and run that script now to see if adding 5 cores speeds it up:
time python3 invert_matrix_5_threads.py
Was it faster?