Estimating job resources
When estimating how many resources to request for a job, you can use the interactive node x01.
To log in to x01, run:
ssh x01
To get started, copy a small zip archive containing some Python code and a sample submit script.
cp /dartfs-hpc/admin/Class_Examples.zip . && unzip Class_Examples.zip
The above command copies the archive Class_Examples.zip from the location /dartfs-hpc/admin. The . tells cp to place the copy in your current working directory. The && runs the next command only if the copy succeeds; that command unzips the contents of the archive into the directory you are in.
When estimating your resource utilization, you can use a program like htop to monitor current utilization.
In this tutorial, let's open two terminals side by side. In one terminal we will launch our Python code from the folder we unzipped. In the other terminal we will run the command htop -u followed by our username.
Inside the Class_Examples folder is a basic Python script we will use for estimating resources. The script is called invert_matrix.py. Let's run the script to see what it does.
cd Class_Examples
python2 invert_matrix.py
Once your Python command is executing as in the image above, use the second terminal you opened to run the htop command. My username is john, so my command will be:
htop -u john
You can exit htop by pressing the letter q.
The htop screen displays information about the state of the system, including the number of CPUs and the amount of system memory. In this case we are looking at two fields in particular: CPU% and RES, short for resident memory (the physical memory the process is currently using).
In the output above, the CPU% column shows 293%. That is equivalent to 2 full CPUs plus about 93% of another. With this information I know to submit my job for at least 3 CPUs in order for it to run efficiently.
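The rounding above can be written down as a small helper. This is only a sketch: the function name cpus_to_request is made up for illustration, and it assumes the CPU% reading counts 100% per fully used CPU, as in the example.

```python
import math


def cpus_to_request(cpu_percent):
    """Round a CPU% reading up to a whole number of CPUs.

    Hypothetical helper: assumes 100% corresponds to one fully used CPU.
    """
    # 293% is 2.93 CPUs, which rounds up to 3. Never request fewer than 1.
    return max(1, int(math.ceil(cpu_percent / 100.0)))


print(cpus_to_request(293))  # 3
```

Rounding up rather than to the nearest whole number keeps the job from being squeezed onto fewer CPUs than it was observed to use.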
In the other column, RES, we can see that we are using well under a full GB of memory (73848 KB, roughly 72 MB). We know from this output that requesting 1GB of memory will be more than sufficient for our job.
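To make the unit conversion explicit, here is a minimal sketch that turns a RES reading into a whole-GB request. The helper name mem_gb_to_request and the 20% headroom are assumptions for illustration, not site policy, and the reading is treated as units of 1024 bytes.

```python
import math

KB_PER_GB = 1024 * 1024  # assumes the RES value is in units of 1024 bytes


def mem_gb_to_request(res_kb, headroom=0.20):
    """Pad a RES reading and round up to a whole number of GB.

    Hypothetical helper: the 20% headroom is an illustrative assumption.
    """
    padded_kb = res_kb * (1.0 + headroom)
    return max(1, int(math.ceil(padded_kb / KB_PER_GB)))


res_kb = 73848                       # RES value from the example
print(round(res_kb / 1024.0, 1))     # size in MB
print(mem_gb_to_request(res_kb))     # whole GB to request
```

For the reading in this tutorial the result is 1, which matches the 1GB request used later in the submit script.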
The next resource you should estimate before submitting your job is walltime. Walltime determines how long your job is allowed to run. Estimating walltime accurately is good scheduler etiquette. From the command line on x01, let's run our Python code again, but add the time command at the beginning.
time python2 invert_matrix.py
From the output above, look at the real field. This is the wall-clock time that passed between pressing the Enter key and the termination of the program. At this point, we know that we should submit for at least 5 minutes of walltime. That should allow enough time for the job to run to completion.
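If you would rather collect the same number from a script, for example to time several runs, a rough Python equivalent of the real field might look like the sketch below. The function name wall_seconds is hypothetical, and the command in the demo is a stand-in workload so the sketch runs anywhere.

```python
import subprocess
import sys
import time


def wall_seconds(command):
    """Run command (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.check_call(command)  # raises CalledProcessError if the command fails
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload. On x01 this could instead be
    # ["python2", "invert_matrix.py"], run from the Class_Examples folder.
    elapsed = wall_seconds([sys.executable, "-c", "sum(range(100000))"])
    print("real: %.2f s" % elapsed)
```

Like the real field, this measures elapsed time from launch to exit, not CPU time, so it is the figure to base a walltime request on.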
Note
Determining walltime can be tricky. To avoid losing a job that runs past its time limit, it is suggested to request 15-20% more walltime than the job typically needs. This will ensure jobs have enough walltime to complete the task. So if your job takes 8 minutes to complete, submit for 10.
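The padding rule in the note can be worked through in a few lines. This is a sketch under stated assumptions: the helper name padded_walltime is made up, the default padding is the upper end of the suggested 15-20% range, and the result is rounded up to a whole minute and printed in the HH:MM:SS form used by the --time option.

```python
import math


def padded_walltime(measured_seconds, padding=0.20):
    """Add padding to a measured runtime and format it as HH:MM:SS.

    Hypothetical helper: rounds up to the next whole minute.
    """
    padded_minutes = int(math.ceil(measured_seconds * (1.0 + padding) / 60.0))
    hours, minutes = divmod(padded_minutes, 60)
    return "%02d:%02d:00" % (hours, minutes)


# An 8 minute job with 20% padding is 9.6 minutes, rounded up to 10.
print(padded_walltime(8 * 60))  # 00:10:00
```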
Now that we have all of this information about the job, we are ready to build our first submit script for submitting in batch to the scheduler.
#!/bin/bash -l
# Request 3 CPUs for the job
#SBATCH --cpus-per-task=3
# Request 1GB of memory for the job
#SBATCH --mem=1GB
# Walltime (job duration)
#SBATCH --time=00:05:00
# Then finally, our code we want to execute.
time python2 invert_matrix.py
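If you end up writing many similar submit scripts, the three estimates can be dropped into a template. The sketch below is illustrative only: render_submit_script is a hypothetical helper, and it simply reproduces the directives from the script above with the values filled in.

```python
def render_submit_script(cpus, mem_gb, walltime, command):
    """Return the text of a submit script built from the estimated resources.

    Hypothetical helper: mirrors the directives used in the script above.
    """
    lines = [
        "#!/bin/bash -l",
        "#SBATCH --cpus-per-task=%d" % cpus,
        "#SBATCH --mem=%dGB" % mem_gb,
        "#SBATCH --time=%s" % walltime,
        command,
    ]
    return "\n".join(lines) + "\n"


# Values estimated in this tutorial: 3 CPUs, 1GB of memory, 5 minutes.
print(render_submit_script(3, 1, "00:05:00", "time python2 invert_matrix.py"))
```

Keeping the numbers as function arguments makes it easy to regenerate the script after re-measuring a job with htop and time.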