
Estimating job resources

When trying to estimate how many resources to request for a job, you can use the interactive node x01.

To log in to x01, issue:

ssh x01

Screenshot

To get started, copy a small zip archive containing some Python code and a sample submit script.

cp /dartfs-hpc/admin/Class_Examples.zip . && unzip Class_Examples.zip

The above command copies the file Class_Examples.zip from /dartfs-hpc/admin. The . tells cp to place the copy in your current working directory. The && chains a second command, unzip, which extracts the contents of the archive into the directory you are in.
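One detail worth knowing about && is that it only runs the second command if the first one succeeds. A quick illustration (the directory and path names here are made up for the demo):

```shell
# && runs the second command only when the first exits successfully (status 0)
mkdir -p demo && echo "directory created"

# If the first command fails, the second never runs
ls /no/such/path && echo "this line is never printed"
```

This is why the copy-and-unzip one-liner is safe: if the cp fails, the unzip is skipped.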

When estimating your resource utilization you can use a program like htop to monitor current utilization.

In this tutorial, let's open two terminals side by side. In one terminal we will launch our Python code from the folder we unzipped. In the other terminal we will run the command htop -u to look at resource utilization.

Inside the Class_Examples folder is a basic Python script we will use for estimating resources, called invert_matrix.py. Let's run the script to see what it does.

cd Class_Examples
python2 invert_matrix.py

Screenshot

Once your Python command is running as in the image above, use the second terminal you opened to run the htop command. My username is john, so my command will be:

htop -u john

You can exit htop at any time by pressing the letter q.

The htop screen displays the state of the system, including the number of CPUs, the amount of system memory, and other useful information. In this case we are looking at two fields in particular: CPU% and RES, short for resident memory (the physical memory the process is currently using).
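If you prefer a one-shot snapshot over htop's interactive view, ps can report the same two numbers. The %cpu and rss output keywords are standard ps column specifiers (rss is resident memory in kilobytes, matching htop's RES), and john is the example username from above:

```shell
# One-line snapshot of CPU and resident memory usage for one user's processes.
# %cpu is the CPU percentage; rss is resident memory in kilobytes.
ps -u john -o pid,%cpu,rss,comm
```

This is handy inside scripts, where an interactive tool like htop is not an option.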

Screenshot

From the above, the CPU% column is showing 293%. That is equivalent to two full CPUs plus about 93% of a third. With this information I know to submit my job with at least 3 CPUs in order for it to run efficiently.

In the other column, RES, we can see that the process is using 73848 KB, roughly 72 MB, well under 1 GB. We know from this output that requesting 1 GB of memory will be more than sufficient for our job.
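The arithmetic behind those two requests can be checked at the shell prompt. This sketch uses the example values from the htop output above (293% CPU, 73848 KB resident); shell arithmetic is integer-only, so the rounding is done by hand in the expressions:

```shell
# CPU%: 293% of one core rounds up to 3 CPUs ( (293 + 99) / 100 is integer ceil )
echo $(( (293 + 99) / 100 ))   # 3

# RES: htop reports resident memory in KB; divide by 1024 for MB
echo $(( 73848 / 1024 ))       # 72 (MB), far below the 1 GB we will request
```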

The next resource you should estimate before submitting your job is walltime. Walltime determines how long your job is allowed to run. Estimating walltime accurately is good scheduler etiquette. From the command line on x01, let's run our Python code again, but with the time command at the beginning.

time python2 invert_matrix.py

Screenshot

From the output above, you will want to look at the real field. This is the wall-clock time that passed between pressing the Enter key and the termination of the program. At this point, we know that we should submit for at least 5 minutes of walltime. That should allow enough time for the job to run to completion.

Note

Determining walltime can be tricky. To avoid losing a job to its time limit, it is suggested to request 15-20% more walltime than the job typically needs. This will ensure jobs have enough walltime to complete. So if your job takes 8 minutes to complete, submit for 10.
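That padding rule is simple arithmetic. A small sketch using the 8-minute example from the note, with a 20% buffer; the rounding up is done with integer math since shell arithmetic has no ceil:

```shell
# Pad a measured runtime by 20% and round up to whole minutes.
measured=8
padded=$(( (measured * 120 + 99) / 100 ))   # integer ceil of measured * 1.2
echo "$padded"                              # 10
```

The same expression works for any measured runtime in minutes; an 8-minute job pads to 10, a 60-minute job pads to 72.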

Now that we have all of this information about the job, we are ready to build our first submit script for batch submission to the scheduler.

#!/bin/bash -l

# Request 3 CPUs for the job
#SBATCH --cpus-per-task=3

# Request 1GB of memory for the job
#SBATCH --mem=1GB

# Walltime (job duration)
#SBATCH --time=00:05:00

# Finally, the command we want to execute
time python2 invert_matrix.py