Estimating job resources
When trying to estimate how many resources to request for a job, you can use the interactive srun command.
In a terminal connected to a login node (or in a new terminal), launch a 5-core interactive job via:
srun --cpus-per-task=5 --pty /bin/bash
conda activate discovery_class
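Once the interactive shell starts, you can confirm the allocation from inside the job; SLURM_CPUS_PER_TASK is an environment variable Slurm sets when --cpus-per-task is given:
echo $SLURM_CPUS_PER_TASK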
To get started, please copy a small zip archive containing some Python code and a sample submit script.
cp /dartfs-hpc/admin/Class_Examples.zip . && unzip Class_Examples.zip
This copies the archive from /dartfs-hpc/admin. The . instructs the copy to place the file in your current working directory, and the && instructs the shell to run the next command, which unzips the contents of the archive into the directory you are in.
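If the one-liner is unclear, it is nearly equivalent to running the two steps separately (the && only runs the second command if the first succeeds):
cp /dartfs-hpc/admin/Class_Examples.zip .
unzip Class_Examples.zip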
When estimating resource utilization, you can use a program like top to monitor current usage.
For this tutorial, let's open two terminals side by side. In one terminal we will launch our Python code from the folder we unzipped; in the other we will run the command top -u <username> to look at resource utilization.
In your first terminal, take note of the host your srun job landed on. You can see this from the change in the prompt:
[john@t04 ~]$
In my second terminal, I'll ssh to t04 directly.
ssh t04
Now we are set up with two terminals side by side on the same host (one terminal running a job, the other a direct ssh session). Within the Class_Examples folder is a basic Python script we will use for estimating resources, called invert_matrix.py. Let's run the script from our first terminal, the one we created the interactive job in, and see what it does.
cd Class_Examples
time python3 invert_matrix.py
Once your python command is executing, use the second terminal you opened to run the top command. My username is john, but $(whoami) expands to whoever runs it, so the same command works for everyone:
top -u $(whoami)
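If you prefer a one-time snapshot over the interactive display, top also has a batch mode; a minimal sketch, where -b runs non-interactively and -n 1 prints a single iteration:
top -b -n 1 -u $(whoami) | head -n 20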
You can get out of top by simply hitting the letter q.
The top screen displays information about the state of the system, including the number of CPUs, the amount of system memory, and other useful details. In this case we are looking at two fields in particular: %CPU and RES, short for resident memory (the physical memory the process is actually using).
In the top output, the %CPU column shows 97-99%, which is equivalent to one fully used CPU (a process running flat out on four cores would show roughly 400%). With this information I know to submit my job for 1 CPU in order for it to run efficiently.
In the other column, RES, we can see that we are not quite using a full GB of memory. We know from this output that requesting the minimum for a job, 8GB (or even lower), will be sufficient.
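As a cross-check, ps can report the same two numbers without an interactive display; note that %cpu here is a lifetime average rather than an instantaneous reading, and rss is resident memory in kilobytes:
ps -u $(whoami) -o pid,%cpu,rss,comm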
The next resource you should consider estimating before submitting your job is walltime. Walltime determines how long your job is allowed to run, and estimating it accurately is good scheduler etiquette.
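When the timed run from earlier finishes, time prints a summary along these lines (the numbers below are illustrative, not from a real run):
real	4m32.089s
user	4m28.540s
sys	0m1.320s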
From the output above, you will want to look at the real field. This is the elapsed wall-clock time between pressing the enter key and the termination of the program. At this point, we know that we should submit for at least 5 minutes of walltime. That should allow enough time for the job to run to completion.
Note
Determining walltime can be tricky. To avoid losing a job to its time limit, it is suggested to request 15-20% more walltime than the job typically needs; this ensures the job has enough time to complete. So if your job takes 8 minutes to complete, submit for 10.
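In the #SBATCH time format used below (HH:MM:SS), those ten minutes would be written as:
#SBATCH --time=00:10:00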
Now that we have all of this information about the job, we are ready to build our first submit script for batch submission to the scheduler.
#!/bin/bash
# Request 1 CPU for the job
#SBATCH --cpus-per-task=1
# Request 8GB of memory for the job
#SBATCH --mem=8GB
# Walltime (job duration)
#SBATCH --time=00:05:00
# Then finally, the code we want to execute.
time python3 invert_matrix.py
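As a preview of the next portion (actual submission is covered there), a script like this, saved to a file such as submit.sh (a name chosen here just for illustration), would be handed to the scheduler with sbatch and monitored with squeue:
sbatch submit.sh
squeue -u $(whoami)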
Before we move on to submitting the job via sbatch, let's adjust the script to use 5 cores instead of 1. So that you do not have to open an editor, an updated version of the script is already in the folder. Go ahead and take a look at the file invert_matrix_5_threads.py:
cat invert_matrix_5_threads.py
#!/usr/bin/python3
import os
# Set the number of threads to 5 to limit CPU usage to 5 cores
os.environ["OPENBLAS_NUM_THREADS"] = "5"  # For systems using OpenBLAS
# Now import NumPy after setting environment variables
import numpy as np
import sys
# Main computation loop
for i in range(2, 1501):
    x = np.random.rand(i, i)
    y = np.linalg.inv(x)
    z = np.dot(x, y)
    e = np.eye(i)
    r = z - e
    m = r.mean()
    if i % 50 == 0:
        print("i,mean", i, m)
        sys.stdout.flush()
os.environ["OPENBLAS_NUM_THREADS"] = "5"
Let's go ahead and run that script now to see if adding 5 cores speeds it up:
time python3 invert_matrix_5_threads.py
Was it faster?