
How do I submit a large number of very similar jobs?

A few tricks can make submitting large numbers of similar jobs much easier. This FAQ outlines some of them, and it will grow as we find other things that help.

Note: If you are trying to submit a large number of very small, very short jobs, please also read this page.

Avoiding multiple files, one per job

Users with hundreds of jobs that differ in only one thing often create a separate job script for each one and then call sbatch on every script. That's a lot of files, and when the changes are minimal it isn't necessary. Instead, you can create a single job script, capture whatever changes in a variable, and pass in a value for that variable when you submit the job.
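As a rough sketch of the single-script idea, the loop below prints (rather than runs) one sbatch command per job, passing the changing value through sbatch's --export option. The variable name CASE_NUM and the script name submit_one.sh are hypothetical, chosen only for illustration; inside that script the value would be read as "$CASE_NUM".

```shell
#!/bin/sh
# Dry run: print the sbatch commands instead of executing them.
# CASE_NUM and submit_one.sh are hypothetical names used for illustration.
print_submit_commands() (
    i=$1
    last=$2
    while [ "$i" -le "$last" ]; do
        # --export=ALL,NAME=value adds NAME to the job's environment
        printf 'sbatch --export=ALL,CASE_NUM=%03d submit_one.sh\n' "$i"
        i=$((i + 1))
    done
)

# Show the commands for the first three jobs
print_submit_commands 1 3
```

Once the printed commands look right, piping the output to sh would actually submit them. This still calls sbatch once per job.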

Example Scenario

Let me give you an example. Let's say that you have a directory that contains all your job information. You have 700 different cases to submit, and they each have their own directory, named like this:

case001/
case002/
case003/
case004/
...
case100/
case101/
...
case695/
case696/
case697/
case698/
case699/
case700/

Inside each of these case directories is a script called runcase.sh that runs the individual case. The submission script for case 1 (call it submit_case001.sh) therefore looks something like this:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1G
#SBATCH --job-name=my_case001_job

# run from this case's directory under the submission directory
cd "$SLURM_SUBMIT_DIR/case001" || exit 1

./runcase.sh

You then submit the case like this:

sbatch submit_case001.sh

What's wrong with that?

Using this model, you'd create 700 individual scripts that differ only in the case number. A much easier way is to use a job array, which requires only a single script.

Job Arrays

Job arrays are collections of similar tasks, each executing the same script. So that tasks can do unique work, each one has an ID, which is available to the task via the environment variable SLURM_ARRAY_TASK_ID. These IDs are assigned when the array is submitted, and can be specified in a few different ways:

Submission syntax	Resulting task IDs	Description
`--array=1,2,3,5,8`	1, 2, 3, 5, 8	Comma-separated list
`--array=1-6`	1, 2, 3, 4, 5, 6	Range of IDs
`--array=0-20:4`	0, 4, 8, 12, 16, 20	Range of IDs, with step size 4
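To check which task IDs a given specification would produce before submitting, a small helper along these lines can expand it locally. This is only an illustrative sketch: expand_array_spec is a hypothetical function, not part of Slurm, and it handles only the three forms shown in the table.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): expand an --array
# specification into the task IDs it describes, separated by spaces.
expand_array_spec() (
    spec=$1
    ids=""
    old_ifs=$IFS
    IFS=,
    set -- $spec          # split the specification on commas
    IFS=$old_ifs
    for part in "$@"; do
        step=1
        case $part in
            *:*) step=${part##*:}; part=${part%%:*} ;;   # "first-last:step"
        esac
        case $part in
            *-*) first=${part%%-*}; last=${part##*-} ;;  # "first-last"
            *)   first=$part; last=$part ;;              # single ID
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            ids="$ids $i"
            i=$((i + step))
        done
    done
    echo "${ids# }"
)

expand_array_spec 0-20:4
```

The final call prints the IDs for the last row of the table, 0 4 8 12 16 20, on a single line.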

For the example job mentioned, one could use an array with tasks 1-700 since we have 700 cases, named "case001"-"case700". The submission script might look something like this:

#!/bin/bash
# submit_array.sh

#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1G
#SBATCH --array=1-700

# pad the task ID with leading zeros (to get 001, 002, etc.)
CASE_NUM=$(printf '%03d' "$SLURM_ARRAY_TASK_ID")

# run from this task's case directory under the submission directory
cd "$SLURM_SUBMIT_DIR/case$CASE_NUM" || exit 1

./runcase.sh

Submitting with sbatch submit_array.sh would result in an array of 700 tasks, each running one of the above-mentioned cases.
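Before submitting a large array, it can be worth confirming that every task ID maps to a directory that actually contains an executable runcase.sh. The sketch below assumes the case001 to case700 layout described above and is run from the directory holding the case directories; the function names case_dir_for_task and missing_cases are hypothetical.

```shell
#!/bin/sh
# Map an array task ID to its case directory name (e.g. 7 -> case007),
# using the same zero padding as the array script.
case_dir_for_task() {
    printf 'case%03d\n' "$1"
}

# Print every case directory in the range first..last that does not
# contain an executable runcase.sh. Prints nothing if all are present.
missing_cases() (
    i=$1
    last=$2
    while [ "$i" -le "$last" ]; do
        dir=$(case_dir_for_task "$i")
        [ -x "$dir/runcase.sh" ] || echo "$dir"
        i=$((i + 1))
    done
)
```

Running missing_cases 1 700 and getting no output suggests the array script will find a runnable case for every task.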

For more information, see Slurm tips and tricks or the official job array documentation.

Last changed on Tue Aug 22 18:43:22 2023