slurm-auto-array

Slurm job arrays are useful when one wants to run one command many times, each time with a different set of parameters. Job arrays can be cumbersome to set up, though: when one needs to run a command tens of thousands of times, for example, several units of work must be aggregated into each array task, since a job array is limited to 5,000 tasks. slurm-auto-array seeks to remedy this by aggregating work automatically such that strain on the scheduler is reduced and throughput is maximized.

slurm-auto-array works much like GNU Parallel: it takes a newline-delimited list of arguments from stdin and submits a job array that runs a user-specified command on each argument. To use it, load the parallel and slurm-auto-array modules.
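For example, a run over a directory of input files might look like the sketch below. The command (process_file) and input files are hypothetical, and the resource flags, which describe one unit of work, are illustrative sbatch-style options; consult the man page for the exact interface.

module load parallel slurm-auto-array

# One argument per line on stdin; each becomes one invocation of process_file.
# --time and --mem here describe a single unit of work, not the whole array.
ls data/*.txt | slurm-auto-array --time 0:30:00 --mem 2G -- process_file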

For information on usage, see the man page (man slurm-auto-array), help message (slurm-auto-array --help), and worked example on GitHub.

Alternatives

Job Arrays

If your tasks each run longer than an hour and there are fewer than a few thousand of them, raw job arrays are a good choice, with less overhead than slurm-auto-array; a sketch of one appears below.
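As a sketch, a raw job array that runs a hypothetical command foo on 1,000 numbered input files might look like:

#!/bin/bash

#SBATCH --array 0-999 # one array task per input file
#SBATCH --ntasks 1 --mem 4G --time 2:00:00

# Each task selects its input by array index; foo and the file naming scheme are hypothetical
infile="input_${SLURM_ARRAY_TASK_ID}.dat"
foo "$infile" > "$infile.out"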

GNU Parallel

If you have a huge number of very small tasks (e.g. 100,000 tasks that will each run for about 5 minutes), parallel is a better choice than slurm-auto-array. Using parallel is especially important if each unit of work reads or writes one or more files, because moving around so many files in so short a time can bog down our storage systems, slowing your (and others') jobs dramatically. parallel is also a good choice if you are working with a tarball from which extracting a single file per task would be time-consuming.

As an example, say you have a tarball named mydir.tar.gz containing 200,000 files that you would like to process in parallel using a command of the form foo $filename > $filename.out. To do so, you could use a job script similar to:

#!/bin/bash

#SBATCH --nodes 1 # parallel is easier to use on a single node
#SBATCH --ntasks 16 --mem 32G --time 1-00:00:00

# Prep environment
module load parallel pigz
workdir="$(mktemp -d)"
outfiles="$(mktemp -d)"

# Unzip and process files
tar xf <(unpigz -c mydir.tar.gz) -C "$workdir" # -c streams the decompressed tarball to stdout
parallel --jobs "$SLURM_NTASKS" foo {} ">" "$outfiles"/{/}.out ::: "$workdir"/* # {/} is the input file's basename

# Zip results to current directory and clean up
tar czf results.tar.gz -C "$outfiles" .
rm -r "$workdir" "$outfiles"

If you have enough work that you need to split it across multiple nodes, you can still use parallel; to do so, add the following to your script to tell parallel which nodes to use:

# Build a login file telling parallel how many jobs to run on each node; the
# perl expands Slurm's compressed "16(x3)" notation into one task count per
# line, which is then paired with its hostname as "<count>/<hostname>"
sshloginfile="$(mktemp)"
paste -d '/' <(perl -pe 's/(\d+)\(x(\d+)\)/substr("$1,"x$2,0,-1)/ge' <<<"$SLURM_TASKS_PER_NODE" | tr ',' '\n') \
             <(scontrol show hostnames) > "$sshloginfile"
trap 'rm "$sshloginfile"' EXIT

...and add these flags to the parallel invocation:

--ssh 'ssh -o ServerAliveInterval=300' --sshloginfile "$sshloginfile"
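Putting it together, the invocation from the single-node script above would become something like the sketch below. Note that this assumes $workdir and $outfiles are on storage visible to every node (mktemp -d typically creates them under node-local /tmp, so a shared scratch directory would be needed instead):

parallel --ssh 'ssh -o ServerAliveInterval=300' --sshloginfile "$sshloginfile" \
         foo {} ">" "$outfiles"/{/}.out ::: "$workdir"/*

Since each sshloginfile line has the form "<count>/<hostname>", parallel caps the number of concurrent jobs per node accordingly, making the --jobs flag unnecessary.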

Since using parallel ties all of your work to a single job, your work may not finish as quickly: larger jobs tend to wait in the queue longer. If you are working with a tremendous number of files and/or very short tasks, though, parallel is likely to be faster than slurm-auto-array.