slurm-auto-array

Slurm job arrays are useful when one wants to run the same command many times, each time with a different set of parameters. Job arrays can be cumbersome to set up, though; for example, when a command needs to run tens of thousands of times, several units of work must be aggregated into each array task since a job array is limited to 5,000 tasks. slurm-auto-array remedies this by aggregating work automatically, reducing strain on the scheduler and maximizing throughput.

slurm-auto-array works much like GNU Parallel: it takes a newline-delimited list of arguments from stdin and submits a job array that runs a user-specified command on each argument. To use it, load the parallel and slurm-auto-array modules.
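
For example, to run a hypothetical program analyze once per CSV file in a directory, you could pipe the file list to slurm-auto-array. The invocation below is only a sketch: the --time and --mem flags (here assumed to describe a single unit of work) and the -- separator before the command should be checked against the man page.

module load parallel slurm-auto-array

# Each line piped in becomes the argument(s) for one unit of work;
# 'analyze' and the data/*.csv inputs are hypothetical placeholders.
ls data/*.csv | slurm-auto-array --time 0:30:00 --mem 2G -- analyze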

For information on usage, see the man page (man slurm-auto-array), help message (slurm-auto-array --help), and worked example on GitHub.

Alternatives

Job Arrays

If your tasks run longer than an hour and there are fewer than a few thousand of them, raw job arrays are a good choice and carry less overhead than slurm-auto-array.
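
For example, a bare job array that runs a program once per input file might use a script like the following (foo and the input_N.txt naming scheme are hypothetical placeholders for your own command and files):

#!/bin/bash

#SBATCH --array 1-500 # one array task per input file
#SBATCH --ntasks 1 --mem 4G --time 2:00:00

# Each array task processes the input file matching its task ID
foo "input_${SLURM_ARRAY_TASK_ID}.txt" > "output_${SLURM_ARRAY_TASK_ID}.txt"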

GNU Parallel

If you have a huge number of very small tasks (e.g. 100,000 tasks that each run for about 5 minutes), parallel is a better choice than slurm-auto-array. Using parallel is especially important if each unit of work reads or writes one or more files, because moving that many files around in so short a time can bog down our storage systems, slowing your (and others') jobs dramatically. parallel is also a good choice if you are working with a tarball from which extracting a single file per task would be time-consuming.

As an example, say you have a tarball named mydir.tar.gz containing 200,000 files that you would like to process in parallel using a command of the form foo $filename > $filename.out. To do so, you could use a job script similar to:

#!/bin/bash

#SBATCH --nodes 1 # parallel is easier to use on a single node
#SBATCH --ntasks 16 --mem 32G --time 1-00:00:00

# Prep environment
module load parallel pigz
workdir="$(mktemp -d)"
outfiles="$(mktemp -d)"

# Unzip and process files
tar xf <(unpigz -c mydir.tar.gz) -C "$workdir" # decompress in parallel; -c writes to stdout and keeps the tarball
parallel --jobs "$SLURM_NTASKS" foo {} ">" "$outfiles"/{/}.out ::: "$workdir"/* # {/} strips the directory from each input path

# Zip results to current directory and clean up
tar czf results.tar.gz -C "$outfiles" .
rm -r "$workdir" "$outfiles"

If you have enough work that you need to split it across multiple nodes, you can still use parallel; to do so, add the following to your script to tell parallel which nodes to use:

# Build an sshloginfile with one "ncpus/hostname" line per allocated node,
# expanding entries like "16(x2)" in SLURM_TASKS_PER_NODE
sshloginfile="$(mktemp)"
paste -d '/' <(perl -pe 's/(\d+)\(x(\d+)\)/substr("$1,"x$2,0,-1)/ge' <<<"$SLURM_TASKS_PER_NODE" | tr ',' '\n') \
             <(scontrol show hostnames) > "$sshloginfile"
trap 'rm "$sshloginfile"' EXIT

...and add these flags to the parallel invocation:

--ssh 'ssh -o ServerAliveInterval=300' --sshloginfile "$sshloginfile"
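
Combined with the script above, the full parallel invocation would thus look roughly like the line below. Note that for a multi-node run, $workdir and $outfiles must be created on storage that every allocated node can access (for example a shared scratch directory) rather than the default node-local /tmp.

# Multi-node variant of the earlier parallel line; paths assumed to be on shared storage
parallel --jobs "$SLURM_NTASKS" --ssh 'ssh -o ServerAliveInterval=300' \
         --sshloginfile "$sshloginfile" foo {} ">" "$outfiles"/{/}.out ::: "$workdir"/*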

Because parallel ties all of your work to a single job, it may wait in the queue longer, since larger jobs tend to queue for more time; but if you are working with a tremendous number of files and/or very short tasks, it is likely to be faster overall than slurm-auto-array.