Getting Started with GPUs
We have H100, A100, P100, and K80 GPUs. The current list of GPU nodes is maintained under the Documentation menu, System Access -> Compute Resources. Entries marked "Preemption only" are owned by specific departments or faculty but are available to other users through preemption.
Using GPUs in Jobs
You can request GPUs in Slurm using the --gpus flag. For example, you can request a whole marylou8g node and its 2 GPUs by including the following in your sbatch options: --nodes=1 --mem=64G --exclusive --gpus=2 --constraint='kepler'. Please be aware of how many GPUs your program can actually use and request accordingly. If you can only use one GPU and 6 processors, use something like: --nodes=1 --ntasks=6 --mem=12G --gpus=1.
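As a minimal sketch, here is what those single-GPU options might look like in a complete batch script; the walltime, program name, and exact resource amounts are placeholders, not site requirements:

    #!/bin/bash
    #SBATCH --nodes=1          # one node
    #SBATCH --ntasks=6         # six processor cores
    #SBATCH --mem=12G          # memory for the job
    #SBATCH --gpus=1           # one GPU
    #SBATCH --time=24:00:00    # placeholder walltime; see the limits below

    module load cuda           # make the CUDA runtime available
    ./my_gpu_program           # placeholder for your executable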
The environment variable CUDA_VISIBLE_DEVICES will contain a comma-separated list of the CUDA devices that your job has access to.
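For example, a job can confirm which devices it was assigned (illustrative only):

    echo "GPUs assigned to this job: $CUDA_VISIBLE_DEVICES"
    nvidia-smi                 # show the GPUs visible to the job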
To compile or run CUDA code you'll need the CUDA libraries and runtime; get them in your path by doing: module load cuda.
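A quick sketch of compiling a CUDA source file once the module is loaded; the file names and compiler flags here are placeholders:

    module load cuda                       # adds nvcc and the CUDA libraries to your environment
    nvcc -O2 -o my_program my_program.cu   # compile; file names are placeholders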
For interactive development or for compiling CUDA programs, you will need to request an interactive job using salloc. The salloc program accepts the same flags as sbatch, but you must provide them on the command line since salloc isn't given a file to run (it gives you a shell instead).
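For example, to get an interactive shell with one GPU for two hours (the resource amounts are illustrative):

    salloc --nodes=1 --ntasks=6 --mem=12G --gpus=1 --time=2:00:00
    # once the interactive shell starts:
    module load cuda
    nvcc -o my_program my_program.cu       # placeholder file names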
Job Restrictions
Here are our current GPU restrictions:
- Maximum walltime of 3 days
- No more than 26 GPU nodes total
- No more than 26 GPU jobs at any time
- Jobs can escape all of these limits (up to a 7-day walltime) by becoming preemptable; see the sketch below
These are subject to change, and administrators may impose additional restrictions as they see fit based on demand.
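As a hedged illustration of making a job preemptable: on many Slurm clusters this is done through a site-specific QOS or partition, so the QOS name below is an assumption; check the Compute Resources documentation for the actual value:

    #SBATCH --qos=standby        # hypothetical QOS name; the real preemptable QOS is site-specific
    #SBATCH --requeue            # allow the job to be requeued after preemption
    #SBATCH --time=7-00:00:00    # up to 7 days of walltime when preemptable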
Last changed on Tue Mar 26 10:58:02 2024