Getting Started With GPUs
We have H200s, L40Ss, H100s, P100s, A100s, and V100s. The full list is maintained under the Documentation menu at System Access -> Compute Resources. Entries that say "Preemption only" are owned by specific departments and faculty but are available to other users through preemption.
Using GPUs in Jobs
Using Slurm, you can request GPUs with the --gpus flag. For instance, if you want a whole marylou9g node with 4 GPUs, you can include the following in your sbatch options: --nodes=1 --mem=128G --exclusive --gpus=4 --constraint='pascal'. Be sure to consider how many GPUs your program can actually use and adjust your request accordingly; for example, if your program only uses one GPU and 6 processors, you might use --nodes=1 --ntasks=6 --mem=12G --gpus=1. Additionally, the environment variable CUDA_VISIBLE_DEVICES will list, as a comma-separated string, the CUDA devices available to your job. You can also request a particular GPU model, such as an H200 or an L40S, with options like --gpus=h200:1 or --gpus=l40s:1 respectively, where the :1 denotes the number of GPUs you need.
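As an example, a minimal single-GPU batch script might look like the following sketch; the job name, walltime, and program name are placeholders, so adjust them to your own work:

    #!/bin/bash
    #SBATCH --job-name=gpu-test     # placeholder job name
    #SBATCH --nodes=1
    #SBATCH --ntasks=6
    #SBATCH --mem=12G
    #SBATCH --gpus=1                # or, e.g., --gpus=h200:1 for a specific model
    #SBATCH --time=24:00:00         # placeholder walltime; must fit the limits below

    # Show which CUDA devices Slurm assigned to this job
    echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"

    ./my_gpu_program                # placeholder for your own executable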
To compile or run CUDA code you'll need the CUDA libraries and runtime; get them in your path by running: module load cuda.
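For example, once the module is loaded you can compile and run a CUDA source file with nvcc (the file and output names here are just placeholders):

    module load cuda
    nvcc -o hello hello.cu          # compile a CUDA source file (placeholder names)
    ./hello                         # run it on a node where a GPU is allocated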
For interactive development or for compiling CUDA programs, you will need to request an interactive job using salloc. The salloc program accepts the same flags as sbatch, but you must provide them on the command line since salloc isn't given a file to run (it gives you a shell instead).
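For instance, to get an interactive shell with one GPU and six processors for two hours, you might run something like the following (the resource amounts and walltime are only examples):

    salloc --nodes=1 --ntasks=6 --mem=12G --gpus=1 --time=2:00:00
    # once the interactive shell starts on the allocated node:
    module load cuda
    nvidia-smi                      # confirm which GPU was assigned to the session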
Job Restrictions
Here are our current GPU restrictions:
- Maximum walltime of 3 days
- No more than 26 GPU nodes total
- No more than 26 GPU jobs at any time
- Your jobs can exceed these limits (with up to 7 days of walltime) by becoming preemptable; see the sketch at the end of this section
These are subject to change, and administrators may impose additional restrictions as they see fit based on demand.
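How a job is marked as preemptable is site-specific, so check the preemption documentation for the exact mechanism. As a hypothetical sketch, if the preemptable pool were exposed as a Slurm QOS named standby, the request could look like this:

    #SBATCH --qos=standby           # hypothetical QOS name; see the preemption docs for the real one
    #SBATCH --requeue               # let Slurm requeue the job if it gets preempted
    #SBATCH --time=7-00:00:00       # up to 7 days of walltime for preemptable jobs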
Last changed on Thu Mar 27 12:44:43 2025