How do I use the local hard drive on the node?
In some situations it makes sense to use the local hard drive on the compute nodes as temporary scratch space. This document describes how to do that, and the important considerations.
PLEASE READ THIS WHOLE PAGE BEFORE GETTING STARTED USING LOCAL HARD DRIVES. IT IS VERY IMPORTANT THAT YOU UNDERSTAND HOW TO PROPERLY USE THE SYSTEM.
Motivation
Some applications do a lot of reading and writing, especially random, small-block reads and writes, which are among the worst-performing types of Input/Output (I/O) workload. Additionally, we occasionally see surges of traffic on the centralized filesystems, which make them run fairly slowly. Of course we are working on fixing this, but you may still see it on occasion.
If you need some space to use for temporary files, the local hard drive on the compute nodes might be a good idea. It really depends on the context, what's going on with the system, etc.
Location
The space on the local hard drive is available under the "/tmp" directory. To avoid overwriting files belonging to other jobs, we recommend creating a subdirectory named after the unique job ID. The example on this page demonstrates this.
It is also important to know that the /tmp folder is local to each compute node, so if your job uses multiple nodes, data written there on one node is not visible on the others. Most people who use this method run single-node jobs, so this is a non-issue for them, but be aware of the implications.
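If you do run a multi-node job, the temporary directory has to be created (and later removed) on every node, not just the one running the batch script. One way to do that is to launch one task per node with srun. This is only a minimal sketch, guarded so it does nothing outside a Slurm job; adapt it to your own script:

```shell
#!/bin/bash
# Sketch for multi-node jobs: /tmp is separate on every node, so the
# temporary directory must be created and removed on each of them.
# Guarded so that it does nothing when run outside a Slurm job.

per_node() {
    # run the given command once on every node in the allocation
    srun --ntasks="$SLURM_JOB_NUM_NODES" --ntasks-per-node=1 "$@"
}

if [ -n "$SLURM_JOB_ID" ]; then
    TEMPORARY_DIR="/tmp/$SLURM_JOB_ID"
    per_node mkdir -p "$TEMPORARY_DIR"   # create on every node
    # ... your multi-node work here ...
    per_node rm -rf "$TEMPORARY_DIR"     # clean up on every node
else
    echo "not inside a Slurm job"
fi
```

Single-node jobs do not need this; plain mkdir/rm in the batch script, as in the example later on this page, is enough.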
Cleaning Up After Yourself
Since the "/tmp" folder exists separately on each compute node, you won't be able to clean up after yourself interactively once the job ends. Therefore, each job must do the cleanup for itself.
In general, this consists of the following, either at the end of the job, or when the job is deleted/canceled:
- Copying any needed data back to the central file systems (e.g. home or compute directories)
- Removing the temporary directories on the local hard drive
If you put code in your job script to do this at the end of the job, it will work, assuming that your job runs to completion. However, if your job hits its walltime and is killed by the system, or if you cancel it with "scancel" after it starts running, the cleanup code at the end of the script is never reached. The answer to this lies in handling system signals, which is demonstrated in the example on this page.
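The idea can be demonstrated outside Slurm with a small, self-contained sketch: a child shell sets a trap on SIGTERM (the signal that scancel and the walltime killer normally send) and we send it that signal while it is "working". Note that the work runs in the background followed by `wait`: bash only runs a trap handler immediately while it is waiting in the `wait` builtin, not while a foreground command is running.

```shell
#!/bin/bash
# Minimal, runnable demonstration of the trap pattern, simulated
# outside Slurm: a child bash sets the trap and does some "work";
# we send it SIGTERM (as scancel would) and watch the cleanup run.
out=$(bash -c '
    cleanup() { echo "cleanup ran"; exit 0; }
    trap cleanup TERM

    me=$$
    ( sleep 1; kill -TERM "$me" ) &   # simulate scancel after 1 second

    # Run the work in the background and wait for it: "wait" is
    # interruptible, so the trap fires as soon as the signal arrives.
    sleep 5 >/dev/null 2>&1 &
    wait $!
    echo "reached normal end of job"
')
echo "$out"
```

In a real job script the trap is set near the top, as in the full example later on this page, so that a late signal still triggers the copy-back and removal.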
Space Available
It should be noted that the "/tmp" filesystem is also used for a number of other purposes, so its full capacity may not be available. In addition, if more than one job uses the local "/tmp" filesystem on the same node, those jobs compete for space.
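Because of that, it can be worth checking how much space is actually free before committing to /tmp. A sketch, assuming GNU coreutils df (standard on Linux nodes); the 50 GB threshold is only an illustrative value:

```shell
#!/bin/bash
# Check free space in /tmp at the start of the job and warn (or bail
# out) if there is less than the job needs. 50 GB is a made-up example
# threshold; set it to what your job actually requires.
NEEDED_KB=$((50 * 1024 * 1024))   # 50 GB expressed in 1K blocks

# 'df --output=avail -k' prints a header line, then the free space
avail_kb=$(df --output=avail -k /tmp | tail -n 1 | tr -d ' ')
echo "Available in /tmp: ${avail_kb} KB"

if [ "$avail_kb" -lt "$NEEDED_KB" ]; then
    echo "WARNING: less than 50 GB free in /tmp" >&2
    # in a real job script you might 'exit 1' here
fi
```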
In general, the nodes have the following data capacities:
Nodes                     | Approximate space in /tmp
m7                        | 200 GB
m8                        | 200 GB
m9                        | 800 GB
m8f/m8h (bigmemory) nodes | 400 GB
m8g (GPU) nodes           | 400 GB
Example
#!/bin/bash
#SBATCH --time=03:00:00 # walltime
#SBATCH --ntasks=8 # number of processor cores (i.e. tasks)
#SBATCH --nodes=1 # number of nodes
#SBATCH --mem-per-cpu=1024M # memory per CPU core
#SBATCH -J "myjobname" # job name
#define variables to represent the directories involved
TEMPORARY_DIR="/tmp/$SLURM_JOB_ID"
DATASRC_DIR="$HOME/data_source"
DATADEST_DIR="$HOME/data_dest"
#define the cleanup function. It is not run here; it is only called
# if the job is killed via a signal
cleanup_scratch()
{
echo "Deleting inside signal handler, meaning I probably either hit the walltime, or deleted the job using scancel"
#copy wanted data from $TEMPORARY_DIR to $DATADEST_DIR
cp -v "$TEMPORARY_DIR/results.dat" "$DATADEST_DIR"
#change to a safe location
cd "$HOME"
#remove the remaining data in $TEMPORARY_DIR
rm -rfv "$TEMPORARY_DIR"
echo "---"
echo "Signal handler ending time:"
date
exit 0
}
#Associate the function "cleanup_scratch" with the TERM signal, which is usually how jobs get killed
trap 'cleanup_scratch' TERM
#create temporary directory
echo "Creating Temporary directory at $TEMPORARY_DIR"
mkdir -pv "$TEMPORARY_DIR"
echo "---"
#copy working data information from $DATASRC_DIR/* to $TEMPORARY_DIR
echo "Copying working data information from $DATASRC_DIR/* to $TEMPORARY_DIR"
cp -v "$DATASRC_DIR/"* "$TEMPORARY_DIR"
echo "---"
#DO YOUR JOB'S WORK HERE
# NOTE: what you do to utilize $TEMPORARY_DIR depends on your program and environment
echo "Deleting at the end of the job"
#copy wanted data from $TEMPORARY_DIR to $DATADEST_DIR
cp -v "$TEMPORARY_DIR/results.dat" "$DATADEST_DIR"
#change to a safe location
cd "$HOME"
#remove the remaining data in $TEMPORARY_DIR
rm -rfv "$TEMPORARY_DIR"
echo "---"
Last changed on Tue Dec 20 16:59:51 2016