
Storage

Office of Research Computing storage offerings are intended for use with high-performance computing on Office of Research Computing compute resources. Data for other purposes should be stored elsewhere.

Storage Resources Summary

Name                 Path                            Default Quota  Backed Up  Purpose
Home Directory       /fslhome/username or ~/         100 GB         Yes        Critical data to be retained, source code, small-file processing
Scratch (Compute)    ~/compute                       15 TB          No         Temporary files, in-process data, large-file processing
Group                ~/fsl_groups/groupname          40 GB          Yes        Shared data sets and results, group small-file processing
Group Scratch        ~/fsl_groups/groupname/compute  12 TB          No         Temporary files, in-process data, large-file processing of group-shared data
tmp (Local Scratch)  /tmp                            N/A            No         Limited storage on compute nodes for in-process scratch data
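As a quick way to see how much space each area is using against its quota, a loop like the following can be sketched. The paths come from the table above; "groupname" is a placeholder for an actual group name, and areas that do not exist on a given account are simply skipped:

```shell
# report_usage: print disk usage for each storage area that exists.
# (Illustrative helper; "groupname" below is a placeholder.)
report_usage() {
  for area in "$@"; do
    if [ -d "$area" ]; then
      du -sh "$area"   # total size of this storage area
    fi
  done
}

report_usage ~/ ~/compute ~/fsl_groups/groupname ~/fsl_groups/groupname/compute
```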

Home Directories

  • Home directories should be used to store code and data deemed critical and worth backing up
  • Home directories are accessible from compute nodes and may be used to store small files that jobs read during processing, but should not hold temporary scratch data
  • The default quota is 100 GB; requests for more may be made if appropriate
  • Data in home directories is backed up daily

Scratch, or Compute, Storage

  • Scratch storage is expensive, high-performance storage intended for large data sets and results
  • Users should periodically clean up scratch storage to free space for others
  • There is a 15 TB quota on user scratch storage
  • Data in scratch is not backed up
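One way to find cleanup candidates is to list files that have not been accessed recently. The helper below is an illustrative sketch, not a site-provided tool; it assumes GNU find and sort, and ~/compute is the scratch path from the table above:

```shell
# stale_report: list files under a directory not accessed in N days,
# largest first -- likely candidates for cleanup.
# (Illustrative helper; assumes GNU find and sort.)
stale_report() {
  local dir="$1" days="${2:-30}"
  find "$dir" -type f -atime +"$days" -printf '%s\t%p\n' | sort -rn
}

# Example: review scratch files untouched for 30+ days before deleting them
# stale_report ~/compute 30
```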

Group Storage

  • Group Storage should be used like home directories, with the added benefit of being able to share files with other users
  • The default quota for a group is 40 GB
  • Data in group storage is backed up daily
  • See the Group File Sharing page for more information
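A file placed in group storage typically needs group-read permission before other members can open it. The sketch below is illustrative; the helper name and the group "mygroup" are hypothetical, and actual group paths live under ~/fsl_groups/groupname as shown in the table:

```shell
# share_with_group: copy a file into a group directory and make it
# group-readable. (Hypothetical helper, not a site-provided command.)
share_with_group() {
  local file="$1" groupdir="$2"
  mkdir -p "$groupdir"
  cp "$file" "$groupdir/" && chmod g+r "$groupdir/$(basename "$file")"
}

# Example (the group "mygroup" is a placeholder):
# share_with_group results.csv ~/fsl_groups/mygroup
```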

Group Scratch Storage

  • Group scratch storage should be used like scratch storage, with the added benefit of being able to share files with other users
  • There is a 12 TB quota on group scratch storage
  • Group scratch storage is not backed up

tmp, or Local Scratch Storage

  • tmp storage uses the local hard drive on the node the job is running on
  • Users may create temporary files and directories in /tmp for local scratch file needs
  • Users are responsible for cleaning up /tmp after every job and should automate this cleanup
  • Each compute node has limited local space, varying from about 30 GB to over 800 GB
  • We recommend using the path /tmp/$SLURM_JOB_ID to make sure that the directory is unique for each job.
  • An example script to manage the cleanup follows:
#!/bin/bash
#SBATCH --nodes=1 --ntasks=8
#SBATCH --mem=16G
#SBATCH --time=3:00:00

TMPDIR=/tmp/$SLURM_JOB_ID

# Define a function to clean up the directory if the job is killed, etc.
# NOTE: this does not run until the trapped signal is received.
cleanup_scratch() {
    echo "Cleaning up temporary directory inside signal handler, meaning the job either hit its walltime or was deliberately cancelled with scancel"
    rm -rfv "$TMPDIR"
    echo "Signal handler ended at:"
    date
    exit 1
}

# Associate this function with the signal sent when the job is cancelled or hits its walltime
trap 'cleanup_scratch' TERM

# Initial setup of the temporary scratch directory
echo "Creating scratch directory at $TMPDIR"
mkdir -pv "$TMPDIR"

#PUT CODE TO COPY YOUR DATA INTO $TMPDIR HERE IF NECESSARY
#DO WHATEVER YOU NEED TO DO TO GET YOUR SOFTWARE TO USE $TMPDIR. THIS WILL DEPEND ON THE SOFTWARE BEING USED
#PUT CODE TO RUN YOUR JOB HERE

echo "Cleaning up temporary directory at end of script, meaning that the job exited cleanly"
rm -rfv "$TMPDIR"
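The placeholder sections in the script above usually amount to staging input into $TMPDIR and copying results back to shared storage before the final cleanup. A minimal sketch, with helper names and paths that are purely illustrative:

```shell
# Illustrative stage-in/stage-out helpers for the placeholder sections above.
stage_in()  { mkdir -p "$2" && cp "$1" "$2/"; }   # copy input into local scratch
stage_out() { mkdir -p "$2" && cp "$1" "$2/"; }   # copy results back to shared storage

# Inside the job script (paths are examples only):
# stage_in  ~/compute/input.dat "$TMPDIR"
# ... run the software against the copy in $TMPDIR ...
# stage_out "$TMPDIR/out.dat"   ~/compute/results
```

Copying results out before the `rm -rfv` at the end of the script matters: anything still in /tmp when the job finishes is deleted.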