Storage

Office of Research Computing data storage is intended to be used in conjunction with high-performance computing performed on Office of Research Computing compute resources. Data for other purposes should be stored elsewhere.

Rent or Purchase Additional Storage

Please see this document to learn more about rental and purchase options.

Storage Resources Summary

User Directories

  • User Home
    Path: /home/username
    Default quota: 2 TiB, 2 million files
    Backed up: Yes | Snapshots: Yes | Auto-deletion: No
    Purpose: Critical data to be retained, source code, etc.
  • User Scratch
    Path: /nobackup/autodelete/usr/username or ~/nobackup/autodelete
    Default quota: 20 TiB, 2 million files
    Backed up: No | Snapshots: Yes | Auto-deletion: Yes, after 12 weeks
    Purpose: Temporary files for an active project
  • User Archive
    Path: /nobackup/archive/usr/username or ~/nobackup/archive
    Default quota: 20 TiB, 2 million files
    Backed up: No | Snapshots: No (TBD) | Auto-deletion: No
    Purpose: Long-term storage for projects you plan to return to
  • Local Scratch
    Path: /tmp (local to each compute node)
    Default quota: N/A
    Backed up: No | Snapshots: No | Auto-deletion: Yes, at job exit
    Purpose: Limited storage on compute nodes for a job's scratch data

Group Directories

  • Group Home
    Path: /grphome/groupname or ~/groups/groupname
    Default quota: 2 TiB, 2 million files
    Backed up: Yes | Snapshots: Yes | Auto-deletion: No
    Purpose: Critical data to be retained, source code, etc. for group projects
  • Group Scratch
    Path: /nobackup/autodelete/grp/groupname, ~/groups/groupname/nobackup/autodelete, or /grphome/groupname/nobackup/autodelete
    Default quota: 20 TiB, 2 million files
    Backed up: No | Snapshots: Yes | Auto-deletion: Yes, after 12 weeks
    Purpose: Temporary files for an active group project
  • Group Archive
    Path: /nobackup/archive/grp/groupname, ~/groups/groupname/nobackup/archive, or /grphome/groupname/nobackup/archive
    Default quota: 20 TiB, 2 million files
    Backed up: No | Snapshots: No (TBD) | Auto-deletion: No
    Purpose: Long-term storage for group projects you plan to return to

Home Directories

  • Home directories should be used to store code and data deemed critical and worth being backed up.
  • Data in home directories is backed up daily.
  • Snapshots of home directories are performed daily.

Scratch (with auto-deletion of unused files)

  • Data in scratch is not backed up. You can manually back up your own data (see the sketch after this list).
  • Scratch is high-performance, and therefore expensive, storage.
  • Data that is unused for 12 weeks will be automatically deleted.
    • Deleted files will be available in the .snapshot/ directory for a limited time, possibly up to 14 days.
    • Attempts to circumvent auto-deletion policies may result in revocation of account privileges.
  • Snapshots of scratch directories are performed daily. Note that snapshots are NOT backups. If the storage system itself has problems, snapshots will not aid in data recovery.
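
If you want to keep data that is sitting in scratch, you have to copy it somewhere safer yourself. Below is a minimal sketch of both operations, assuming a hypothetical project directory named myproject and the .snapshot/ convention described above; actual snapshot directory names vary.

# Manually back up a scratch directory to home (backed up) storage.
rsync -av ~/nobackup/autodelete/myproject/ ~/myproject-backup/

# Recover a recently deleted or overwritten file from a snapshot.
# List the available snapshots first (names vary by system and date).
ls ~/nobackup/autodelete/.snapshot/
# Copy the file back, replacing SNAPSHOT_NAME with one of the directories listed above.
cp ~/nobackup/autodelete/.snapshot/SNAPSHOT_NAME/myproject/results.csv \
   ~/nobackup/autodelete/myproject/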

Archive

  • Data in archive is not backed up. You can manually back up your own data.
  • Archive storage is intended to get files off of our more expensive scratch file systems (see the sketch after this section).
  • Archive storage still costs our department money, so please do not use this as a dumping ground.
  • Archive storage will be slow* and should generally NOT be used directly from batch jobs.

*For most of 2024, archive storage will be hosted on DDN/Lustre. Your files will be migrated to a much slower, cheaper system later in 2024.
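
A common way to get a finished project off of scratch is to pack it into a single compressed tar file in your archive space from a login or interactive session rather than a batch job. The following is only a sketch; myproject is a hypothetical directory name.

# Pack the project into archive as one compressed file.
tar -czf ~/nobackup/archive/myproject.tar.gz -C ~/nobackup/autodelete myproject
# Verify the tar file is readable, then remove the scratch copy.
tar -tzf ~/nobackup/archive/myproject.tar.gz > /dev/null && \
    rm -rf ~/nobackup/autodelete/myproject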

tmp, or Local Scratch Storage

  • tmp storage uses the local hard drive of the node the job is running on.
  • Users may create temporary files and directories in /tmp for local scratch file needs.
  • Users are responsible for cleaning up /tmp after every job, and should automate that cleanup.
  • Each compute node has limited space, varying from about 30 GB to over 800 GB.
  • We recommend using the path /tmp/$SLURM_JOB_ID so that the directory is unique for each job.
  • /tmp is not backed up and is periodically cleaned up.
  • An example script to manage the cleanup is at the bottom of this page.

Privately-Owned/Rented Storage

Configurations differ for privately-owned storage. On the RHEL 9 image, these spaces are usually mounted under /nobackup/private/ instead of the previous /private/.

Quota Extensions

Quota extension requests are scrutinized and rarely approved, but you can contact us to ask about them.

For a capacity increase to even be considered, we need to see evidence that file compression is being used where possible. Even then, only unique situations are considered, and very few are approved. As an alternative, you can rent or purchase additional storage.

Grouping files together into tar (or similar) files will likely be required before a file count (a.k.a. inode) quota extension is considered. Backups are heavily impacted by the number of files, so we avoid raising this limit unless it is truly necessary.
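
As a rough sketch of what this looks like in practice (old_runs is a hypothetical directory name), you can check how many files a directory tree is consuming and then collapse it into a single compressed tar file:

# Count the files (inodes) under a directory tree.
find ~/old_runs -type f | wc -l
# Collapse the tree into one compressed file, verify it, then remove the originals.
tar -czf ~/old_runs.tar.gz -C ~ old_runs
tar -tzf ~/old_runs.tar.gz > /dev/null && rm -rf ~/old_runs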

Other

If you store sensitive data (e.g., CUI or FERPA data), do not include sensitive information in your file and directory names. These paths (but not the file contents) can be sent to Globus if you use that service. Some examples of names to avoid are:

  • /home/userbob123/labresults/embarrassing_medical_condition/positive/SomePersonsFullName.pdf
  • /home/userjohn123/Chemistry444/students_who_failed_the_exam/Famous_Alumni_Full_Name.txt

Example /tmp cleanup script

#!/bin/bash
#SBATCH --nodes=1 --ntasks=8
#SBATCH --mem=16G
#SBATCH --time=3:00:00

TMPDIR=/tmp/$SLURM_JOB_ID

# Prepare a function to clean up the temporary directory if the job is killed, hits its walltime, etc.
# NOTE: THIS IS NOT EXECUTED UNTIL THE SIGNAL IS RECEIVED.
cleanup_scratch() {
    echo "Cleaning up temporary directory inside signal handler, meaning I either hit the walltime, or deliberately deleted this job using scancel"
    rm -rfv $TMPDIR
    echo "Signal handler ended at:"
    date
    exit 1
}

# Now, associate this function with the signal that is sent when the job is killed or hits its walltime
trap 'cleanup_scratch' TERM

# Now, create the temporary scratch directory
echo "Creating scratch directory at $TMPDIR"
mkdir -pv "$TMPDIR"

# PUT CODE TO COPY YOUR DATA INTO $TMPDIR HERE IF NECESSARY
# DO WHATEVER YOU NEED TO DO TO GET YOUR SOFTWARE TO USE $TMPDIR; THIS WILL DEPEND ON THE SOFTWARE BEING USED
# PUT CODE TO RUN YOUR JOB HERE
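# --- Example only (hypothetical paths and program name): a common pattern is to
# stage input into $TMPDIR, run there, and copy results back to permanent storage
# before cleanup. Adapt this to your own software. ---
# cp ~/myproject/input.dat "$TMPDIR/"
# cd "$TMPDIR"
# ./my_program input.dat > output.dat
# cp "$TMPDIR/output.dat" ~/myproject/results/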

echo "Cleaning up temporary directory at end of script, meaning that the job exited cleanly"
rm -rfv $TMPDIR