
My compute directory is missing or seems broken. How do I fix it?

The quick solution is to open a new ticket titled, "I broke my compute symlink," and we'll help you correct it. However, we recommend that you read the following to know what this problem is and how to fix it yourself.

Symbolic Links on FSL

When logged into our systems, a user sees what appears to be a single filesystem in their home directory, with subdirectories such as compute/, fsl_archive/, and various groups in fsl_groups/. In reality, multiple filesystems are in use, and several of these directories simply point to other locations on the system. This is why listing a typical compute "directory" in long form produces the following output:

ls -ld /fslhome/<username>/compute
lrwxrwxrwx 1 root root 47 Apr 17  2014 compute -> /panfs/<username>

The compute directory isn't a directory at all; it's a symbolic link that points to your allocation of compute storage.
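To see which situation you are in, it can help to check what actually sits at the compute path. The following is a minimal sketch, not an official FSL tool: check_compute is a hypothetical helper name, and it only uses the standard test operators and readlink.

```shell
# Report whether a path is a symbolic link, a real directory, or missing.
# check_compute is a hypothetical helper, not a command provided on FSL.
check_compute() {
    path="$1"
    if [ -L "$path" ]; then
        # readlink prints the target the link points to
        echo "symlink -> $(readlink "$path")"
    elif [ -d "$path" ]; then
        # A real directory here usually means it was recreated with mkdir
        echo "directory"
    elif [ -e "$path" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

check_compute "$HOME/compute"
```

A result of "directory" or "missing" suggests the link needs to be repaired as described in the next section.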

How to Fix a Symlink

Some users try to recreate their compute directory with the mkdir command. This is incorrect because it only creates a new directory in home storage, which has lower performance and is limited to 100 GB.

The correct way to recreate a compute link is the following command, replacing <username> with your username:

ln -s /panfs/<username> ~/compute
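One thing to watch for: if something already exists at ~/compute (for example, a directory made with mkdir), ln -s does not replace it and instead places the new link inside that directory. The sketch below guards against this; safe_link is a hypothetical helper name used only for illustration.

```shell
# Create link -> target only if nothing already exists at the link path.
# safe_link is a hypothetical helper, not a command provided on FSL.
safe_link() {
    target="$1"
    link="$2"
    # -e is false for a dangling symlink, so test -L as well
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing: $link already exists" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Example call, replacing <username> with your username:
#   safe_link /panfs/<username> "$HOME/compute"
```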

If you have already created a compute directory with mkdir and placed data in it, do the following:

  • Move any data in compute/ to a temporary location
  • Delete the directory with rmdir ~/compute
  • Create the proper symbolic link with the command above
  • Move any data back into the correctly linked compute/ directory
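The steps above can be sketched as a single shell function. This is an illustration under stated assumptions rather than a supported script: repair_compute and the ".stash" temporary directory name are hypothetical, and the find options -mindepth and -maxdepth assume GNU or BSD find.

```shell
# Replace a real directory at $2 with a symlink to $1, preserving its contents.
# repair_compute is a hypothetical helper, not a command provided on FSL.
repair_compute() {
    target="$1"   # your compute allocation
    link="$2"     # normally "$HOME/compute"

    # Only act when the path is a real directory, not already a symlink.
    if [ -L "$link" ] || [ ! -d "$link" ]; then
        echo "nothing to repair at $link" >&2
        return 1
    fi

    # Step 1: move everything (including dotfiles) to a temporary location.
    stash=$(mktemp -d "${link}.stash.XXXXXX") || return 1
    find "$link" -mindepth 1 -maxdepth 1 -exec mv {} "$stash"/ \;

    # Step 2: rmdir only succeeds on an empty directory, so it doubles
    # as a check that nothing was left behind.
    rmdir "$link" || return 1

    # Step 3: create the proper symbolic link.
    ln -s "$target" "$link" || return 1

    # Step 4: move the data back through the link, then remove the stash.
    find "$stash" -mindepth 1 -maxdepth 1 -exec mv {} "$link"/ \;
    rmdir "$stash"
}

# Example call, replacing <username> with your username:
#   repair_compute /panfs/<username> "$HOME/compute"
```

If the final rmdir reports that the stash directory is not empty, some files could not be moved back and are still in the stash, so nothing is lost.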

Job Script Bugs

Occasionally, a user inadvertently deletes their compute/ directory because of a bug in their job script. If you do not know how your compute/ directory was deleted or altered, review the scripts that you run.
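This page does not say which bugs are responsible. One common class, assumed here purely for illustration, is a cleanup step that runs rm -rf on a path built from an empty or mistaken variable. A defensive wrapper along the following lines can catch that; safe_clean and its second argument are hypothetical names, not part of any FSL tooling.

```shell
# Delete a scratch directory, but refuse obviously dangerous arguments.
# safe_clean is a hypothetical helper, not a command provided on FSL.
safe_clean() {
    dir="${1:-}"
    protected="${2:-$HOME/compute}"   # path that must never be removed

    # Drop one trailing slash so "compute/" and "compute" compare equal;
    # this also turns "/" into an empty string, which is refused below.
    dir="${dir%/}"

    if [ -z "$dir" ]; then
        echo "refusing: empty path" >&2
        return 1
    fi
    if [ "$dir" = "$protected" ] || [ -L "$dir" ]; then
        echo "refusing: $dir is protected or is a symbolic link" >&2
        return 1
    fi
    rm -rf -- "$dir"
}

# Example use in a job script, where SCRATCH is a variable you define:
#   safe_clean "$SCRATCH"
```

Adding set -u near the top of a job script is a complementary safeguard, since it makes the shell stop when an unset variable is used.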