Recovering the use of directories with missing files

Until we know whether the data from the recent storage incident is recoverable, we will not delete the affected files. Any attempt to access one of them prints the error message: Cannot send after transport endpoint shutdown. These files cannot be deleted or moved, which is a nuisance when you are trying to recreate them.

To regain the unfettered use of a directory with missing files, it is best to create a new directory and move all remaining files into it, leaving behind the skeleton of the original directory tree with the missing files. For example, to "recover" the use of mydir, one can:

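recovery_dir="mydir"
# Set the original directory aside under a new name.
mv "$recovery_dir" "$recovery_dir"-old
cd "$recovery_dir"-old
# Recreate the directory structure, then move the remaining files into it.
find . -type d -exec mkdir -p ../"$recovery_dir"/{} \;
find . -type f -exec mv {} ../"$recovery_dir"/{} \;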

The double quotes are important. If you omit them, you may encounter problems with files and directories that contain spaces.

Please do not run these commands on your compute directory itself: compute directories are actually symlinks that should not be modified or moved. Instead, run them on directories within compute.
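As a precaution, the symlink check can be automated before anything is renamed. The following is a minimal sketch, not part of any site tooling: safe_to_recover is a hypothetical helper name, and the checks it performs (symlink, not a directory, backup name already taken) are assumptions about what is worth guarding against.

```shell
# Hypothetical helper: succeed only if it looks safe to rename "$1" to "$1-old".
safe_to_recover() {
    # Strip a trailing slash so that a symlink given as "link/" is still detected.
    dir="${1%/}"
    if [ -L "$dir" ]; then
        echo "refusing: $dir is a symlink" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "refusing: $dir is not a directory" >&2
        return 1
    fi
    if [ -e "$dir-old" ] || [ -L "$dir-old" ]; then
        echo "refusing: $dir-old already exists" >&2
        return 1
    fi
    return 0
}
```

It could be used as a gate in front of the first step, for example: safe_to_recover "$recovery_dir" && mv "$recovery_dir" "$recovery_dir"-old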

Files that cause hangs

Several users have recently reported that operating on certain files and directories leads to irreversible hangs--for example, running ls some/misbehaving/dir might cause a hang with no solution except closing the shell. If you run into such a directory, you can fix it with the statable tool, which we wrote to make working with these infernal files easier. statable traverses a directory tree, printing the names of good files to stdout and bad files to stderr, appending a / to directory names. This makes it very simple to get a list of the files in a directory that are misbehaving:

module load statable
statable some/misbehaving/dir >/dev/null
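If you would rather keep the list of bad files for later than read it off the terminal, stderr can be redirected to a file instead. The sketch below wraps that redirection in a small function; save_bad_names and bad_files.txt are hypothetical names, and the only behaviour it relies on is the one described above (good names on stdout, bad names on stderr).

```shell
# Hypothetical wrapper: run a listing command, discard the good names (stdout),
# and save the bad names (stderr) to the file given as the first argument.
save_bad_names() {
    list_file="$1"
    shift
    "$@" 2> "$list_file" > /dev/null
}
```

For example: save_bad_names bad_files.txt statable some/misbehaving/dir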

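statable can be used to recover directories much as find is used above for directories with files lost in the storage incident. First, create a restoration directory and navigate to the misbehaving directory. Give bad_dir as an absolute path, because the later commands run from inside that directory and a relative path would no longer point to the right place:

bad_dir="$PWD/some/misbehaving/dir"
mkdir "${bad_dir}_fixed"
cd "$bad_dir"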

Next, pipe the output of statable through grep to find just the directories, cloning the misbehaving directory structure to the fixed directory:

statable 2>/dev/null | grep -e '/$' | while IFS= read -r dirname; do
    mkdir -p "${bad_dir}_fixed/$dirname"
done

Again, the double quotes are vital. You can omit 2>/dev/null if you would like the names of the misbehaving files to be printed while the directory structure is being copied.

Once the directory structure is duplicated, move over all remaining good files, inverting the above grep command with -v to select everything that is not a directory:

statable 2>/dev/null | grep -ve '/$' | while IFS= read -r filename; do
    mv "$filename" "${bad_dir}_fixed/$filename"
done

All good files will now be in ${bad_dir}_fixed, leaving only the old directory structure and the bad files in $bad_dir.
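If you want to review what would be created and moved before touching anything, the same trailing-slash test can sort a listing into two files first. This is only a sketch: split_listing, dirs.txt and files.txt are hypothetical names, and it assumes the listing arrives one name per line on stdin, as in the pipelines above.

```shell
# Split a statable-style listing (one name per line on stdin) into
# directories (names ending in /) and files (everything else).
# The two lists are written to dirs.txt and files.txt inside "$1".
split_listing() {
    out_dir="$1"
    : > "$out_dir/dirs.txt"
    : > "$out_dir/files.txt"
    while IFS= read -r name; do
        case "$name" in
            */) printf '%s\n' "$name" >> "$out_dir/dirs.txt" ;;
            *)  printf '%s\n' "$name" >> "$out_dir/files.txt" ;;
        esac
    done
}
```

For example, run from inside the misbehaving directory with an existing output directory of your choice as the argument: statable 2>/dev/null | split_listing "$HOME"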

Last changed on Fri Dec 4 15:28:24 2020