Rclone

Rclone allows one to sync files and directories to and from cloud storage via the command line. In combination with box.byu.edu, where BYU students and faculty get unlimited free storage, it can make storing and backing up archival data much easier. Rclone with Box will help users who routinely run up against storage space constraints, as well as those who wish to back up data that is too large to fit anywhere but ~/compute. Those who wish to collaborate without making others get FSL accounts can upload to Box with Rclone, then share their data with collaborators (even if those collaborators don't have Box accounts).

This tutorial shows how to configure Rclone with Box, covers a few of the most useful commands, and walks through a couple of worked examples. It is by no means comprehensive, so those wanting to learn more should consult Rclone's documentation, which is excellent.

Note that while the storage on Box is unlimited, expansive storage comes at a cost: Box is slow, so it takes a while to move big chunks of data. Additionally, individual files stored there cannot exceed 32 GB in size.
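If you're not sure whether a directory contains anything over that limit, GNU find can list the offenders before you start an upload. Below is a minimal sketch wrapped in a hypothetical helper called find_oversized. Note that find's G suffix means GiB (units of 1,073,741,824 bytes), so treat the 32G default as an approximation of Box's limit:

```shell
# find_oversized: hypothetical helper; lists regular files under DIR that are
# larger than LIMIT (any find -size value; defaults to 32G, i.e. 32 GiB).
find_oversized() {
    local dir="${1:?usage: find_oversized DIR [LIMIT]}"
    local limit="${2:-32G}"
    # -size +N rounds each file's size up to whole units, then keeps files
    # strictly larger than N units
    find "$dir" -type f -size "+${limit}"
}

# Example: find_oversized ~/compute/dataset
```

No output means every file is within the limit.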

Configuration

Keep in mind that Rclone only needs to be configured once: after you've finished the steps below, you shouldn't need to do so again as long as you use it at least monthly. You'll also need to download Rclone on your local machine, since the authorization step runs there, unless you would like to forward a port and configure it on the remote node as if it were local (this method is less reliable).

rclone config

To access Rclone, log in to the supercomputer and load the rclone module:

module load rclone

Once that's done, run rclone config. This will give you a few options:

No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Enter n to make a new remote. Give it a name (e.g. box), then choose which storage service you'd like to configure (you can type box for box.byu.edu, drive for Google Drive, etc.).

It'll ask for Box App Client Id and Box App Client Secret; most users should simply hit enter to leave these blank. You'll then be asked if you want to "Edit advanced config" (most users should enter n):

Edit advanced config? (y/n)
y) Yes
n) No
y/n> n

Next, you will be asked whether to use auto config:

Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> n

Since you are working on a remote machine, enter n. You will then be presented with a message prompting you to run rclone authorize "box" on your local machine:

For this to work, you will need rclone available on a machine that has a web browser available.
Execute the following on your machine:
    rclone authorize "box"
Then paste the result below:
result>

Run rclone authorize "box" in a command prompt on your local machine; a browser should open and prompt you to log in to Box (if it doesn't, open the link printed in the terminal). If you're not logged in already, it will ask for your credentials; use yournetid@byu.edu for the email address. You'll then see a screen with a big blue Grant access to Box button. Click it, and you should be greeted with a success message. Go back to the local terminal and copy everything between ---> and <---:

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code
Paste the following into your remote machine --->
{"access_token":"XXXXX","token_type":"bearer","refresh_token":"XXXXX","expiry":"2019-01-01T00:00:00-06:00"}
<---End paste

...and paste it after result> on the remote terminal:

result> {"access_token":"XXXXX","token_type":"bearer","refresh_token":"XXXXX","expiry":"2019-01-01T00:00:00-06:00"}
--------------------
[box]
type = box
client_id = 
client_secret = 
token = {"access_token":"XXXXX","token_type":"bearer","refresh_token":"XXXXX","expiry":"2019-01-01T00:00:00-06:00"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

...after entering y, you're finished configuring Rclone to work with Box.
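To confirm the remote was saved, rclone listremotes prints each configured remote name followed by a colon (e.g. box:), and rclone lsd box: should show your top-level Box folders. If you script your transfers, a check like the sketch below (has_remote is a hypothetical helper name) lets the script stop early when the remote is missing:

```shell
# has_remote: hypothetical helper; succeeds if NAME is a configured remote.
# `rclone listremotes` prints one configured remote per line, e.g. "box:".
has_remote() {
    rclone listremotes | grep -qx "${1:?usage: has_remote NAME}:"
}

# Example use at the top of a transfer script:
# has_remote box || { echo "no 'box' remote; run rclone config first" >&2; exit 1; }
```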

Usage

This tutorial will only cover the basics due to the clarity and breadth of Rclone's exceptional documentation, which should be your first resource when learning its usage. Typing rclone --help will result in a deluge of information, but the "Available Commands" section of the help message gives a good synopsis of each command. For help on a specific command, you can also use rclone <command> --help (e.g. rclone copy --help).

Listing files

Rclone gives a few methods for listing files; none of them are quite like Unix's ls, but rclone lsf --max-depth 1 remote:path/to/dir comes close. A few more examples:

# Recursively list all files at "box" remote
rclone ls box:

# Show directories in "fsl" at "box"
rclone lsd box:fsl

# Recursively list files in "fsl/dir1" at "box" with more detail
rclone lsl box:fsl/dir1
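If you miss ls, small wrappers in your ~/.bashrc can save some typing. The sketch below is one way to do it; rls and rll are hypothetical names, and rll uses lsf's --format option (t = modification time, s = size, p = path) for a rough equivalent of ls -l:

```shell
# rls: hypothetical ls-like wrapper; lists only the immediate contents of a
# remote path (lsf marks directories with a trailing "/")
rls() {
    rclone lsf --max-depth 1 "${1:?usage: rls remote:path}"
}

# rll: hypothetical ls -l-like wrapper; prints modification time, size, and name
rll() {
    rclone lsf --max-depth 1 --format tsp --separator '  ' "${1:?usage: rll remote:path}"
}

# Example: rls box:fsl
```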

Moving and Copying

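rclone copy and rclone move behave essentially like Unix's cp and mv, respectively; you can copy and move to or from the remote. One difference: when the source is a directory, Rclone copies its contents rather than the directory itself, and a single-file source is placed inside the destination directory (use rclone copyto or rclone moveto to give it a new name). Example usage: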

# Copy remote file, mydata.txt, from "fsl" at "box"
rclone copy box:fsl/mydata.txt $HOME/data/

# Move a tarball from compute to "fsl/compute-backup" at "box"
rclone move ~/compute/my-tarball.tar.gz box:fsl/compute-backup
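Since rclone move deletes each source file as soon as it has been transferred, you may prefer to copy, verify, and only then delete, especially for data you can't regenerate. rclone check compares the files in two locations (by size, and by hash where both sides support one), and its --one-way flag only requires that every source file be present and matching at the destination. A minimal sketch of that pattern for a whole directory, using a hypothetical helper named safe_upload:

```shell
# safe_upload: hypothetical helper; copies local directory SRC to remote DST,
# verifies the upload with `rclone check`, and deletes SRC only if both succeed.
safe_upload() {
    local src="${1:?usage: safe_upload LOCAL_DIR REMOTE:DIR}"
    local dst="${2:?usage: safe_upload LOCAL_DIR REMOTE:DIR}"
    rclone copy "$src" "$dst" || return 1
    # --one-way: only require that every source file exists (and matches) on dst
    rclone check --one-way "$src" "$dst" || return 1
    rm -r -- "$src"
}
```

For example, safe_upload ~/compute/results box:fsl/results puts the files of results directly inside box:fsl/results, because Rclone copies a directory's contents.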

Creating Directories

rclone mkdir behaves like Unix's mkdir; to create a new directory on a remote, you would use something like:

rclone mkdir box:fsl/myNewDirectory

Examples

Move Archival Data to Box

Say you have a directory with data that needs to be kept, but you don't expect to do any work on it with the supercomputer, and you're running out of space. You can either move it as-is, or compress it and then move it. Moving it as-is is easier, and you'll be able to browse the files at box.byu.edu, but compressing then moving could be much faster.

Generally, if you have a few big files (each under 32 GB, of course), you won't be slowed down much by copying them directly, but if you have many small files it will take a long time. Under ideal conditions, you can copy about 4 files per second (across all processes, since Box limits transfers per user). If you have a million files, that works out to at least 250,000 seconds, or nearly three days, no matter how small each file is.
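Counting your files first tells you which case you're in. The sketch below (estimate_upload is a hypothetical helper) applies the rough 4-files-per-second figure above to print a lower bound; the real rate depends on Box and on file sizes, so treat the result as an estimate only:

```shell
# estimate_upload: hypothetical helper; counts regular files under DIR and
# prints a rough lower bound on upload time at RATE files/second (default 4).
estimate_upload() {
    local dir="${1:?usage: estimate_upload DIR [RATE]}"
    local rate="${2:-4}"
    local n secs
    n=$(find "$dir" -type f | wc -l)
    n=$((n))                            # normalize any padding wc may add
    secs=$(( (n + rate - 1) / rate ))   # round up to whole seconds
    printf '%d files, at least %d s (about %d h)\n' "$n" "$secs" $(( secs / 3600 ))
}

# Example: estimate_upload ~/compute/dataset
```

For a directory of a million files this prints 1000000 files, at least 250000 s (about 69 h).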

To move without compressing, simply use:

rclone move ~/compute/dataset box:fsl/dataset

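There are two main ways to compress then move data. This one is slower and needs enough free disk space to hold the tarball, but it is more reliable:

# -C ~/compute stores paths as dataset/... rather than home/<user>/compute/dataset/...
tar -czf dataset.tar.gz -C ~/compute dataset
# moveto uploads to exactly this file name; plain move would create box:fsl/dataset.tar.gz/dataset.tar.gz
rclone moveto dataset.tar.gz box:fsl/dataset.tar.gz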

This one is faster and doesn't need local disk space for the tarball, but the work will be lost if the command is interrupted:

tar -czf - -C ~/compute dataset | rclone rcat box:fsl/dataset.tar.gz
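Neither of the compressed approaches deletes ~/compute/dataset, so to free up space you still have to remove it yourself. Before you do, it's worth checking that the archive on Box is complete. One way, sketched below with a hypothetical helper named verify_remote_tarball, is to stream it back with rclone cat and let tar list every entry; a missing, truncated, or corrupt archive makes tar exit with an error. It reads the whole archive back from Box, so expect it to take a while:

```shell
# verify_remote_tarball: hypothetical helper; streams a .tar.gz from the remote
# and lists its contents without extracting anything. tar exits non-zero if
# the stream is empty, truncated, or not valid gzip/tar data.
verify_remote_tarball() {
    rclone cat "${1:?usage: verify_remote_tarball REMOTE:FILE.tar.gz}" | tar -tzf - > /dev/null
}

# Example (delete the local copy only if the check passes):
# verify_remote_tarball box:fsl/dataset.tar.gz && rm -r ~/compute/dataset
```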

Backup compute with Box

Perhaps you have a large set of data in ~/compute/dataset, which is too big to fit in your home directory, that you would like to back up weekly. Say you set up the following directory structure to store the backups:

box:fsl
'-- backup
    '-- dataset
        '-- old
        '-- primary

...by running:

rclone mkdir box:fsl/backup
rclone mkdir box:fsl/backup/dataset
rclone mkdir box:fsl/backup/dataset/old
# primary will be created by the copy

The current backup will live at box:fsl/backup/dataset/primary, while older snapshots, organized by date, will go in box:fsl/backup/dataset/old/. To get started, let's copy dataset over to the current backup directory:

rclone copy ~/compute/dataset box:fsl/backup/dataset/primary

Keep in mind that Box is slow, so this may take some time. If you want to exit your ssh session while the copy is going, you may want to use screen or tmux.

Once the copy is done, you'll need to back up every week (or however frequently you would like to). This could go something like:

module load rclone
PRIMARY=box:fsl/backup/dataset/primary
# --backup-dir must be on the same remote as the destination, hence the "box:" prefix
OLD=box:fsl/backup/dataset/old/dataset-$(date +%F-%R)
screen -dm rclone sync $HOME/compute/dataset $PRIMARY --backup-dir $OLD
# using `screen -dm ...` means that rclone will keep going even if you log out

If you want to do this regularly, you can put it in a script and run it at your convenience; on the new operating system, you can also use cron to run it automatically at regular intervals. To make the script (we'll call it do_rclone_backup.sh) execute weekly, use crontab -e to edit your crontab and enter something along the lines of 0 X * * Y bash /path/to/do_rclone_backup.sh (replacing X with an hour, 0-23, and Y with a day of the week, 0-6, where 0 is Sunday). Your backup script will now run once a week with no intervention from you. A dedicated cron tutorial goes into more depth in case you want to back up more or less frequently or would like to learn more about cron generally.
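As a starting point, do_rclone_backup.sh could look something like the sketch below. It assumes the box remote and the box:fsl/backup/dataset layout from this example; the helper names are hypothetical, and the --log-file and --log-level flags send Rclone's output to a file, since nobody is watching a terminal. Cron runs jobs with a minimal environment, so the module command may not be defined there; the sketch skips module load in that case, and on your system you may need to adjust it (for example, by calling rclone by its full path). The final call is commented out so the functions can be tried safely:

```shell
#!/bin/bash
# do_rclone_backup.sh (sketch): sync ~/compute/dataset to Box, keeping files
# that the sync overwrites or deletes in a dated folder under "old".
# Assumes the "box" remote and the box:fsl/backup/dataset layout shown above.

SRC="$HOME/compute/dataset"
PRIMARY="box:fsl/backup/dataset/primary"
OLD_ROOT="box:fsl/backup/dataset/old"

# snapshot_dir: hypothetical helper; dated --backup-dir for this run
snapshot_dir() {
    echo "${OLD_ROOT}/dataset-$(date +%F-%R)"
}

# weekly_backup: hypothetical helper; runs the sync itself
weekly_backup() {
    # `module` may not be defined in cron's minimal environment
    if command -v module > /dev/null 2>&1; then
        module load rclone
    fi
    # screen isn't needed here: cron already runs this without a terminal
    rclone sync "$SRC" "$PRIMARY" --backup-dir "$(snapshot_dir)" \
        --log-file "$HOME/rclone-backup.log" --log-level INFO
}

# In the real script, uncomment this line so that running it performs the backup:
# weekly_backup
```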