BYU

Office of Research Computing

R
R is a statistical computing language and software environment.

It is best known for its plotting and statistical computing abilities, including statistical modeling and prediction. It is not as well suited to high-performance computing as compiled languages like C or Fortran, and it tends to perform poorly on large computations and large datasets, even when compared with other dynamic languages such as Julia or Python.

We strongly recommend that users not use R for the computationally intensive core of their HPC jobs. We do recommend it for data analysis, statistical modeling, prediction, and proof-of-concept work when defining algorithms.

Creating an R Environment through Anaconda/Miniconda/conda/mamba

Use caution when creating an R environment. Unlike many other language environments, installing or loading R packages does not always pull in the most recent compatible versions, and you can run into all sorts of compatibility and dependency issues that are a real headache to untangle.

Because of this, we recommend creating an environment with the conda/mamba package manager from Anaconda and then installing R inside that environment. conda/mamba resolves most of these dependency issues behind the scenes and works well with R. conda/mamba is already installed on the supercomputer.

To create an R environment with conda/mamba, start by logging in to the supercomputer and creating a conda/mamba environment:

mamba create --name <env_name>

Next, activate the environment:

mamba activate <env_name>

Finally, install R and any other packages you need into the environment (conda names R packages with an r- prefix, such as r-tidyverse):

mamba install -c conda-forge r <r-example> <r-packages>

Or, if channels are already set up in ~/.condarc (see the Bioconda section of the Python documentation):

mamba install r <r-example> <r-packages>

Follow the command prompts, and your R environment is ready to go. Type R at the command prompt to start an interactive R session. To activate the environment on a fresh login, type mamba activate <env_name>; to deactivate it, type mamba deactivate.
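If you would rather skip the interactive prompts (for example, in a setup script), the three steps above can be collapsed into a single non-interactive mamba create call. The sketch below is a small bash function built on that assumption; the function name create_r_env and the environment name in the usage line are hypothetical, and -y simply accepts mamba's install plan without asking.

```shell
# Hypothetical helper: create a conda/mamba environment containing R
# and any extra packages in one non-interactive step.
# Usage: create_r_env <env_name> [r-package ...]
create_r_env() {
  if [ -z "$1" ]; then
    echo "usage: create_r_env <env_name> [r-package ...]" >&2
    return 1
  fi
  local env_name
  env_name="$1"
  shift
  # -y accepts the install plan without prompting; "r" installs R itself,
  # and any remaining arguments are passed through as extra packages
  mamba create -y --name "$env_name" -c conda-forge r "$@"
}
```

For example, create_r_env my_r_env r-tidyverse followed by mamba activate my_r_env leaves you in the same state as the step-by-step commands above.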

For additional documentation see Using R language with Anaconda.

Common R Environment Issues and Solutions

As mentioned above, building an R environment can be difficult, depending on the libraries and packages you need.

When you run into package dependency or build errors, first make sure everything is up to date. With the R environment activated, you can update every package in it from the command line with one command:

mamba update --all

This updates r-base (the R interpreter that conda installs) together with every other package and dependency in the environment. Alternatively, you can start R inside the environment by typing R and running update.packages(), though we recommend trying the mamba command first.

If the dependency and build errors still aren't resolved, we recommend creating a new environment and installing the most important packages first through mamba (see the section above), such as r-tidyverse, before installing smaller packages. Also make sure the environment is using an up-to-date version of R by checking the output of R --version.
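To make the R --version check scriptable, the version number on the first line of its output can be compared against a minimum version. The bash sketch below assumes the usual first-line format (R version X.Y.Z (date) -- "codename") and uses sort -V from GNU coreutils for the comparison; the function name and the 4.3 threshold in the usage line are only illustrations, not a site requirement.

```shell
# Hypothetical helper: succeed if the active environment's R is at least
# the given version, e.g. r_version_at_least 4.3
r_version_at_least() {
  local min current
  min="$1"
  # The first line of `R --version` looks like:
  #   R version 4.3.1 (2023-06-16) -- "..."
  # so the third whitespace-separated field is the version number.
  current="$(R --version | head -n 1 | awk '{print $3}')"
  [ -n "$current" ] || return 1
  # sort -V orders version strings; if min sorts first (or ties), current >= min
  [ "$(printf '%s\n%s\n' "$min" "$current" | sort -V | head -n 1)" = "$min" ]
}
```

Usage: r_version_at_least 4.3 || echo "R is older than 4.3; try mamba update --all"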

If problems with your environment persist, see additional documentation here, or reach out to our office.

Installing Libraries (Including Tidyverse)

Tidyverse is arguably the most useful library in R and brings with it a variety of quality-of-life packages, including ggplot2 and dplyr. Before R 4.1.0 introduced the native |> pipe, tidyverse (through the magrittr %>% operator) was the standard way to pipe in R. We strongly recommend using Tidyverse when coding in R.

R packages are loaded with library(), so they are often called libraries. When running R in a conda environment, we recommend installing and updating packages from the command line with mamba install r-<package_name> (conda prefixes R package names with r-). You can also install libraries from within R with install.packages("<package_name>"), but we recommend mamba, because it typically resolves dependency and compatibility issues on its own.
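Because conda-forge publishes R packages under an r- prefix with lowercase names (ggplot2 becomes r-ggplot2, RColorBrewer becomes r-rcolorbrewer), a small bash helper can translate the names you would pass to install.packages() into conda names. This is a sketch that assumes the naming convention holds for the packages you need; the function names are hypothetical, and if a package is named differently or is missing from the channel, mamba will report that it cannot find it.

```shell
# Hypothetical helper: map a CRAN package name to the conda-forge naming
# convention ("r-" prefix, all lowercase).
to_conda_name() {
  printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Hypothetical helper: install one or more R packages by their CRAN names.
# Usage: install_r_pkgs tidyverse data.table
install_r_pkgs() {
  local p
  # Rewrite the argument list in place: each pass drops the first original
  # name and appends its conda-forge equivalent to the end.
  for p in "$@"; do
    shift
    set -- "$@" "$(to_conda_name "$p")"
  done
  mamba install -y -c conda-forge "$@"
}
```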

For libraries not included in Anaconda's channels (again, see the Bioconda section of the Python documentation), you can install the library from CRAN by running the following from the command line while your conda environment is active:

R -e '<options>; install.packages("package_name", repos="https://cloud.r-project.org/");'

The <options> piece is optional and only needed if you want to set a parameter, such as options(Ncpus = 8) to install packages and their dependencies using up to 8 parallel processes. repos= should point to the URL of the repository that hosts the package; https://cloud.r-project.org/ is the most commonly used CRAN repository.
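In a script, it can help to wrap that R -e call so the package name and CPU count become parameters. The bash sketch below does that; install_cran is a hypothetical name, the default of 1 CPU is an assumption, and Ncpus is the R option that install.packages() reads to decide how many parallel processes to use when building source packages.

```shell
# Hypothetical helper: install a package from CRAN into the active
# conda environment's R library.
# Usage: install_cran <package_name> [ncpus]
install_cran() {
  local pkg ncpus repo
  pkg="$1"
  ncpus="${2:-1}"                      # parallel build processes (assumed default: 1)
  repo="https://cloud.r-project.org/"
  # Builds the same expression as the command above, with the options filled in
  R -e "options(Ncpus = ${ncpus}); install.packages(\"${pkg}\", repos = \"${repo}\")"
}
```

For example, install_cran somePackage 8, where somePackage stands in for the library you need.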

Using R with Bash

To run R scripts and commands from the bash shell, simply use the command Rscript.

For example, if you created a job script and would like to run an R script called example.r within the job, the tail end of your script would look something like:

# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
Rscript example.r <optional_args>

The same Rscript example.r <optional_args> command can also be used to run an R script directly from the command line. The <optional_args> are passed to your R script, where they can be read and used as inputs. Click here for more tips and tricks related to running R from Bash.

R in JupyterLab

It is possible to program in R from a JupyterLab Notebook. Setting this up initially can take some time, especially if Jupyter has not already been set up on the supercomputer.

Using Jupyter as an IDE has several benefits. Its graphical interface makes directories and files easy to find and navigate, and plots and graphs generated by your R code are displayed as the code runs, which is not the case when running R from a terminal window.

This can be done from a Linux or macOS computer, but on Windows machines you must set up SSH multiplexing first.

Before we show you how to get R working in a JupyterLab environment, we strongly recommend reading the JupyterLab section of the Python documentation (about halfway down the page) to understand how Jupyter works with your environment and the supercomputer's login nodes. If this is your first time enabling JupyterLab from a conda environment, follow the setup instructions there first. R will not work within Jupyter until the entire setup has been completed and you are able to launch Jupyter in a browser.

Once you have completed the JupyterLab environment setup, you will install the packages within this environment that let you program in R from a Jupyter notebook.

If problems or questions arise, additional help can be found here, or in the documentation found at the end of this section.

First, we want to launch a new JupyterLab session. This is similar to the Jupyter instructions found towards the end of our Python documentation.

We start by opening two terminal windows:

In Terminal #1:

# Connect to one of the login nodes:
> ssh user@ssh.rc.byu.edu

# Interactive SLURM request - 1 cpu, 1GB of memory, 1 hour
$ salloc --nodes=1 --ntasks=1 --mem-per-cpu=1G --time=1:00:00

# You are now on a compute node. Note the compute node name on the last line of the salloc output; you will need it for the ssh command in Terminal #2.

# Activate the jupyterlab environment
$ mamba activate jupyterlab

# Install the packages needed to use R in Jupyter:
$ mamba install r r-irkernel

# Be sure to enter "y" or "yes" depending on the prompt, and the packages will install.
# Both packages are necessary: r provides R itself, and r-irkernel provides the R kernel (IRkernel) that Jupyter uses to run R code.

# Start Jupyter
$ jupyter-lab --no-browser

# Jupyter tries port 8888 first and, if it is already in use, counts up from there.
# Note the port in the output for use in ssh command in terminal #2.
# Use in place of JupyterPort below.

In Terminal #2:

# On your local computer, NOT logged into the supercomputer:

> ssh -N -J user@ssh.rc.byu.edu -L JupyterPort:localhost:JupyterPort user@ComputeNodeName
# JupyterPort is the port number from Terminal #1. user is your netid or rc.byu id. ComputeNodeName is also given in Terminal #1. See the example below.

# Password/verification required for the login node, then password only for the compute node.
# The command will appear to hang: because of the -N option, no remote command is executed. -N is useful when you only want to forward ports.
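Because the Terminal #2 command only changes in your username, the compute node name, and the port, you could keep a small wrapper in the shell profile on your local Linux or macOS machine. The bash sketch below is hypothetical (the function name, argument order, and 8888 default are our own choices) and runs the same ssh -N -J ... -L ... command shown above.

```shell
# Hypothetical helper for your LOCAL machine: forward a Jupyter port from a
# compute node through the login node.
# Usage: jupyter_tunnel <user> <compute_node> [port]
jupyter_tunnel() {
  local user node port
  user="$1"
  node="$2"
  port="${3:-8888}"   # Jupyter's first-choice port
  if [ -z "$user" ] || [ -z "$node" ]; then
    echo "usage: jupyter_tunnel <user> <compute_node> [port]" >&2
    return 1
  fi
  # -J jumps through the login node; -L maps the local port to the same port
  # on the compute node; -N forwards ports without running a remote command
  ssh -N -J "${user}@ssh.rc.byu.edu" -L "${port}:localhost:${port}" "${user}@${node}"
}
```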

Example output terminal #1:

$ salloc --mem-per-cpu=1G --ntasks=1 --nodes=1 --time=1:00:00
salloc: Pending job allocation 39875987
salloc: job 39875987 queued and waiting for resources
salloc: job 39875987 has been allocated resources
salloc: Granted job allocation 39875987
salloc: Waiting for resource configuration
salloc: Nodes m8-17-2 are ready for job

### TAKE NOTE of the node you are given in the line above
# This will be used as the "ComputeNodeName" in the ssh command for terminal 2.

$ mamba activate jupyterlab

$ mamba install r r-irkernel

# ...

# Prompts user [Y]/[n]? Enter "y"

# ...

$ jupyter-lab --no-browser
[I 17:27:14.506 NotebookApp] Serving notebooks from local directory: /zhome/zdawg
[I 17:27:14.506 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 17:27:14.506 NotebookApp] http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
[I 17:27:14.506 NotebookApp]  or http://127.0.0.1:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
[I 17:27:14.506 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:27:14.513 NotebookApp] 

    To access the notebook, open this file in a browser:
         file:///zhome/zdawg/.local/share/jupyter/runtime/nbserver-86703-open.html
    Or copy and paste one of these URLs:
         http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
      or http://127.0.0.1:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65

Example output terminal #2:

> ssh -N -J zdawg@ssh.rc.byu.edu -L 8888:localhost:8888 zdawg@m8-17-2

# Longer alternative also works:
#> ssh -N -J zdawg@ssh.rc.byu.edu -L 8888:localhost:8888 zdawg@m8-17-2.rc.byu.edu

All network and system usage is subject to monitoring and recording in order to
maintain confidentiality, data integrity, and system availability. Any improper
or unlawful use may be disclosed to organization and law enforcement officials,
and appropriate action may be taken.

Password: 
Verification code: 
The authenticity of host 'm8-17-2 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:79ta2j/EcjQO7oM7kwSVFkED9YjxovQ8knzdVqA03+E.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'm8-17-2' (ECDSA) to the list of known hosts.
zdawg@m8-17-2's password: 

# hangs, leave this terminal alone until ...
#     ssh terminated by ctrl-c, or Jupyter notebook closed, or salloc reservation times out.
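If you save the Terminal #1 output to a file, the two values you need for Terminal #2 can be pulled out with standard text tools. The sketch below assumes the output looks like the example above (a "salloc: Nodes <name> are ready for job" line for a single node, and a http://localhost:<port>/ URL from Jupyter); the function names are hypothetical.

```shell
# Hypothetical helper: print the node name from salloc's "ready" line,
# e.g. "salloc: Nodes m8-17-2 are ready for job" -> m8-17-2
node_from_salloc() {
  sed -n 's/^salloc: Nodes \([^ ]*\) are ready for job$/\1/p' | head -n 1
}

# Hypothetical helper: print the port from the first localhost URL
# in Jupyter's startup output, e.g. http://localhost:8888/?token=... -> 8888
port_from_jupyter() {
  sed -n 's/.*localhost:\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: node_from_salloc < salloc.log and port_from_jupyter < jupyter.log.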

To access the Jupyter notebook, copy one of the last two lines of the Jupyter output in Terminal #1 and paste it into a browser on your local machine. It looks something like this:

 http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65

Jupyter should now be working. Once JupyterLab is open in your browser, select the R file option under the "Other" heading (the third row of the Launcher) to begin writing your R script.

For help using JupyterLab, check out this documentation.

If problems arise when installing the R packages and kernel into your JupyterLab environment, additional help can be found here.
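One quick check, run on the compute node with the jupyterlab environment active, is whether Jupyter can see an R kernel at all. IRkernel registers its kernel under the name ir by default; the bash sketch below assumes that name and the usual layout of jupyter kernelspec list (a header line, then kernel name and path columns), and the function name is hypothetical.

```shell
# Hypothetical helper: succeed if Jupyter lists a kernel named "ir"
# (IRkernel's default kernel name)
has_r_kernel() {
  # Skip the "Available kernels:" header and keep the first column (kernel names)
  jupyter kernelspec list | awk 'NR > 1 {print $1}' | grep -qx 'ir'
}
```

Usage: has_r_kernel || echo "No R kernel found; try reinstalling r-irkernel in the jupyterlab environment"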

Setting up a Jupyter notebook on the supercomputer takes several steps, and many problems can arise along the way. If you are struggling with the setup, be sure to read our more in-depth documentation about it, found here in our Python documentation. The Jupyter section is towards the bottom of the page.

A Note About RStudio

It is typical for those writing in R to use RStudio. When writing R from an Anaconda environment, however, we do not recommend it, because RStudio does not support the R interpreter that conda installs in its environment. We recommend using JupyterLab as your IDE instead (see the JupyterLab section of our Python documentation, about halfway down the page).