BYU

Office of Research Computing

R

R is a statistical computing language and software environment.

It is best known for its plotting and statistical computing capabilities, including statistical modeling and prediction. It is less suited to high-performance computing than a compiled language like C or Fortran, and it struggles to perform well on large computations and datasets, especially when compared to similar dynamic languages like Julia or Python.

We strongly recommend that users do not use R for a major component of their HPC jobs. We do recommend using it for data analysis, statistical modeling, prediction, and proof-of-concept work for defining algorithms.

Creating an R Environment through Conda/Mamba

Use caution when creating an R environment. Unlike in many other environments, installing or loading R packages and libraries does not always pull in the most recent versions, which can lead to compatibility and dependency problems that are tedious to untangle.

Because of this, we recommend creating an environment with the conda/mamba package manager and then installing R inside that environment. Conda/mamba resolves most of these issues behind the scenes and works well with R. For instructions on setting up conda/mamba environments, please see our Conda Environments page. Your command to install R might look like this:

mamba install r <r-example> <r-packages>
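As a rough sketch of the full sequence, the script below creates a fresh environment and installs R together with a package in a single step. The environment name r_env and the package list are placeholders chosen for illustration, not site defaults, and the script only prints the command unless DRY_RUN=0 is set, so it can be reviewed before anything is installed.

```shell
#!/bin/sh
# Sketch: create a conda/mamba environment and install R into it.
# ENV_NAME and R_PACKAGES are illustrative placeholders, not site defaults.
ENV_NAME="${ENV_NAME:-r_env}"
R_PACKAGES="${R_PACKAGES:-r r-tidyverse}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Requesting all packages in one command lets mamba solve for mutually
# compatible versions. $R_PACKAGES is intentionally unquoted so that each
# package name becomes its own argument.
run mamba create -y -n "$ENV_NAME" $R_PACKAGES
```

Running it with DRY_RUN=0 performs the install; the default simply echoes the mamba command.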

Common R Environment Issues and Solutions

As mentioned above, building an R environment can be difficult depending on the libraries and packages you need.

When you run into package dependency and build errors, first make sure everything is up to date. You can do this with a single command, run on the command line while inside the R environment:

mamba update r-caret

Note that r-caret is conda's package for the caret R library, not the R interpreter itself (conda packages the interpreter as r-base). Because caret depends on a large number of other R packages, updating it usually updates R and many of its dependencies in one command; this only works if r-caret is installed in the environment. Alternatively, you can launch R inside the environment by typing "R" and running update.packages(), though we recommend trying the mamba command first.

If the package dependency and build errors still aren't resolved, we recommend creating a new environment and installing the most important packages (like tidyverse) through mamba first, before installing smaller packages. Also, check that the conda/mamba environment has an up-to-date version of R by running R --version.
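One quick way to confirm that the R on your PATH actually comes from the active environment is to compare its location with CONDA_PREFIX, which conda/mamba sets when an environment is activated. The sketch below does that and then prints the first line of R --version; the function name check_r is made up for this example.

```shell
#!/bin/sh
# Sketch: report which R interpreter the current shell would run.
# check_r is a hypothetical helper name, not part of conda or R.
check_r() {
    r_path="$(command -v R 2>/dev/null)" || r_path=""
    if [ -z "$r_path" ]; then
        echo "no R on PATH; activate the environment first"
        return 1
    fi
    # CONDA_PREFIX points at the active environment's root directory.
    case "$r_path" in
        "${CONDA_PREFIX:-/nonexistent}"/*) echo "conda R: $r_path" ;;
        *) echo "R outside the active environment: $r_path" ;;
    esac
    # The first line of the version banner carries the version number.
    R --version | head -n 1
}

check_r || true
```

If the script reports an R outside the active environment, a system-wide R is shadowing the conda one, which is a common source of confusing package errors.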

Installing Libraries (Including Tidyverse)

Tidyverse is arguably the most useful collection of packages in R and brings with it a variety of quality-of-life packages, including ggplot2 and dplyr. Before R 4.1 introduced the native pipe operator (|>), piping in R was done with the %>% operator that the tidyverse provides through magrittr. We strongly recommend that you use Tidyverse when coding in R.

In R, packages are often called libraries because they are loaded with library(). When running R in a conda environment, we recommend installing and updating packages from the command line with the command mamba install <r-package_name>; conda prefixes R package names with r-, so tidyverse is r-tidyverse. You can also install packages within your R script by typing install.packages("<package_name>"), though we recommend installing through mamba, as it typically works out dependencies and compatibility issues on its own.

For libraries not included in Anaconda's channels, you can run the command:

R -e '<options>; install.packages("package_name", repos="https://cloud.r-project.org/");'

to install the desired library from the command line while in your designated conda environment. The <options> piece of the command can be omitted (along with its semicolon) and is only needed if you want to set a parameter, such as options(Ncpus=8) to install using 8 parallel processes. The repos= argument should point to the URL of the repository that hosts the package, with "https://cloud.r-project.org/" being the most commonly used.
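Getting the nested quotes right in that one-liner can be fiddly, so a minimal sketch is to build the R expression in a shell variable first. Here PKG and NCPUS are placeholders, and Ncpus is the R option that install.packages consults for parallel installs. The script only prints the resulting command; the line that would actually run it is left commented out.

```shell
#!/bin/sh
# Sketch: assemble an install.packages() call for use with R -e.
# PKG and NCPUS are illustrative placeholders.
PKG="${PKG:-package_name}"
NCPUS="${NCPUS:-8}"
CRAN_URL="https://cloud.r-project.org/"

# Build the R expression separately so the quoting stays readable.
R_EXPR="options(Ncpus = ${NCPUS}); install.packages(\"${PKG}\", repos = \"${CRAN_URL}\")"

# Show the command that would be run.
echo "R -e '${R_EXPR}'"

# To actually install, run this inside the activated environment:
# R -e "$R_EXPR"
```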

Using R with Bash

To run R scripts and commands from the bash shell, simply use the command Rscript.

For example, if you created a job script and would like to run an R script called example.r within the job, the tail end of your script would look something like:

# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
Rscript example.r <optional_args>

The same Rscript example.r <optional_args> command can also be used to run an R script directly from the command line. Anything given as <optional_args> is passed to your R script, which can read the values with commandArgs(trailingOnly = TRUE).
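To illustrate how arguments reach the script, the sketch below writes a tiny R file that reads its arguments with commandArgs(trailingOnly = TRUE) and then runs it if Rscript is available. The file name args_demo.r and the two arguments are invented for this example.

```shell
#!/bin/sh
# Sketch: pass command-line arguments from the shell to an R script.
# args_demo.r and its arguments are hypothetical names for illustration.
cat > args_demo.r <<'EOF'
# trailingOnly = TRUE keeps only the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)
cat("Received", length(args), "argument(s):", args, "\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
    Rscript args_demo.r input.csv 10
else
    echo "Rscript not found; activate your R environment first"
fi
```

Arguments always arrive as character strings, so numeric values need as.numeric() or as.integer() inside the R script.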

R in JupyterLab

Using Jupyter as an IDE has several benefits. Because Jupyter has a GUI, directories and files are easy to locate and navigate to, and plots and graphs generated by your R code are displayed as the code runs, which is not the case when running standard R code from a terminal window.

We strongly recommend you look at our JupyterLab on OpenOnDemand page to understand how Jupyter works with your environment. If this is your first time enabling JupyterLab from a conda environment, you should first follow the setup instructions there. To use R in JupyterLab, you will need to install both r and r-irkernel in your conda/mamba environment:

mamba install r r-irkernel

You should then see R appear in the launcher when you connect to your JupyterLab session.
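If R does not show up in the launcher, one thing to check from a terminal is whether Jupyter can see an R kernel at all. The sketch below parses the output of jupyter kernelspec list; it assumes the kernel is registered under the name ir, which is the name IRkernel uses by default, and the helper name has_r_kernel is made up for this example.

```shell
#!/bin/sh
# Sketch: check whether an R kernel is registered with Jupyter.
# Assumes the kernel is named "ir" (the IRkernel default).
has_r_kernel() {
    # Exit status 2 means jupyter itself is missing from PATH.
    command -v jupyter >/dev/null 2>&1 || return 2
    # Each kernelspec line looks like "  <name>   <path>".
    jupyter kernelspec list 2>/dev/null | grep -Eq '^[[:space:]]*ir[[:space:]]'
}

if has_r_kernel; then
    echo "R kernel is registered"
else
    echo "R kernel not found; check that r-irkernel is installed in this environment"
fi
```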

A Note About RStudio

It is typical for those writing in R to use RStudio. When using a conda environment to write R, however, we do not recommend it, because RStudio does not support the R interpreter that conda runs in its environment. We recommend using JupyterLab as the IDE instead (see the JupyterLab section of our Python documentation, found about halfway down the page).