BYU

Office of Research Computing

Python
Python is a dynamic language, filling a similar role to Perl or other scripting languages.

Python is well suited to job preparation and post-processing, but it is less suited to high-performance computing than a compiled language like C or Fortran. We do not recommend using it for a major component of your HPC jobs. We do recommend using it for preparation, post-processing, and proof-of-concept work when defining algorithms.

Anaconda / Miniconda3 / conda / mamba

The best way to get a particular Python version and associated libraries is to use the conda/mamba package manager from Anaconda (https://docs.conda.io/en/latest/miniconda.html). conda/mamba is already installed on the supercomputer and can be loaded with

module load miniconda3/latest

The first time you load this module, you will need to run conda init and then mamba init. This creates or modifies ~/.bashrc with all of the environment changes conda and mamba need to run. To finish the initial setup, make sure that ~/.bash_profile exists and contains at least this line:

[[ -f ~/.bashrc ]] && source ~/.bashrc

(We highly recommend using mamba instead of conda. mamba is a drop-in replacement for conda and does almost everything conda does, but faster. By also running mamba init, you will be able to use mamba.)

To finish, log out and log back in. Your command prompt should now include (base). This signifies that conda/mamba is available and that you have the base environment activated. In the future, you will not need to load the miniconda3/latest module.

The base environment is for packages that conda and mamba need and should not usually be used for most computing tasks. However, you can use conda/mamba to create a conda environment and install the packages you need there.

To create a new environment, run

mamba create --name your_new_environment_name

Activate the environment with

mamba activate your_new_environment_name

Your prompt will change from (base) to (your_new_environment_name) to show that you have changed environments.

Now, install Python:

mamba install python

or

mamba install python=X.Y.Z

This will download python (or python version X.Y.Z) and install it in the current environment. You can install other packages in the same way.
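
Once Python is installed in the environment, it can help to confirm that the interpreter you are running really comes from that environment. The following is a minimal sketch rather than part of the official setup: the function name describe_interpreter is made up for illustration, and it assumes that conda's activation sets the CONDA_DEFAULT_ENV variable, which it normally does.

```python
import os
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter that is running this code."""
    return {
        # Full path of the python executable in use
        "executable": sys.executable,
        # Installation prefix; inside a conda environment this is the environment directory
        "prefix": sys.prefix,
        # Version string of the running interpreter
        "version": platform.python_version(),
        # Name of the active conda environment, or None if the variable is not set
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the reported prefix is not the directory of the environment you activated, you are probably running a different Python (for example, one from a loaded module).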

For more details, refer to the miniconda documentation: https://docs.conda.io/en/latest/miniconda.html. You may, for example, want to use different repositories, called channels, to download different or newer software.

Slurm Batch Scripts and Conda

To use a conda environment inside of a Slurm batch script, modify the first line of the script so that it looks like this:

#!/bin/bash --login

After this change, you will be able to activate a conda environment inside of the script.

You will have to make this change because conda depends on the setup that conda init writes to ~/.bashrc. Typically, a Slurm script runs bash as a non-interactive, non-login shell, which means that bash will not run ~/.bash_profile or ~/.bashrc. By including the flag --login, you instruct bash to run ~/.bash_profile anyway which, if you have set up your ~/.bash_profile and ~/.bashrc as described above, will source ~/.bashrc and properly set up access to conda.
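
To make the layout of such a batch script concrete, here is a small sketch that generates one as text. It is an illustration under assumptions, not a site-provided tool: the function name render_batch_script, the environment name my_env, and the script analyze.py are hypothetical, and the resource options simply mirror the salloc example used elsewhere on this page.

```python
def render_batch_script(env_name, command, time_limit="1:00:00", mem_per_cpu="1G"):
    """Build the text of a Slurm batch script that activates a conda environment."""
    lines = [
        # --login makes bash read ~/.bash_profile, which in turn sources ~/.bashrc
        "#!/bin/bash --login",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --mem-per-cpu={mem_per_cpu}",
        f"#SBATCH --time={time_limit}",
        "",
        # Activation works only because the login shell has loaded the conda setup
        f"mamba activate {env_name}",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "my_env" and "analyze.py" are hypothetical names
    print(render_batch_script("my_env", "python analyze.py"), end="")
```

The important detail is that the --login shebang is the very first line of the generated script; the activation line would fail without it.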

Bioconda

Bioconda (https://bioconda.github.io/) is a conda channel, or repository, of "software packages related to biomedical research". If you are using Bioconda-provided software, you will need to make a few changes to your ~/.condarc configuration file (the file that contains settings for conda) to gain access to their channels.

Bioconda's website (https://bioconda.github.io/) specifies that you will need to run these commands in this order:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

Alternatively, you can edit your ~/.condarc file so that it resembles the following:

channels:
  - conda-forge
  - bioconda
  - defaults
channel_priority: strict

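Your ~/.condarc may also include other settings and channels. If you have other channels, make sure that the relative ordering of the above three channels stays the same. (Each conda config --add places the channel at the top of the list, which is why the commands above add the channels in the reverse of the order shown in the file.)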
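
The relationship between the command order and the resulting channel order can be sketched in a few lines of Python. This is only a model of the behavior, assuming that conda config --add moves a channel to the top of the list; the helper names add_channel and keeps_bioconda_order, and the channel my_channel in the usage note, are hypothetical.

```python
def add_channel(channels, name):
    """Mimic `conda config --add channels NAME`, which puts NAME at the top of the list."""
    # Drop any existing entry so the channel is not listed twice, then prepend it
    return [name] + [c for c in channels if c != name]


def keeps_bioconda_order(channels):
    """Check that conda-forge, bioconda, and defaults appear in that relative order."""
    required = ["conda-forge", "bioconda", "defaults"]
    present = [c for c in channels if c in required]
    return present == required


if __name__ == "__main__":
    channels = []
    # Same order as the conda config commands listed above
    for name in ["defaults", "bioconda", "conda-forge"]:
        channels = add_channel(channels, name)
    print(channels)
    print(keeps_bioconda_order(channels))
```

A list such as conda-forge, my_channel, bioconda, defaults would also pass the check, because only the relative order of the three required channels matters.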

Once your ~/.condarc contains this information, you will be able to create an environment and install Bioconda packages and their recommended dependencies.

Available Versions

Several full-featured Python versions are available as environment modules. A current list of Python versions (and other software) can be found by running this command:

module avail python

To load Python using modules:

# load Python 3.7.x
module load python/3.7

# switch to Python 3.6.x
module swap python/3.6

Libraries

If you expect to be the only user of a library, we recommend installing it somewhere in your own directory.

With pip you can do this by running the following command after you have loaded the relevant environment module for Python (e.g., python/3.6):

pip install --user package

(where package is the name of the package you want.)

You can also install everything relative to an alternate prefix directory, such as one within a group directory:

pip install --prefix=directory/to/install/to package

(where directory/to/install/to is the directory you want.)

For example, if you use ~/fsl_groups/fslg_somegroup/.local as the prefix directory, pip will install executables to ~/fsl_groups/fslg_somegroup/.local/bin and libraries under ~/fsl_groups/fslg_somegroup/.local/lib. You will then need to add the following two lines to your ~/.bash_profile (or ~/.bashrc, ~/.profile, etc.):

export PATH="$HOME/fsl_groups/fslg_somegroup/.local/bin:$PATH"
export PYTHONPATH="$HOME/fsl_groups/fslg_somegroup/.local/lib/python3.7/site-packages"

Note: replace python3.7 with the appropriate version number for Python, like python3.6.
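
If you are unsure which version directory applies, Python can compute it for you. The sketch below is an illustration only: the helper name site_packages_under is hypothetical, and it assumes the common Linux layout of lib/pythonX.Y/site-packages under the prefix (some installations use lib64 or a different directory name instead).

```python
import os
import sys


def site_packages_under(prefix):
    """Return the site-packages directory typically used under `prefix` on Linux."""
    # Directory name such as "python3.7", derived from the running interpreter
    version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return os.path.join(os.path.expanduser(prefix), "lib", version_dir, "site-packages")


if __name__ == "__main__":
    # The group directory below is the example path used in this section
    print(site_packages_under("~/fsl_groups/fslg_somegroup/.local"))
```

Run it with the same Python module loaded that you used for pip install, then use the printed path as the PYTHONPATH value.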

You can run pip help install for more options and help.

If the project does not use pip but provides a setup.py, you can install it into your home directory by running the following from the project's source directory:

python setup.py install --user

If it is a library that a large set of users will use or if it requires some specialized compiling, please open a support ticket.

The following packages are available for each version of Python in the module list:

Resolving Environment Issues

While Python does include some of the best libraries for many projects, keeping all of these libraries straight can be tricky. On the supercomputer, you have many different options to manage Python:

  1. Use the system-level python3 and install packages using the system-level pip3.
  2. Load a Python module (python/2.7, python/3.8, etc.) and install packages using that module's pip/pip3.
  3. Install Python using the system's conda/mamba module (miniconda3/latest) and use conda/mamba or the environment's pip/pip3 to install packages to that particular environment.

Unfortunately, we have found that mixing and matching the above options can return unexpected results. When an Anaconda environment is activated, for example, the PATH variable will be edited so that packages installed using the pip3 from the module python/3.8 are no longer accessible.

We strongly recommend sticking to only one of these options. Of the above, conda/mamba seems to be the most robust: you can create environments for various projects, and conda/mamba ensures that all dependencies are correctly handled.
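
When a package seems to be missing or behaves like the wrong version, a quick diagnostic is to ask Python where it would import that package from. This is a minimal sketch using the standard library; the helper name where_is is hypothetical, and numpy is used only as an example of a package you might have installed.

```python
import importlib.util


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "json" ships with Python; "numpy" stands in for any package you installed yourself
    for name in ["json", "numpy"]:
        print(name, "->", where_is(name))
```

A path under ~/.local points to a pip --user install, a path under a conda environment directory points to that environment, and None means the active Python cannot see the package at all.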

JupyterLab

Consider using JupyterLab with OnDemand (https://rc.byu.edu/wiki/index.php?page=Custom+Kernels+using+JupyterLab+OnDemand).

To follow these instructions, you will need to be on a Unix-like operating system (macOS, Linux, etc.), not Windows. We've had some success with Windows Subsystem for Linux. Before continuing, we suggest setting up SSH multiplexing and setting up public-key authentication within the cluster.

These instructions assume you have access to the miniconda3/latest Anaconda installation as described above.

Jupyter works by connecting to your browser. To enable this, you'll need to run some commands from both your computer and our systems.

Key
-----
#                - comment
>                - command issued on your local machine
$                - command issued on the cluster (login node, or compute node after salloc)
user             - zdawg (family pup named Zorro)
JupyterPort      - tcp port returned by jupyter command
ComputeNodeName  - compute node reserved by salloc command

In Terminal #1:

# Connect to one of the login nodes:
> ssh user@ssh.rc.byu.edu

# Interactive SLURM request - 1 cpu, 1GB of memory, 1 hour
$ salloc --nodes=1 --ntasks=1 --mem-per-cpu=1G --time=1:00:00

# You are now on a compute node. Note its name, found on the last line of the salloc output, for use in the ssh command in terminal #2.

# Activate the jupyterlab environment
$ mamba activate jupyterlab

# Start Jupyter
$ jupyter-lab --no-browser

# Jupyter will select a port starting with 8888, and if already in use, count up from there.
# Note the port in the output for use in ssh command in terminal #2.
# Use in place of JupyterPort below.

In Terminal #2:

# On your local computer
> ssh -N -J user@ssh.rc.byu.edu -L JupyterPort:localhost:JupyterPort user@ComputeNodeName

# The login node asks for a password and verification code; the compute node asks for a password only.
# The command will appear to hang: with the -N option, ssh executes no remote command and only forwards the port.
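
The pieces of the terminal #2 command can be assembled programmatically, which makes it clearer what each option does. This is a sketch for illustration: the function name tunnel_command is hypothetical, and the values in the usage example are the ones from the example output on this page.

```python
import shlex


def tunnel_command(user, compute_node, port, login_host="ssh.rc.byu.edu"):
    """Build the ssh port-forwarding command for terminal #2 as a list of arguments."""
    return [
        "ssh",
        "-N",                              # run no remote command, only forward the port
        "-J", f"{user}@{login_host}",      # jump through the login node
        "-L", f"{port}:localhost:{port}",  # local port -> same port on the compute node
        f"{user}@{compute_node}",
    ]


if __name__ == "__main__":
    # Values taken from the example output in this section
    print(shlex.join(tunnel_command("zdawg", "m8-17-2", 8888)))
```

Replace the user, node name, and port with your own values from terminal #1 before running the printed command on your local machine.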

Example output terminal #1:

$ salloc --mem-per-cpu=1G --ntasks=1 --nodes=1 --time=1:00:00
salloc: Pending job allocation 39875987
salloc: job 39875987 queued and waiting for resources
salloc: job 39875987 has been allocated resources
salloc: Granted job allocation 39875987
salloc: Waiting for resource configuration
salloc: Nodes m8-17-2 are ready for job

$ mamba activate jupyterlab

$ jupyter-lab --no-browser
[I 17:27:14.506 NotebookApp] Serving notebooks from local directory: /zhome/zdawg
[I 17:27:14.506 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 17:27:14.506 NotebookApp] http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
[I 17:27:14.506 NotebookApp]  or http://127.0.0.1:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
[I 17:27:14.506 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:27:14.513 NotebookApp] 

    To access the notebook, open this file in a browser:
         file:///zhome/zdawg/.local/share/jupyter/runtime/nbserver-86703-open.html
    Or copy and paste one of these URLs:
         http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
      or http://127.0.0.1:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65

Example output terminal #2:

> ssh -N -J zdawg@ssh.rc.byu.edu -L 8888:localhost:8888 zdawg@m8-17-2

# Longer alternative also works:
#> ssh -N -J zdawg@ssh.rc.byu.edu -L 8888:localhost:8888 zdawg@m8-17-2.rc.byu.edu

All network and system usage is subject to monitoring and recording in order to
maintain confidentiality, data integrity, and system availability. Any improper
or unlawful use may be disclosed to organization and law enforcement officials,
and appropriate action may be taken.

Password: 
Verification code: 
The authenticity of host 'm8-17-2 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:79ta2j/EcjQO7oM7kwSVFkED9YjxovQ8knzdVqA03+E.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'm8-17-2' (ECDSA) to the list of known hosts.
zdawg@m8-17-2's password: 

# hangs, wait until ...
#     ssh terminated by ctrl-c, or Jupyter notebook closed, or salloc reservation times out.

To open Jupyter, copy one of the last two URLs from the Jupyter output in terminal #1 and paste it into a browser on your local machine. It will look something like this: http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
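
The port you need for the ssh command in terminal #2 is embedded in this URL, along with the access token. As a small illustration (the helper name parse_jupyter_url is hypothetical), both can be extracted with Python's standard library:

```python
from urllib.parse import parse_qs, urlsplit


def parse_jupyter_url(url):
    """Extract the port and access token from a URL printed by Jupyter."""
    parts = urlsplit(url)
    # parse_qs maps each query parameter to a list of values
    token_values = parse_qs(parts.query).get("token", [])
    token = token_values[0] if token_values else None
    return parts.port, token


if __name__ == "__main__":
    # URL copied from the example output in this section
    url = "http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65"
    port, token = parse_jupyter_url(url)
    print("port:", port)
    print("token:", token)
```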

Voila! Jupyter should be working. When you're done, be sure to stop Jupyter and log out of both terminals to release resources back to Slurm.

This installation of JupyterLab is prepared to use any of your Conda environments as the runtime kernel. To use a Conda environment as the runtime kernel, simply activate the environment and install either ipykernel for a Python kernel or r-irkernel for an R kernel.

Jupyter Notebooks (OLD INSTRUCTIONS)

To follow these instructions, you will need to be on a Unix-like operating system (macOS, Linux, etc.), not Windows. We've had some success with Windows Subsystem for Linux. Before continuing, we suggest setting up SSH multiplexing and setting up public-key authentication within the cluster.

To use Jupyter notebooks on our systems, you'll want to install Jupyter via pip, as detailed above:

pip install --user jupyter     # Base Jupyter system
pip install --user jupyterlab  # Includes updated user interface

Pip should install Jupyter into ~/.local; you may need to adjust your PATH environment variable to include ~/.local/bin, where the Jupyter executables are placed.
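
To check whether your PATH already includes that directory, you can use a short script like the following. It is a sketch under assumptions: the helper name directory_on_path is hypothetical, and it relies on site.getuserbase() reporting the base directory that pip install --user uses (usually ~/.local on Linux).

```python
import os
import site


def directory_on_path(directory, path_value):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return wanted in entries


if __name__ == "__main__":
    # Executables from `pip install --user` go into the bin directory under the user base
    user_bin = os.path.join(site.getuserbase(), "bin")
    print(user_bin, "on PATH:", directory_on_path(user_bin, os.environ.get("PATH", "")))
```

If the script reports False, add the directory to PATH before trying to start Jupyter.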

Jupyter works by connecting to your browser. To facilitate this, you'll need to run some commands from both your computer and our systems.

Key
-----
#                - comment
>                - command issued on your local machine
$                - command issued on the cluster (login node, or compute node after salloc)
user             - zdawg (family pup named Zorro)
JupyterPort      - tcp port returned by jupyter command
ComputeNodeName  - compute node reserved by salloc command

In Terminal #1:

# Connect to one of the login nodes:
> ssh user@ssh.rc.byu.edu

# Load a Python module
$ module load python/3.6    # python/3.7, python/3.8 should also work, not tested

# Install Jupyter, only needed once
$ pip install --user jupyter       # Base Jupyter system
$ pip install --user jupyterlab    # Includes updated user interface

# Update PATH environment variable so the system knows where to find Jupyter
$ export PATH=~/.local/bin:$PATH

# Make Jupyter persistent for future logins by adding the following 2 lines to ~/.bash_profile, only needed once
# module load python/3.6               # or python/3.7, python/3.8
# export PATH=~/.local/bin:$PATH

# Interactive SLURM request - 1 cpu, 1GB of memory, 1 hour
$ salloc --nodes=1 --ntasks=1 --mem-per-cpu=1G --time=1:00:00

# You are now on a compute node. Note its name, found on the last line of the salloc output, for use in the ssh command in terminal #2.

# Start Jupyter
$ jupyter notebook --no-browser

# Jupyter will select a port starting with 8888, and if already in use, count up from there.
# Note the port in the output for use in ssh command in terminal #2.
# Use in place of JupyterPort below.

In Terminal #2:

# On your local computer
> ssh -N -J user@ssh.rc.byu.edu -L JupyterPort:localhost:JupyterPort user@ComputeNodeName

# The login node asks for a password and verification code; the compute node asks for a password only.
# The command will appear to hang: with the -N option, ssh executes no remote command and only forwards the port.

Example output terminal #1:

$ salloc --mem-per-cpu=1G --ntasks=1 --nodes=1 --time=1:00:00
salloc: Pending job allocation 39875987
salloc: job 39875987 queued and waiting for resources
salloc: job 39875987 has been allocated resources
salloc: Granted job allocation 39875987
salloc: Waiting for resource configuration
salloc: Nodes m8-17-2 are ready for job

$ jupyter notebook --no-browser
[I 17:27:14.506 NotebookApp] Serving notebooks from local directory: /zhome/zdawg
[I 17:27:14.506 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 17:27:14.506 NotebookApp] http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
[I 17:27:14.506 NotebookApp]  or http://127.0.0.1:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
[I 17:27:14.506 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:27:14.513 NotebookApp] 

    To access the notebook, open this file in a browser:
         file:///zhome/zdawg/.local/share/jupyter/runtime/nbserver-86703-open.html
    Or copy and paste one of these URLs:
         http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65
      or http://127.0.0.1:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65

Example output terminal #2:

> ssh -N -J zdawg@ssh.rc.byu.edu -L 8888:localhost:8888 zdawg@m8-17-2

# Longer alternative also works:
#> ssh -N -J zdawg@ssh.rc.byu.edu -L 8888:localhost:8888 zdawg@m8-17-2.rc.byu.edu

All network and system usage is subject to monitoring and recording in order to
maintain confidentiality, data integrity, and system availability. Any improper
or unlawful use may be disclosed to organization and law enforcement officials,
and appropriate action may be taken.

Password: 
Verification code: 
The authenticity of host 'm8-17-2 (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is SHA256:79ta2j/EcjQO7oM7kwSVFkED9YjxovQ8knzdVqA03+E.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'm8-17-2' (ECDSA) to the list of known hosts.
zdawg@m8-17-2's password: 

# hangs, wait until ...
#     ssh terminated by ctrl-c, or Jupyter notebook closed, or salloc reservation times out.

To open the Jupyter notebook, copy one of the last two URLs from the Jupyter output in terminal #1 and paste it into a browser on your local machine. It will look something like this: http://localhost:8888/?token=f9dd18240516658e7db98108eb2d40a4d96fb2a9331d6a65

Voila! Jupyter should be working. When you're done, be sure to stop Jupyter and log out of both terminals to release resources back to Slurm.