BYU

Office of Research Computing

Levi's RUN_PATH article

Dynamically linking software on a GNU/Linux system in 2017

At work I help manage a multi-version software repository for our users. Our users inevitably need different versions of core libraries and applications. We manage this by installing software into different paths and then using environment modules to let our users choose the versions of things they need. For instance we may install the GNU Compiler Collection version 6.4.0 to /apps/gcc/6.4.0 and version 7.2.0 to /apps/gcc/7.2.0. Users who want GCC 6.4 will run module load gcc/6.4 and users who want GCC 7.2 will run module load gcc/7.2. What does this have to do with linking software?

Let's say I've compiled Python 3.6 with GCC 6.4. The python3 binary will likely depend on libraries provided by GCC such as libgcc_s.so. How does the dynamic linker know to use the versions provided by GCC 6.4 instead of the ones provided by the operating system? On GNU/Linux systems the slightly simplified process looks like this:

  1. Check the DT_RPATH entry of the binary if and only if the DT_RUNPATH entry does not exist. I will refer to DT_RPATH as RPATH.
  2. If the executable is not a setuid/setgid binary then look in LD_LIBRARY_PATH.
  3. Check the DT_RUNPATH entry of the binary. I will refer to this as RUNPATH.
  4. Look up the library in the ldconfig cache file (commonly /etc/ld.so.cache).
  5. Look in the system paths /lib, /usr/lib and /usr/local/lib (or lib64 variants)
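The first three steps above can be sketched as a small function. This is only an illustration of the ordering, not how the dynamic linker is implemented; the function name and parameters are hypothetical, and it deliberately ignores the cache and system paths from steps 4 and 5.

```python
def library_search_dirs(rpath, runpath, ld_library_path, is_setuid=False):
    """Return directories in the order of steps 1-3 (simplified sketch).

    Pass None for rpath or runpath when the binary lacks that entry.
    """
    def split(path):
        # Simplification: empty elements are dropped here.
        return [d for d in (path or "").split(":") if d]

    dirs = []
    # Step 1: RPATH is consulted only when the binary has no RUNPATH.
    if runpath is None:
        dirs += split(rpath)
    # Step 2: LD_LIBRARY_PATH is skipped for setuid/setgid executables.
    if not is_setuid:
        dirs += split(ld_library_path)
    # Step 3: RUNPATH comes after the environment variable.
    dirs += split(runpath)
    return dirs
```

Note how a binary that carries both entries never has its RPATH searched, and how LD_LIBRARY_PATH lands ahead of RUNPATH but behind RPATH.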

If you are not familiar with the idea of a search path: put each directory that you want to be searched into a colon-separated list. As an example LD_LIBRARY_PATH might look like /tmp/library_a/lib:/tmp/library_b/lib. If we are at step 2 in the above process then the dynamic linker will first check /tmp/library_a/lib and, if it does not find the library there, it will then check /tmp/library_b/lib.
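A minimal sketch of that first-match-wins behavior, assuming a hypothetical helper named find_library. The exists parameter is only there so the function can be tried without touching the filesystem.

```python
import os
import posixpath


def find_library(name, search_path, exists=os.path.exists):
    """Return the first existing candidate along a colon-separated path."""
    for directory in search_path.split(":"):
        if not directory:
            continue  # simplification: skip empty elements
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate  # earlier directories win
    return None
```

With the example path above, a library that exists in both directories would always be taken from /tmp/library_a/lib, which is why the order of the list matters.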

How do we know which technique to use? My goal for this article is to explain when each option might be used.

Note that the concepts may carry to non-GNU/Linux systems but the details will be different. This article focuses on common GNU/Linux systems.

When to use RPATH

Use RPATH when the paths to search must not be overridden for any reason, both now and in the future. Let's say you are installing software for multiple users and do not want them to be able to goof up running this particular piece of software by setting LD_LIBRARY_PATH. You might consider RPATH in this case. I say might because once this is set there is no overriding it without altering the binary.

I generally recommend against using RPATH because it is very hard to tell the future. How certain can you be when you are building the software that neither you nor your users will ever have a valid reason to override this path?

It is worth noting that RPATH is deprecated. However, I suspect it will never permanently go away for two reasons:

  1. Linux systems really, really try not to break userspace software.
  2. It actually serves a need that no other technique can fill. If RPATH went away the first thing that would be checked is LD_LIBRARY_PATH. We cannot (and should not even if we could) prevent users from altering LD_LIBRARY_PATH. However, if the user has a library in LD_LIBRARY_PATH that conflicts with the one an administrator-provided binary or library needs then that software may not run correctly. RPATH provides a mechanism for administrators to prevent this from happening.
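Since the two entries behave so differently, it helps to check which one a given binary actually carries. The sketch below is an assumption on my part rather than anything the process above requires: it parses the text printed by binutils' readelf -d, and the function names are hypothetical.

```python
import re
import subprocess

# Matches readelf -d lines such as:
#  0x000000000000001d (RUNPATH)   Library runpath: [/some/dir]
_ENTRY = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*?)\]")


def parse_dynamic_paths(readelf_output):
    """Map 'RPATH' / 'RUNPATH' to the list of directories found."""
    found = {}
    for line in readelf_output.splitlines():
        match = _ENTRY.search(line)
        if match:
            found[match.group(1)] = match.group(2).split(":")
    return found


def dynamic_paths_of(binary):
    """Run readelf -d on a binary (requires binutils to be installed)."""
    result = subprocess.run(
        ["readelf", "-d", binary], capture_output=True, text=True, check=True
    )
    return parse_dynamic_paths(result.stdout)
```

A result containing only the "RPATH" key means users cannot override the search with LD_LIBRARY_PATH; a result containing "RUNPATH" means they can.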

When to use LD_LIBRARY_PATH

Use of LD_LIBRARY_PATH is highly discouraged, and there is no shortage of articles that recommend against its use.

I encourage reading more on this subject but if you want the very quick, probably too-simple summary it's that 1) conflicts are inevitable and 2) it is a potential security issue.

If you do use LD_LIBRARY_PATH it should be for temporary use such as running a test suite with a different version of a dependency to ensure it works correctly. You should be very wary of setting LD_LIBRARY_PATH in a modulefile or some other file that might get sourced by other users; doing so is a sign that what you are doing is permanent and some other method should be used.
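One way to keep such use temporary is to set the variable only for the child process that runs the tests, so nothing leaks into the login environment. This is a sketch under that assumption; the helper names are hypothetical.

```python
import os
import subprocess


def env_with_library_path(extra_dirs, base_env=None):
    """Return a copy of the environment with extra_dirs searched first."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = list(extra_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env  # the caller's environment is left untouched


def run_with_library_path(command, extra_dirs):
    """Run one command with the modified environment, then discard it."""
    return subprocess.run(command, env=env_with_library_path(extra_dirs), check=False)
```

For example, run_with_library_path(["make", "check"], ["/tmp/newdep/lib"]) would run a test suite against a dependency in a hypothetical /tmp/newdep/lib without changing anything for later commands.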

A cautionary tale

At work we used the LD_LIBRARY_PATH method for hundreds of software packages simply because we did not know any better at the time. Years later when we realized our mistake we were well past the point of recompiling them properly to use different, more appropriate methods. This meant that for new software we were basically forced to use RPATH which in turn prevents users from using LD_LIBRARY_PATH as it was intended.

Fortunately for us we are preparing a new operating system image. This is the impetus for writing this article: I need a single article that explains the process in enough depth to understand the issues and that I can confidently recommend to my coworkers. Hopefully it will be useful to others as well.

When to use RUNPATH

Most user-space software should prefer using RUNPATH. This is the intended replacement for RPATH and in 2017 most Linux operating systems provide a linker that will understand it. In particular if you ship pre-built software to be installed on systems you do not own you should probably use this method with a RUNPATH that mentions $ORIGIN. The use of $ORIGIN allows you to use relative search paths. Let's say your software is extracted into $DIR; your directory structure might be something like:

$DIR/
    bin/
    lib/

Your executables should be in $DIR/bin and should have RUNPATH set to include $ORIGIN/../lib. You put all of the libraries in $DIR/lib and if they have dependencies on other libraries you are also shipping they should have a RUNPATH of $ORIGIN.
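To see what those relative entries resolve to, here is a rough sketch of the $ORIGIN substitution. The function name is hypothetical, and the path normalization is purely textual (it does not resolve symlinks), so treat it as an illustration rather than a reproduction of the dynamic linker's behavior.

```python
import posixpath


def expand_origin(runpath, binary_path):
    """Replace $ORIGIN with the directory that contains binary_path."""
    origin = posixpath.dirname(binary_path)
    expanded = []
    for entry in runpath.split(":"):
        if not entry:
            continue  # simplification: skip empty elements
        # Both the $ORIGIN and ${ORIGIN} spellings are handled.
        entry = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        expanded.append(posixpath.normpath(entry))
    return expanded
```

If $DIR were /opt/app, expand_origin("$ORIGIN/../lib", "/opt/app/bin/tool") gives ["/opt/app/lib"], and the same holds wherever the tree is extracted because the lookup is relative to the file itself.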

When to use the ldconfig cache

User-space software will rarely use this method directly. Software installed by administrators that is intended for all users will typically go into the system paths but sometimes they may need it to be located elsewhere. In this case they will add a relevant file to /etc/ld.so.conf.d and regenerate the ld.so.cache.
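To confirm that a library made it into the cache after regenerating it, the listing printed by ldconfig -p can be inspected. The parser below is a sketch that assumes the usual "name (flags) => path" layout of that listing; the function name is hypothetical.

```python
import re

# Matches cache listing lines such as:
#   libz.so.1 (libc6,x86-64) => /lib64/libz.so.1
_CACHE_LINE = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig_cache(listing):
    """Map each library name to the first path listed for it."""
    entries = {}
    for line in listing.splitlines():
        match = _CACHE_LINE.match(line)
        if match:
            name, _flags, path = match.groups()
            # A name can appear more than once (e.g. per architecture);
            # this sketch keeps only the first occurrence.
            entries.setdefault(name, path)
    return entries
```

The header line that reports how many libraries were found does not match the pattern and is ignored.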

When to use system paths

If all users of the system are intended to use this version of the library nearly all of the time then it should probably belong here. This is the simplest, easiest method because it requires no extra configuration at build-time nor at run-time. Most libraries installed by package managers end up here.

Summary

For most software that is intended to be shared by all users I recommend using the system paths. However, software authors should not assume their software will be installed here and should provide a mechanism to install it to other places.

If the system paths are not suitable because of version conflicts or you do not want all users to automatically use them then I recommend RUNPATH.

I recommend against using LD_LIBRARY_PATH. If you do use LD_LIBRARY_PATH it should be temporary such as loading a newer version of a dependency in order to run test suites against that version.

I recommend against using RPATH. Use RPATH only if you are very certain you do not want users to purposefully or accidentally load different versions of libraries through LD_LIBRARY_PATH.