Building GCC compilers on an HPC cluster
2020-05-20
Background
The GNU Compiler Collection (GCC) is a leading open-source compiler
suite that includes C, C++, Fortran and other languages. GCC is not to
be confused with gcc, which is the GNU C compiler, one of the
components in GCC. On the Linux system of the cluster, the built-in GCC
version may be too old (e.g. on CentOS 6, this is GCC 4.4.7) for
building (compiling) some applications that require language
features not yet supported by that version of GCC. To build these
applications on an HPC cluster, such as the UCLA Hoffman2 cluster, it is
necessary to install a newer version of GCC on the system. A common
misconception is that installing a new GCC requires Linux
system administrator privileges. This is false. In fact, a regular user
can install virtually any available version of GCC, or most open-source
application programs, under the home directory independently using
existing tools available on the cluster. In this technical note, we
discuss the process of installing GCC 9. We note in passing that the
process of installing other versions of GCC is similar, if not
identical. The process of installing many other GNU software packages is
also largely the same.
GCC Evolution
Starting from GCC 5, GCC uses a two-number version naming convention (e.g. 9.3), in contrast to the three-number version used in prior releases (e.g. GCC 4.4.7). In the two-number versioning system, the first number is the major version, indicating that new features are introduced in that release. The second number is the minor version, denoting bug fixes for issues reported against the initial major release. We note that, while the main documentation now uses the post-GCC-5 two-number versioning system, a third number is still present in the file name of the source download discussed in later sections (e.g. “0” in “gcc-9.3.0.tar.gz”).
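As a small illustration of the naming scheme, the following sketch splits the version embedded in a tarball name into its parts using shell parameter expansion. The variable names are illustrative only and are not used elsewhere in this note.

```shell
# Illustrative only: split the version in a GCC tarball name into its parts.
tarball="gcc-9.3.0.tar.gz"

version="${tarball#gcc-}"        # strip the "gcc-" prefix    -> 9.3.0.tar.gz
version="${version%.tar.gz}"     # strip the ".tar.gz" suffix -> 9.3.0

major="${version%%.*}"           # text before the first dot  -> 9
rest="${version#*.}"             # text after the first dot   -> 3.0
minor="${rest%%.*}"              # text before the next dot   -> 3

echo "file name version:     $version"
echo "documentation version: GCC $major.$minor"
```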
Here are some examples of the new features introduced in GCC major releases over the years:
C++11 support is available in GCC 4.8.4 and later.
In GCC 5, the default mode for C is changed to -std=gnu11 instead of -std=gnu89.
In GCC 6, the default mode for C++ is changed to -std=gnu++14 instead of -std=gnu++98.
In GCC 7, a number of C++ language-related changes are made, e.g. enforcing stricter rules when using templates, and an implementation of most of the OpenACC 2.0a specification is added.
In GCC 8, the C and C++ compilers can emit more fix-it hints.
In GCC 9, there are further improvements to command-line options and diagnostic information, and an implementation of most of the OpenACC 2.5 specification is added.
In GCC 10, an implementation of most of the OpenACC 2.6 specification is added.
More details about the changes in each release are available online. For example, https://gcc.gnu.org/gcc-9/changes.html lists the changes in GCC 9.
Sometimes the build process needs to be modified (e.g. using alternative compiler options) if the application assumes certain default behavior of GCC that is different from that of the version being used.
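One way to do this, sketched below, is to pin the language standard explicitly through the CFLAGS and CXXFLAGS environment variables rather than relying on the compiler default. The choice of -std=gnu89 and -std=gnu++98 is only an example for an application written against the older defaults, and whether a given package honors these variables depends on its build system.

```shell
# Sketch: request the older language defaults explicitly.
# STD_C and STD_CXX are illustrative names; pick the standard your
# application actually expects.
STD_C="-std=gnu89"
STD_CXX="-std=gnu++98"

# Append to any flags that are already set, keeping them intact.
export CFLAGS="${CFLAGS:+$CFLAGS }$STD_C"
export CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }$STD_CXX"

echo "CFLAGS=$CFLAGS"
echo "CXXFLAGS=$CXXFLAGS"

# Many configure- or make-based builds pick these variables up, e.g.:
#   ./configure && make
```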
Building GCC
Download the source
Different versions of GCC can be freely downloaded from one of the GNU
mirror sites, listed in https://gcc.gnu.org/releases.html. Here we use
the wget command to download GCC 9.3 from the mirror site
https://mirrors.kernel.org/gnu/gcc into the current directory, such as
your scratch directory:
$ wget https://mirrors.kernel.org/gnu/gcc/gcc-9.3.0/gcc-9.3.0.tar.gz
The downloaded .tar.gz is a compressed file. It needs to be
uncompressed and expanded before use, as described in the next section.
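Before unpacking, it can be worth checking that the download was not truncated. The following is a minimal sketch using gzip -t; check_tarball is a hypothetical helper name. It only detects a damaged or incomplete gzip stream and does not prove that the file is authentic.

```shell
# check_tarball is a hypothetical helper name, not part of GCC or wget.
check_tarball() {
    # gzip -t reads the whole file and verifies the compressed stream
    # without writing anything to disk.
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: gzip stream is intact"
    else
        echo "$1: missing or corrupted, download it again" >&2
        return 1
    fi
}

# Usage, in the directory holding the download:
#   check_tarball gcc-9.3.0.tar.gz
```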
Decompress the .tar.gz file
We can use the tar command to decompress (a.k.a. “un-gzip”) and to
expand (a.k.a. “untar”) the tar file in one go:
$ tar xvfz gcc-9.3.0.tar.gz
$ cd gcc-9.3.0
Several arguments have been passed to the tar command:
“x”: expand the tar file back to its original file structure (i.e. individual files)
“f”: read the archive from the file named on the command line (here gcc-9.3.0.tar.gz)
“v”: display the progress on screen
“z”: decompress the gzip’d .gz file
See https://www.gnu.org/software/tar/manual/ for more details about
tar.
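To experiment with these options without touching the GCC download, the sketch below builds a tiny stand-in archive, lists it with “t” instead of “x”, and then extracts it using the long-option spelling of the same flags. The directory demo_dir and its contents are placeholders.

```shell
# Create a tiny stand-in archive so the commands can be tried safely.
# demo_dir and its contents are placeholders, not part of GCC.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/pkg"
echo "hello" > "$demo_dir/pkg/README"
tar -czf "$demo_dir/pkg.tar.gz" -C "$demo_dir" pkg

# "t" lists the contents of the archive instead of extracting them.
listing=$(tar -tzf "$demo_dir/pkg.tar.gz")
echo "$listing"

# Long-option spelling of "tar xvfz", extracting into a chosen directory.
mkdir "$demo_dir/out"
tar --extract --verbose --gzip --file="$demo_dir/pkg.tar.gz" \
    --directory="$demo_dir/out"

# Remove the scratch directory when finished:
#   rm -rf "$demo_dir"
```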
In the gcc-9.3.0 directory, there should be many files, including
configure, which is the script we will use next, as well as several
documentation files such as README and NEWS.
Download the prerequisites
Building GCC requires a number of external libraries (e.g. GMP). They
need to be available before GCC is built. Care should be taken about
the versions of these libraries because GCC requires matching versions
(or ranges of versions) of these libraries; incompatible external
libraries result in a broken GCC build. In the gcc-9.3.0/ directory
from the last step, run this script to download the required external
libraries:
$ ./contrib/download_prerequisites
The screen output should look similar to:
2020-05.../pub/gcc/infrastructure/gmp-6.1.0.tar.bz2 [2383840] -> "./gmp-6.1.0.tar.bz2" [1]
2020-05.../pub/gcc/infrastructure/mpfr-3.1.4.tar.bz2 [1279284] -> "./mpfr-3.1.4.tar.bz2" [1]
2020-05.../pub/gcc/infrastructure/mpc-1.0.3.tar.gz [669925] -> "./mpc-1.0.3.tar.gz" [1]
2020-05.../pub/gcc/infrastructure/isl-0.18.tar.bz2 [1658291] -> "./isl-0.18.tar.bz2" [1]
gmp-6.1.0.tar.bz2: OK
mpfr-3.1.4.tar.bz2: OK
mpc-1.0.3.tar.gz: OK
isl-0.18.tar.bz2: OK
All prerequisites downloaded successfully.
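As an extra check, the sketch below looks for the unversioned entries (gmp, mpfr, mpc, isl) in the source directory. It assumes that the script unpacks each archive and links it under its unversioned name, which is our understanding of how download_prerequisites behaves for this release; check_prereqs is a hypothetical helper name.

```shell
# check_prereqs is a hypothetical helper; it only looks for the entries
# that download_prerequisites is expected to leave in the source tree.
# $1: GCC source directory (e.g. gcc-9.3.0)
check_prereqs() {
    src_dir=$1
    missing=0
    for lib in gmp mpfr mpc isl; do
        if [ -e "$src_dir/$lib" ]; then
            echo "found:   $src_dir/$lib"
        else
            echo "missing: $src_dir/$lib" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the directory that contains gcc-9.3.0:
#   check_prereqs gcc-9.3.0
```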
A word about where to run the build. After downloading all of the files, we can proceed to configure and build GCC, described in the next sections. These steps are CPU-intensive. It is highly recommended to launch an interactive session (via the job scheduler) to perform the operations on a compute node, as the login nodes may be too loaded or memory-restricted for these tasks.
Configure
Before building (compiling) GCC, we need to configure it by passing several parameters to the configure script, which also detects certain machine-specific parameters of the cluster.
One requirement of GCC is that the configure script must not be run
from within the directory where it is located (the source directory).
We need to create a separate, empty directory and run the configure
script from the new directory. These steps can be done by the following
commands:
$ cd .. # go up one level from gcc-9.3.0
$ mkdir gcc_build
Before proceeding to the next step, let’s confirm that you have the two directories at the same level:
$ ls
gcc-9.3.0 gcc-9.3.0.tar.gz gcc_build
The gcc-9.3.0 directory is the result of expanding gcc-9.3.0.tar.gz,
and gcc_build is the new empty directory where we will run the
configure script. These two directories may be deleted after GCC is
successfully built and installed.
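The layout described above can be verified with a short sketch such as the following, which confirms that the source directory contains an executable configure script and that the build directory exists and is empty. The function name check_build_layout is hypothetical.

```shell
# check_build_layout is a hypothetical helper name.
# $1: source directory (e.g. gcc-9.3.0), $2: build directory (e.g. gcc_build)
check_build_layout() {
    src=$1
    build=$2
    if [ ! -x "$src/configure" ]; then
        echo "no executable configure script in $src" >&2
        return 1
    fi
    if [ ! -d "$build" ]; then
        echo "build directory $build does not exist" >&2
        return 1
    fi
    # ls -A lists every entry except . and .. ; any output means non-empty.
    if [ -n "$(ls -A "$build")" ]; then
        echo "build directory $build is not empty" >&2
        return 1
    fi
    echo "layout looks fine: $src (source), $build (build)"
}

# Usage, from the directory that contains both:
#   check_build_layout gcc-9.3.0 gcc_build
```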
Enter the gcc_build directory and run the configure script (located in
another directory, gcc-9.3.0), assuming that you will install GCC to
your directory $HOME/sw/gcc/9.3.0:
$ cd gcc_build
$ ../gcc-9.3.0/configure \
--prefix=$HOME/sw/gcc/9.3.0 \
--disable-multilib \
--enable-languages=c,c++,fortran,jit \
--enable-checking=release \
--enable-host-shared
The meanings of the parameters are:
--prefix: the directory where GCC will be installed.
--disable-multilib: build only the 64-bit version of GCC.
--enable-languages: only the compilers for the specified languages will be built; GCC supports other languages as well.
--enable-checking: enable internal consistency checks in the compiler; the value release selects the inexpensive set of checks used for official releases.
--enable-host-shared: build the host code as position-independent code so that it can be linked into a shared library; this is needed for jit.
See https://gcc.gnu.org/install/configure.html for more information about configuring GCC.
Compile
After running configure, we are ready to build GCC. We are going to build GCC using the system default compiler, e.g.
$ which gcc
/usr/bin/gcc
$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
To build GCC, simply issue the make command in the gcc_build directory
(where we just ran the configure command in the previous section). We
add -j 4 to enable a parallel build (using multiple CPU cores) to
accelerate the build process:
$ make -j 4
The make step takes a while to complete. If the make step is
successful, make install will install GCC to the directory specified
by --prefix in the configure step.
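The value 4 is only an example. A minimal sketch for choosing the number of parallel jobs is shown below. It assumes that the job scheduler exports NSLOTS inside an interactive session, as Grid Engine style schedulers do; whether your cluster defines it should be verified. Otherwise it falls back to the core count reported by nproc.

```shell
# NSLOTS is an assumption about the scheduler environment. If it is not
# set, use the number of cores reported by nproc, and finally fall back to 1.
njobs="${NSLOTS:-$(nproc 2>/dev/null || echo 1)}"
echo "would run: make -j $njobs"

# In the gcc_build directory:
#   make -j "$njobs"
```

On a shared compute node, staying within the number of cores granted by the scheduler avoids slowing down other users' jobs.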
Test
It is a good idea to run the GCC tests to see if GCC is correctly
built. Running the tests takes a while to complete. To run the GCC
tests, run the following command in the build directory (e.g.
gcc_build from previous sections):
$ make -k check
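The GCC test suites are driven by DejaGnu, which writes its results to .sum files under the build directory. The sketch below, with the hypothetical helper name summarize_tests, prints the total lines (those starting with “# of”) from every .sum file it finds. It assumes that the test run got far enough to produce those files.

```shell
# summarize_tests is a hypothetical helper name.
# $1: build directory (e.g. gcc_build)
summarize_tests() {
    # DejaGnu writes one .sum file per test suite; the totals are on
    # lines that start with "# of".
    find "$1" -name '*.sum' | while read -r sum_file; do
        echo "== $sum_file"
        grep '^# of ' "$sum_file" || echo "(no totals found)"
    done
}

# Usage, from the directory that contains gcc_build:
#   summarize_tests gcc_build
```

A small number of unexpected failures does not necessarily mean that the build is unusable, so the totals are best read as a rough health check.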
Install
From the build directory (e.g. gcc_build), run this command to install
GCC to the target directory, specified by --prefix in the configure
step:
$ make install
Using GCC
To use the newly installed GCC, the corresponding file system paths need to be added to the environment variables:
install_dir=$HOME/sw/gcc/9.3.0 # GCC install directory
export PATH=$install_dir/bin:$PATH
export LD_LIBRARY_PATH=$install_dir/lib64:\
$install_dir/lib:$install_dir/lib/gcc/x86_64-pc-linux-gnu/9.3.0:\
$install_dir/libexec/gcc/x86_64-pc-linux-gnu/9.3.0:\
$LD_LIBRARY_PATH
unset install_dir
The temporary variable install_dir corresponds to the path set by
--prefix in the configure step, discussed previously. After its value
has been used in PATH and LD_LIBRARY_PATH, it can be removed, or
“unset” as shown above. The purpose of the PATH environment variable is
for the GCC commands (e.g. typing gcc) to be found automatically
without a full path. The purpose of the LD_LIBRARY_PATH environment
variable is for the shared libraries associated with GCC to be found at
run time. To make the PATH and LD_LIBRARY_PATH settings permanent, this
block can be added to ~/.bashrc or ~/.bash_profile.
The Linux operating system searches the file system paths in PATH and
LD_LIBRARY_PATH in left-to-right order. We added the newly installed
GCC at the beginning of these environment variables, so it is found
(and used) in the current shell, even if there are paths to other
versions of GCC later in the lists.
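If the settings live in a startup file that may be sourced more than once, the same directories can pile up in these variables. The sketch below shows one possible way to prepend a directory only when it is not already listed; prepend_once is a hypothetical helper name, and only the bin and lib64 directories are shown for brevity.

```shell
# prepend_once is a hypothetical helper name.
# $1: variable name (PATH or LD_LIBRARY_PATH), $2: directory to put first
prepend_once() {
    eval "_current=\${$1:-}"
    case ":$_current:" in
        *":$2:"*)
            ;;  # already listed somewhere: leave the variable untouched
        *)
            eval "export $1=\"\$2\${_current:+:\$_current}\""
            ;;
    esac
}

install_dir=$HOME/sw/gcc/9.3.0   # GCC install directory
prepend_once PATH "$install_dir/bin"
prepend_once LD_LIBRARY_PATH "$install_dir/lib64"
unset install_dir

# Show which gcc the shell would run now, if any is found.
command -v gcc || echo "gcc not found in PATH"
```

Note that a directory that is already present further to the right is left in place, so this helper avoids duplicates but does not by itself guarantee that the new GCC comes first.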
Summary
This technical note summarizes the essential steps of installing GCC compilers within a regular user’s own directory, without requiring superuser (or system administrator) privileges, in an HPC cluster environment such as the UCLA Hoffman2 cluster. Apart from the version numbers, the procedure is expected to be the same for similar GCC versions, at least those released in the recent past or the near future.
Appendix
The script to run the entire procedure described in this note is available at:
https://gist.github.com/schuang/5df8dd3c7c17067cdeadc09d607f7cfa