Brief NW-GRID Tutorial - Using Compilers and Libraries

Here we provide a starter guide to developing optimised applications to run on the iDataPlex. A similar guide for the TACC LoneStar system, which goes into more detail about code tuning, is available at http://www.tacc.utexas.edu/user-services/user-guides/lonestar-user-guide . Note that the suggested compiler options are a guide only and may not be suitable for all applications.

1) GCC compiler suite

The RedHat-5.5 release on this system was built with the GNU Compiler Collection 4.1.2, which is therefore the system default. In addition we have 4.4.0 and 4.6.2 available. We recommend using 4.6.2 via the module gcc/4.6.2, together with the OpenMPI message passing environment via module openmpi/1.4.4/gcc. It is also possible to use the MKL libraries with GCC.

For optimisation, consider using gfortran or gcc with the following flags:

 -Wl,-rpath=/gpfs/packages/gcc/4.6.2/lib -m64 -O3 -pipe -march=corei7 -msse4.2 -fno-strict-aliasing

It may be better to try "-march=native" which on this architecture automatically selects:

 -march=corei7 -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mno-avx -msse4.2 -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=12288 -mtune=corei7
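
The expansion above can be checked on whichever node you build on. The following is a minimal sketch, assuming a GCC that supports "-Q --help=target" (4.3 or later) is on the PATH, for example after loading gcc/4.6.2. The helper name show_native_flags is our own and is not part of GCC. Bear in mind that -march=native detects the processor of the build host, so it is only appropriate if the compute nodes have the same processor.

```shell
# show_native_flags is a hypothetical helper name, not a GCC tool.
# Usage: show_native_flags [compiler]   (defaults to gcc)
show_native_flags() {
    cc=${1:-gcc}
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "$cc not found; load a compiler module first" >&2
        return 1
    fi
    # -Q --help=target lists every target option and whether it is enabled;
    # keep only the -march/-mtune lines and the options that are switched on.
    "$cc" -march=native -Q --help=target 2>/dev/null |
        grep -E -- '-m(arch|tune)=|\[enabled\]'
}
```

Run it as "show_native_flags gcc" or "show_native_flags gfortran" and compare the result with the list above.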

2) Intel compiler suite

We have deployed two releases of the Intel compiler suite via modules intel/comp/11.1 and intel/comp/12.0. The latter is recommended. The message passing interfaces openmpi/1.4.4/intel or intel/mpi/4.0.1 and others are available.

For optimisation, consider using ifort or icc with the following flags (although inter-procedural optimisation can take a long time):

-Wl,-rpath=/gpfs/packages/intel-ics/2011.0.013/lib/intel64 -m64 -xSSE4.2 -ip -O3 -no-prec-div -parallel -opt-prefetch

Of these, -O3 and -xSSE4.2 may be the most useful. Some codes, such as NWChem, expect to be able to use 64-bit integers, which can be enabled using the "-i8" flag. The -parallel flag will try to generate multi-threaded code and may need to link to the MKL library suite.

3) Intel MKL

MKL is the Intel Math Kernel Library, which comprises a collection of popular serial and parallel routines, some from BLAS, LAPACK and ScaLAPACK, with additional FFT routines and others. For a full list and up-to-date on-line information from Intel, see http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/ .

The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays), and the LP64 libraries use the 32-bit integer type. Both static *.a and dynamic *.so versions are provided for most of the libraries in MKL.

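From now on we assume that we are statically linking an application with 32-bit integers (LP64). The basic library layers to use are an interface layer (libmkl_intel_lp64.a), a threading layer (libmkl_intel_thread.a or libmkl_sequential.a) and the computational core (libmkl_core.a).

In addition to the versions for the Intel compiler, there are versions for GCC and PGI.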

It is useful to define the path to the set of libraries we are going to use. This will depend on the architecture of the machine. On SID (an x86_64 architecture with Xeon E5620 processors) the setup is:

module load intel/comp/11.1
module load intel/mkl/10.2
module load openmpi/1.4.4/intel
export MKLPATH=/gpfs/packages/intel/mkl/10.2.2.025/lib/em64t/
export MKLINCLUDE=/gpfs/packages/intel/mkl/10.2.2.025/include

If linking dynamically, you can export LD_LIBRARY_PATH=$MKLPATH.
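
Setting LD_LIBRARY_PATH=$MKLPATH directly replaces any value that is already set, which may drop paths added by the modules loaded earlier. A hedged alternative is to prepend MKLPATH instead. The sketch below assumes a POSIX shell; the default value given for MKLPATH is only there to keep the snippet self-contained and is the MKL-10.2 path from above.

```shell
# Use MKLPATH as exported above; fall back to the MKL-10.2 path if it is unset.
MKLPATH=${MKLPATH:-/gpfs/packages/intel/mkl/10.2.2.025/lib/em64t}

# Prepend MKLPATH. The ${VAR:+...} form adds the ':' separator and the old
# value only when LD_LIBRARY_PATH is already set and non-empty.
export LD_LIBRARY_PATH="$MKLPATH${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "$LD_LIBRARY_PATH"
```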

Here are some recipes for linking with different kinds of applications, assuming static linking. We also assume the Intel compiler and OpenMPI.

BLAS and LAPACK

libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a

ScaLAPACK

libmkl_intel_lp64.a libmkl_scalapack_lp64.a libmkl_blacs_openmpi_lp64.a libmkl_intel_thread.a libmkl_core.a

You can also add the lapack95 and blas95 interface components if required. A couple of additional libraries may be required, namely -liomp5 and -lpthread. If you don't want threading in the MKL calls, you should use libmkl_sequential.a in place of libmkl_intel_thread.a and omit -liomp5.

In summary, here is a link line for a program using ScaLAPACK with threading enabled:

mpif90 -o myprog -O3 myprog.f90 -I$MKLINCLUDE -L$MKLPATH -Wl,--start-group \
       $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_scalapack_lp64.a \
       $MKLPATH/libmkl_blacs_openmpi_lp64.a $MKLPATH/libmkl_intel_thread.a \
       $MKLPATH/libmkl_core.a -Wl,--end-group \
       -liomp5 -lpthread

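Note that the -Wl,--start-group and -Wl,--end-group options are required for the Intel layered model when statically linking the executable: the libraries in the group depend on one another, so the linker has to search them repeatedly until all symbols are resolved. See the MKL user guide for more information.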

If you want to use 64-bit integers, you will need the -i8 compiler option and the ILP64 libraries in place of the LP64 ones.
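
The threaded, sequential, LP64 and ILP64 variants differ only in a few library names, so the library group can be generated rather than typed. The following is a sketch only: mkl_static_libs is a hypothetical helper of our own, not an Intel tool, and it assumes that the ILP64 libraries follow the same naming pattern as the LP64 ones (libmkl_intel_ilp64.a, libmkl_scalapack_ilp64.a, libmkl_blacs_openmpi_ilp64.a). Check that these files exist under $MKLPATH before relying on it.

```shell
# mkl_static_libs is a hypothetical helper, not part of MKL.
# Usage: mkl_static_libs <lp64|ilp64> <thread|sequential>
# Prints the static ScaLAPACK link group for the chosen variant.
mkl_static_libs() {
    mkl_int=$1
    mkl_thr=$2
    # Fall back to the MKL-10.2 path from above; drop any trailing slash.
    mkl_dir=${MKLPATH:-/gpfs/packages/intel/mkl/10.2.2.025/lib/em64t}
    mkl_dir=${mkl_dir%/}

    case $mkl_int in
        lp64|ilp64) ;;
        *) echo "unknown integer model: $mkl_int" >&2; return 1 ;;
    esac

    case $mkl_thr in
        thread)     mkl_thrlib=libmkl_intel_thread.a; mkl_extra="-liomp5 -lpthread" ;;
        sequential) mkl_thrlib=libmkl_sequential.a;   mkl_extra="-lpthread" ;;
        *) echo "unknown threading model: $mkl_thr" >&2; return 1 ;;
    esac

    echo "-Wl,--start-group" \
         "$mkl_dir/libmkl_intel_${mkl_int}.a" \
         "$mkl_dir/libmkl_scalapack_${mkl_int}.a" \
         "$mkl_dir/libmkl_blacs_openmpi_${mkl_int}.a" \
         "$mkl_dir/$mkl_thrlib" \
         "$mkl_dir/libmkl_core.a" \
         "-Wl,--end-group $mkl_extra"
}
```

A 64-bit integer build might then look like "mpif90 -o myprog -O3 -i8 myprog.f90 -I$MKLINCLUDE $(mkl_static_libs ilp64 thread)". The command substitution is left unquoted on purpose so that the shell splits it into separate arguments.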

For MKL-10.3 there are some minor differences:

module load intel/comp/12.0
module load intel/mkl/10.3
module load openmpi/1.4.4/intel
export MKLPATH=/gpfs/packages/intel-ics/2011.0.013/mkl/lib/intel64
export MKLINCLUDE=/gpfs/packages/intel-ics/2011.0.013/mkl/include
export INTELPATH=/gpfs/packages/intel-ics/2011.0.013/lib/intel64

This gives us:

mpif90 -o myprog -O3 myprog.f90 -I$MKLINCLUDE -L$MKLPATH -Wl,--start-group \
       $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_scalapack_lp64.a \
       $MKLPATH/libmkl_blacs_openmpi_lp64.a $MKLPATH/libmkl_intel_thread.a \
       $MKLPATH/libmkl_core.a -Wl,--end-group \
       -L$INTELPATH -liomp5 -lpthread

Finally, for dynamic linking, you could do as follows:

mpif90 -o myprog -O3 myprog.f90 -I$MKLINCLUDE -L$MKLPATH \
       -lmkl_intel_lp64 -lmkl_scalapack_lp64 \
       -lmkl_blacs_openmpi_lp64 -lmkl_intel_thread -lmkl_core \
       -L$INTELPATH -liomp5 -lpthread
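
After a dynamic link it is worth confirming that the MKL shared libraries can actually be found at run time. The sketch below uses ldd for this. The helper name check_mkl_linkage is our own, and the check assumes a Linux system where ldd is available.

```shell
# check_mkl_linkage is a hypothetical helper name.
# Usage: check_mkl_linkage ./myprog
# Returns 0 if the executable depends on MKL shared libraries and all of them
# resolve, 1 if no MKL dependency is found, 2 if ldd is missing, and 3 if an
# MKL library is listed but cannot be found.
check_mkl_linkage() {
    command -v ldd >/dev/null 2>&1 || { echo "ldd not available" >&2; return 2; }

    # ldd prints the shared libraries the loader would use for this file.
    mkl_deps=$(ldd "$1" 2>/dev/null | grep 'libmkl_')

    if [ -z "$mkl_deps" ]; then
        echo "$1: no MKL shared libraries (static link, or MKL not used)"
        return 1
    fi

    echo "$mkl_deps"
    if echo "$mkl_deps" | grep -q 'not found'; then
        echo "$1: add \$MKLPATH to LD_LIBRARY_PATH" >&2
        return 3
    fi
}
```

If a library is reported as not found, add $MKLPATH (and $INTELPATH for libiomp5) to LD_LIBRARY_PATH and run the check again.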

4) BLAS, PBLAS, BLACS, LAPACK and ScaLAPACK

LAPACK and ScaLAPACK guides are included in the Intel MKL documentation at http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/ .

We have also compiled the basic libraries (scalapack-2.0.1 and lapack-3.4.0 with reference blas) from source and installed them as modules scalapack/2.0.1/gcc and scalapack/2.0.1/intel .

5) PGI compiler suite

We have installed the PGI compiler suite via modules pgi/comp/11.3 and openmpi/1.4.4/pgi. PGI-10.9 is also available.

Try pgcc or pgf90 with the following optimisations, of which -O3 is the most important:

 -O3 -tp=nehalem-64 -fastsse -Minfo=all -Mscalarsse -Mvect=prefetch,sse -Mipa
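
When the same code is built with more than one compiler, it can help to keep the suggested flag sets from sections 1, 2 and 5 in one place. The following is a sketch only: suggested_flags is a hypothetical helper, and the flags are copied unchanged from this page, so the same caveat applies that they are a guide and may not suit every application.

```shell
# suggested_flags is a hypothetical helper; the flag sets are those listed
# in sections 1 (GCC), 2 (Intel) and 5 (PGI) of this page.
# Usage: suggested_flags <compiler>
suggested_flags() {
    case $1 in
        gcc|gfortran)
            echo "-Wl,-rpath=/gpfs/packages/gcc/4.6.2/lib -m64 -O3 -pipe -march=corei7 -msse4.2 -fno-strict-aliasing" ;;
        icc|ifort)
            echo "-Wl,-rpath=/gpfs/packages/intel-ics/2011.0.013/lib/intel64 -m64 -xSSE4.2 -ip -O3 -no-prec-div -parallel -opt-prefetch" ;;
        pgcc|pgf90)
            echo "-O3 -tp=nehalem-64 -fastsse -Minfo=all -Mscalarsse -Mvect=prefetch,sse -Mipa" ;;
        *)
            echo "no suggested flags for: $1" >&2
            return 1 ;;
    esac
}
```

For example, with FC=pgf90 a build becomes "$FC $(suggested_flags $FC) -o myprog myprog.f90". Changing FC to gfortran or ifort then picks up the matching flags.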

Dev_Notes (last edited 2012-05-16 09:40:51 by RobAllan)

This website maintained by Research Computing Services, University of Manchester