Brief NW-GRID Tutorial - Using Compilers and Libraries
Here we provide a starter guide to developing optimised applications to run on the iDataPlex. A similar guide for the TACC LoneStar system, which goes into additional detail about code tuning, is available at http://www.tacc.utexas.edu/user-services/user-guides/lonestar-user-guide . Note that the suggested compiler options are a guide only and may not be suitable for all applications.
1) GCC compiler suite
The RedHat-5.5 release on this system was built with the GNU Compiler Collection 4.1.2, which is therefore the system default. In addition we have 4.4.0 and 4.6.2 available. We recommend using 4.6.2 via the module gcc/4.6.2, together with the OpenMPI message passing environment via module openmpi/1.4.4/gcc. It is also possible to use the MKL libraries with GCC.
For optimisation, consider using gfortran or gcc with the following flags:
-Wl,-rpath=/gpfs/packages/gcc/4.6.2/lib -m64 -O3 -pipe -march=corei7 -msse4.2 -fno-strict-aliasing
It may be better to try "-march=native" which on this architecture automatically selects:
-march=corei7 -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mno-avx -msse4.2 -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=12288 -mtune=corei7
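To check what -march=native resolves to on a particular node, one option is to ask the compiler driver to print the commands it would run. The sketch below is an illustration, not part of the site setup: it assumes gcc is on the PATH (for example after loading gcc/4.6.2), and the function name show_native_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show how gcc expands -march=native on this machine.
# -### prints the sub-commands the driver would run (on stderr) without
# executing them; the cc1 line carries the expanded -march/-m... options.
show_native_flags() {
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found in PATH"
        return 0
    fi
    gcc -march=native -### -x c -E /dev/null 2>&1 | grep 'cc1' \
        || echo "no cc1 line reported by this gcc"
}

show_native_flags
```

Since -march=native reflects the processor the compiler runs on, it is worth running this on the same kind of node that will execute the code.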
2) Intel compiler suite
We have deployed two releases of the Intel compiler suite via modules intel/comp/11.1 and intel/comp/12.0. The latter is recommended. The message passing interfaces openmpi/1.4.4/intel and intel/mpi/4.0.1, among others, are available.
For optimisation, consider using ifort or icc with the following flags (although inter-procedural optimisation can take a long time):
-Wl,-rpath=/gpfs/packages/intel-ics/2011.0.013/lib/intel64 -m64 -xSSE4.2 -ip -O3 -no-prec-div -parallel -opt-prefetch
Of these, -O3 and -xSSE4.2 are likely to be the most useful. Some codes, like NWChem, expect to be able to use 64-bit integers, which can be enabled using the "-i8" flag. The -parallel flag will try to generate multi-threaded code and may need to link to the MKL library suite.
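A cautious way to apply this advice is to start from the two flags singled out above and add the others only once results have been checked. The following is a minimal sketch under that assumption; the helper name intel_flags and the source file myprog.f90 are hypothetical, and the commands are printed rather than run so the script can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical helper: build an ifort flag string from the options in the text.
# Pass "i8" as the first argument to request 64-bit default integers.
intel_flags() {
    flags="-m64 -O3 -xSSE4.2"
    if [ "$1" = "i8" ]; then
        flags="$flags -i8"
    fi
    echo "$flags"
}

# Print the compile commands instead of running them; myprog.f90 is a placeholder.
echo "ifort $(intel_flags) -o myprog myprog.f90"
echo "ifort $(intel_flags i8) -o myprog myprog.f90"
```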
3) Intel MKL
MKL is the Intel Math Kernel Library, a collection of popular serial and parallel routines including BLAS, LAPACK and ScaLAPACK, plus FFTs and others. For a full list and up-to-date on-line information from Intel, see http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/ .
The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays), and the LP64 libraries use the 32-bit integer type. Both static *.a and dynamic *.so versions are provided for most of the libraries in MKL.
We will from now on assume that we are statically linking an application with 32-bit integers. The basic library layers to use are as follows.
- Interface layer - libmkl_intel_lp64.a
- Fortran 95 interface layer - libmkl_blas95_lp64.a libmkl_lapack95_lp64.a
- Threading layer - libmkl_intel_thread.a or libmkl_sequential.a
- Computational layer - libmkl_core.a
- Threads runtime library - libiomp5.a or legacy library - libguide.a and libmkl_blacs.a
- Pthreads - libpthread.a is required on Unix systems even if sequential is specified
- Maths support - libm.a may be required
In addition to the Intel compiler there are versions for GCC and PGI.
It is useful to define the path to the set of libraries we are going to use. This will depend on the architecture of the machine. On SID (an x86_64 architecture with Xeon E5620 processors) it is:
module load intel/comp/11.1
module load intel/mkl/10.2
module load openmpi/1.4.4/intel
export MKLPATH=/gpfs/packages/intel/mkl/10.2.2.025/lib/em64t/
export MKLINCLUDE=/gpfs/packages/intel/mkl/10.2.2.025/include
If linking dynamically, you can export LD_LIBRARY_PATH=$MKLPATH.
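Note that assigning LD_LIBRARY_PATH outright discards any entries already set, for example by the compiler or MPI modules. A minimal sketch of prepending instead, assuming a POSIX shell; the fallback value for MKLPATH is the 10.2 path quoted above and is only used if the variable is not already set.

```shell
#!/bin/sh
# Use MKLPATH from the environment, falling back to the path quoted in the text.
MKLPATH=${MKLPATH:-/gpfs/packages/intel/mkl/10.2.2.025/lib/em64t}

# Prepend MKLPATH. The ${VAR:+...} form adds the ":" separator only when
# LD_LIBRARY_PATH already has a value, which avoids an empty trailing entry.
LD_LIBRARY_PATH="$MKLPATH${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

echo "$LD_LIBRARY_PATH"
```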
Here are some recipes for linking with different kinds of applications, assuming static linking. We also assume the Intel compiler and OpenMPI.
- BLAS and LAPACK - libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a
- ScaLAPACK - libmkl_intel_lp64.a libmkl_scalapack_lp64.a libmkl_blacs_openmpi_lp64.a libmkl_intel_thread.a libmkl_core.a
You can also add the lapack95 and blas95 interface components if required. A couple of additional libraries may be required, namely -liomp5 and -lpthread. If you don't want threading in the MKL calls, you should use libmkl_sequential.a in place of libmkl_intel_thread.a and omit -liomp5.
So, as a summary, here is a link line for a program using ScaLAPACK with threading enabled:
mpif90 -o myprog -O3 myprog.f90 -I$MKLINCLUDE -L$MKLPATH -Wl,--start-group \
$MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_scalapack_lp64.a \
$MKLPATH/libmkl_blacs_openmpi_lp64.a $MKLPATH/libmkl_intel_thread.a \
$MKLPATH/libmkl_core.a -Wl,--end-group \
-liomp5 -lpthread
Note that the -Wl options are required for the Intel layered model when statically linking the executable. See the user guide for more information.
If you want to use 64-bit integers, you will need the -i8 compiler option and the ILP64 versions of the libraries in place of the LP64 ones.
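The sequential and 64-bit integer variants differ from the threaded LP64 link line only in a few library names, which can be sketched as a small shell helper. This is an illustration under stated assumptions: the function name mkl_scalapack_libs is hypothetical, the ILP64 file names follow the usual MKL naming pattern (libmkl_intel_ilp64.a and so on) and should be checked against the contents of $MKLPATH, and the compile commands are printed rather than run.

```shell
#!/bin/sh
# Fall back to the MKL 10.2 path quoted in the text if MKLPATH is not set.
MKLPATH=${MKLPATH:-/gpfs/packages/intel/mkl/10.2.2.025/lib/em64t}

# Hypothetical helper: mkl_scalapack_libs <lp64|ilp64> <thread|sequential>
# Prints the static library list for a ScaLAPACK program built with OpenMPI.
mkl_scalapack_libs() {
    case "$1" in
        lp64|ilp64) ints=$1 ;;
        *) echo "integer model must be lp64 or ilp64" >&2; return 1 ;;
    esac
    case "$2" in
        thread)     layer=libmkl_intel_thread.a; runtime="-liomp5 -lpthread" ;;
        sequential) layer=libmkl_sequential.a;   runtime="-lpthread" ;;
        *) echo "threading must be thread or sequential" >&2; return 1 ;;
    esac
    libs="-Wl,--start-group"
    for lib in "libmkl_intel_${ints}.a" "libmkl_scalapack_${ints}.a" \
               "libmkl_blacs_openmpi_${ints}.a" "$layer" libmkl_core.a; do
        libs="$libs $MKLPATH/$lib"
    done
    printf '%s\n' "$libs -Wl,--end-group $runtime"
}

# Print example link lines; myprog.f90 is a placeholder source file.
echo "mpif90 -o myprog -O3 myprog.f90 -I\$MKLINCLUDE $(mkl_scalapack_libs lp64 sequential)"
# The ILP64 build also needs -i8 so that default integers are 64-bit.
echo "mpif90 -o myprog -O3 -i8 myprog.f90 -I\$MKLINCLUDE $(mkl_scalapack_libs ilp64 thread)"
```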
For MKL-10.3 there are some minor differences:
module load intel/comp/12.0
module load intel/mkl/10.3
module load openmpi/1.4.4/intel
export MKLPATH=/gpfs/packages/intel-ics/2011.0.013/mkl/lib/intel64
export MKLINCLUDE=/gpfs/packages/intel-ics/2011.0.013/mkl/include
export INTELPATH=/gpfs/packages/intel-ics/2011.0.013/lib/intel64
Which gives us:
mpif90 -o myprog -O3 myprog.f90 -I$MKLINCLUDE -L$MKLPATH -Wl,--start-group \
$MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_scalapack_lp64.a \
$MKLPATH/libmkl_blacs_openmpi_lp64.a $MKLPATH/libmkl_intel_thread.a \
$MKLPATH/libmkl_core.a -Wl,--end-group \
-L$INTELPATH -liomp5 -lpthread
Finally, for dynamic linking, you could do as follows:
mpif90 -o myprog -O3 myprog.f90 -I$MKLINCLUDE -L$MKLPATH \
-lmkl_intel_lp64 -lmkl_scalapack_lp64 \
-lmkl_blacs_openmpi_lp64 -lmkl_intel_thread -lmkl_core \
-L$INTELPATH -liomp5 -lpthread
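After a dynamic link it can be worth confirming that the loader will actually find the MKL and OpenMP runtime libraries at run time. A minimal sketch using ldd, assuming a Linux system; the function name check_mkl_libs is hypothetical and ./myprog is the executable from the link line above.

```shell
#!/bin/sh
# Hypothetical helper: list the MKL and libiomp5 shared libraries that the
# dynamic loader resolves for a program. Entries shown as "not found" usually
# mean LD_LIBRARY_PATH does not include the library directory.
check_mkl_libs() {
    prog=${1:-./myprog}
    if [ ! -x "$prog" ]; then
        echo "$prog not found or not executable"
        return 0
    fi
    ldd "$prog" | grep -E 'mkl|iomp5' \
        || echo "no MKL or libiomp5 dependencies listed"
}

check_mkl_libs ./myprog
```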
4) BLAS, PBLAS, BLACS, LAPACK and ScaLAPACK
LAPACK and ScaLAPACK guides are included here http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/ .
We have built the basic libraries (scalapack-2.0.1 and lapack-3.4.0 with reference BLAS) from source and installed them as modules scalapack/2.0.1/gcc and scalapack/2.0.1/intel .
5) PGI compiler suite
We have installed the PGI compiler suite via modules pgi/comp/11.3 and openmpi/1.4.4/pgi. PGI-10.9 is also available.
Try pgcc or pgf90 with the following optimisations, of which -O3 is the most important:
-O3 -tp=nehalem-64 -fastsse -Minfo=all -Mscalarsse -Mvect=prefetch,sse -Mipa