DL1 Sun Cluster at Daresbury

The Daresbury NW-GRID Sun Cluster dl1.nw-grid.ac.uk comprises three compute racks, each containing 32 Sun Fire X4100 nodes. Each node contains two dual-core 2.4 GHz AMD Opteron processors and 8 GB of memory.
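The totals implied by these figures can be checked with a little shell arithmetic. The snippet below is only a worked example (the variable names are our own). It shows that DL1 has 96 nodes and 384 cores in total, with 4 cores per node and 2 GB of memory per core.

```bash
#!/bin/bash
# Worked arithmetic for the DL1 hardware figures quoted above.
racks=3
nodes_per_rack=32
sockets_per_node=2    # two Opteron processors per node
cores_per_socket=2    # each processor is dual core
mem_per_node_gb=8

nodes=$((racks * nodes_per_rack))
cores_per_node=$((sockets_per_node * cores_per_socket))
total_cores=$((nodes * cores_per_node))
mem_per_core_gb=$((mem_per_node_gb / cores_per_node))

# prints: nodes=96 cores_per_node=4 total_cores=384 mem_per_core_gb=2
echo "nodes=$nodes cores_per_node=$cores_per_node total_cores=$total_cores mem_per_core_gb=$mem_per_core_gb"
```

The 4 cores per node correspond to the 4-way SMP layout used in the job submission examples.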

This Sun cluster was purchased from Streamline Computing in 2005, and its software was updated by Streamline in 2009.

To use the Daresbury DL1 cluster, first log on to the head node, dl1.nw-grid.ac.uk. Most users on the Daresbury Science and Innovation Campus network will be able to use ssh; other users should use gsi-ssh and will need an e-Science certificate (see http://www.grid-support.ac.uk). STFC staff should also see http://www.nw-grid.ac.uk/SSO .

Brief NW-GRID Tutorial - Compiling and Running simple Jobs

This tutorial explains how to compile and run simple jobs with OpenMPI or MPICH-2.

1) Compilation

In this first section we assume you are using OpenMPI, which is the preferred option. MPICH-2 is now also available on the system (see section 5). NW-GRID uses modules to set up the software environment. You can see the modules provided on DL1 by typing:

> module avail
----------------------------------------- /usr/share/Modules/packages ------------------------------------------
globus   pelegant sabre    visit

---------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------
compiler/gnu/4.2.1    dot                   mpich/2-1.2.1/gnu     openmpi/1.3.1-1/intel
compiler/gnu/4.4.1    module-cvs            mpich/2-1.2.1/intel   openmpi/1.3.1-1/pgi
compiler/intel/11.0   module-info           mpich/2-1.2.1/pgi     openmpi/1.3.3-1/gnu
compiler/pgi/8.0-3    modules               openmpi/1.3.1-1/gnu   use.own

In the following notes we will assume you want to use openmpi 1.3.1-1 with the Intel Fortran 90 compiler. To ensure the environment is set up for this, type:

> module list
No Modulefiles Currently Loaded.

> module load openmpi/1.3.1-1/intel
> module load compiler/intel/11.0

> module list
Currently Loaded Modulefiles:
  1) compiler/intel/11.0   2) openmpi/1.3.1-1/intel

If any other compiler or MPI module is already loaded, it is preferable to unload it first, for example with module unload <modulename>.
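Before compiling, it can help to confirm that the intended modules really are loaded. Environment Modules normally records the loaded modules in the colon-separated LOADEDMODULES variable. The function below is a hypothetical helper (not part of the DL1 installation) that checks this variable; it assumes LOADEDMODULES is maintained as usual on DL1.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if every named module appears in
# LOADEDMODULES, the colon-separated list maintained by Environment Modules.
require_modules() {
    local missing=0 mod
    for mod in "$@"; do
        # Wrap the list in colons so that only whole module names match.
        case ":${LOADEDMODULES:-}:" in
            *":$mod:"*) ;;
            *) echo "module not loaded: $mod" >&2; missing=1 ;;
        esac
    done
    return "$missing"
}
```

For example, `require_modules compiler/intel/11.0 openmpi/1.3.1-1/intel && mpif90 -c myprog.f90` only compiles if both modules from the example above are loaded.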

The chosen compiler is invoked through the mpif90 wrapper, which also links in the appropriate MPI libraries by default. A simple parallel Fortran 90 application can therefore be compiled and linked as follows:

> mpif90 -c myprog.f90
> mpif90 -o myprog myprog.o

2) Setting up the internal network configuration

We first assume that OpenMPI will communicate over TCP/IP on the Gb/s network. Two networks are available for MPI communication, attached to the Ethernet adapters eth1 and eth2. Streamline have provided a setup script to configure either one, as follows:

For eth1:

> /opt/streamline/MCA-PARAMS/setup eth1

For eth2:

> /opt/streamline/MCA-PARAMS/setup eth2

3) Running the application using Sun Grid Engine

Streamline Computing have provided a command called ompisub, which creates the SGE job script and submits it. We suggest calling it from a small bash script (here called submit), as follows:

> cat submit
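#!/bin/bash
# Program to run and its arguments (left unquoted below so they split into words)
EXEC="myprog arguments"
# Remove any job script left from a previous run (-f: no error if it is absent)
rm -f myprog.sh
# Extra qsub options; -V exports your environment (e.g. loaded modules) to the job
export QSUB_OPTIONS="-V"
# Run on 2 nodes with 4 cores per node
ompisub 2x4 $EXEC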

This will attempt to run the myprog executable on two nodes with four processor cores per node (referred to as 4-way SMP), giving 8 MPI processes in total.
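The NxM argument gives the number of nodes and the number of cores per node, and the total number of MPI processes is their product. This is consistent with the transcript further down, which reports an 8 cpu job with SMP=4 for 2x4. The function below is a hypothetical helper that splits such a layout string; it is not part of ompisub itself.

```bash
#!/bin/bash
# Hypothetical helper: split an ompisub-style "NODESxSMP" layout
# (e.g. 2x4) and report the total number of MPI processes.
parse_layout() {
    local layout=$1
    # Positive integers only, with no leading zeros.
    if [[ ! $layout =~ ^([1-9][0-9]*)x([1-9][0-9]*)$ ]]; then
        echo "bad layout '$layout' (expected NODESxSMP, e.g. 2x4)" >&2
        return 1
    fi
    local nodes=${BASH_REMATCH[1]} smp=${BASH_REMATCH[2]}
    echo "nodes=$nodes smp=$smp np=$((nodes * smp))"
}
```

For example, `parse_layout 2x4` prints `nodes=2 smp=4 np=8`.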

The ompisub command is itself a shell script, /usr/bin/ompisub, which can be copied and edited to make changes. The myprog.sh script that it creates can also be kept and edited for subsequent runs, e.g. to give different names to the stdout and stderr files. Please note that if you are using MPICH-2 you should use mpich2sub instead of ompisub (see section 5).
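One way to rename the output files is to add SGE's -o and -e options as `#$` directive lines near the top of a kept copy of myprog.sh. The function below is a sketch that inserts them straight after the `#!/bin/sh` line; the function name and the output file names are illustrative assumptions, not part of the Streamline tools.

```bash
#!/bin/bash
# Sketch: insert SGE "#$ -o" / "#$ -e" directives after the first line
# (the #! line) of a job script such as myprog.sh.
add_output_directives() {
    local script=$1 out=$2 err=$3 tmp
    tmp=$(mktemp) || return 1
    {
        head -n 1 "$script"                          # keep the #! line first
        printf '#$ -o %s\n#$ -e %s\n' "$out" "$err"  # new SGE directives
        tail -n +2 "$script"                         # rest of the script
    } > "$tmp" && mv "$tmp" "$script"
}
```

For example, `add_output_directives myprog.sh myprog.out myprog.err` edits the script in place; it can then be resubmitted directly with qsub as in the transcript below. Remember that the submit script removes myprog.sh, so do not rerun submit if you want to keep the edited copy.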

If the submit script above is run from a directory called /panfs/dl/home/rja/mydir, it will produce:

> ./submit
Generating SGE job file for a 8 cpu mpich job with SMP=4 from
executable /panfs/dl/home/rja/mydir/myprog.
QSUB mpirun -np 8 /panfs/dl/home/rja/mydir/myprog arguments

Done.

Submitting SGE job as follows:

qsub -pe openmpi 2 /panfs/dl/home/rja/mydir/myprog.sh

Sending standard output to file: /panfs/dl/home/rja/mydir/myprog.sh.o748
Sending standard error  to file: /panfs/dl/home/rja/mydir/myprog.e748

Use the qstat command to query the job queue. e.g
qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    748 0.00000 myprog     rja          qw    06/08/2009 15:43:03                                    2

Job submission complete.

The SGE script actually submitted to qsub is called myprog.sh and will be as follows:

> cat myprog.sh
#!/bin/sh
cd /panfs/dl/home/rja/mydir
echo =========================================================
echo "Cluster name         : dl1.nw-grid.ac.uk "
echo "Arch                 : x86_64 "
echo "SGE job submitted    : Mon Jun  8 14:39:23 BST 2009"
echo "MPIRUN_ARGS          : "
date_start=`date +%s`
echo 8 cpus on 2 nodes \( SMP=4 \)
echo Executable file: /panfs/dl/home/rja/mydir/myprog
echo Executable args: arguments
echo MPI parallel job.
echo 2 hosts used:

echo -------------
cat /panfs/dl/home/rja/.mpich/mpich_hosts.$JOB_ID  | cut -f 1 -d \. | sort  | fmt -w 30
echo =========================================================
echo Job output begins
echo -----------------
echo
#$ -cwd -V
mpirun -np 8 /panfs/dl/home/rja/mydir/myprog arguments
echo
echo ---------------
echo Job output ends
date_end=`date +%s`
seconds=$((date_end-date_start))
tseconds=$seconds
minutes=$((seconds/60))
seconds=$((seconds-60*minutes))
hours=$((minutes/60))
minutes=$((minutes-60*hours))
echo =========================================================
echo SGE job: finished   date = `date`
echo Total run time : $hours Hours $minutes Minutes $seconds Seconds
echo Time in seconds: $tseconds Seconds
echo =========================================================

Note the reference to the .mpich directory. The file /panfs/dl/home/rja/.mpich/mpich_hosts.$JOB_ID (in this case $JOB_ID is 748) contains the list of nodes selected by SGE to run the job. It has the form:

comp000.nw-grid.ac.uk
comp001.nw-grid.ac.uk
comp002.nw-grid.ac.uk
comp003.nw-grid.ac.uk
comp004.nw-grid.ac.uk
comp005.nw-grid.ac.uk
comp006.nw-grid.ac.uk
comp007.nw-grid.ac.uk
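The pipeline used in myprog.sh to print the host list (cut -f 1 -d \. | sort | fmt -w 30) strips the domain from each fully qualified name. A variant that also counts the distinct nodes is sketched below; the function name is hypothetical, and it assumes one host name per line, as in the file above.

```bash
#!/bin/bash
# Sketch: list the distinct short host names in an SGE/MPICH hosts file
# and count them. Assumes one fully qualified host name per line.
summarise_hosts() {
    local hostfile=$1 names
    # Drop everything after the first dot, then de-duplicate.
    names=$(cut -f 1 -d . "$hostfile" | sort -u)
    echo "$(printf '%s\n' "$names" | grep -c .) distinct hosts:"
    printf '%s\n' "$names" | fmt -w 30
}
```

Inside a job it could be called as `summarise_hosts "$HOME/.mpich/mpich_hosts.$JOB_ID"`, but only while the job is running.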

The mpich_hosts file is deleted when the job has finished, but the list of nodes used is kept in the job output file, myprog.sh.o$JOB_ID:

> cat myprog.sh.o748
=========================================================
Cluster name         : dl1.nw-grid.ac.uk
Arch                 : x86_64
SGE job submitted    : Mon Jun  8 14:37:27 BST 2009
MPIRUN_ARGS          :
8 cpus on 2 nodes ( SMP=4 )
Executable file: /panfs/dl/home/rja/mydir/myprog
Executable args: arguments
MPI parallel job.
2 hosts used:
-------------
comp000 comp002 comp003
comp004 comp005 comp006
comp007
=========================================================
Job output begins
-----------------

<Hopefully your expected output will be here>

---------------
Job output ends
=========================================================
SGE job: finished date = Mon Jun 8 15:15:04 BST 2009
Total run time : 0 Hours 37 Minutes 28 Seconds
Time in seconds: 2248 Seconds
=========================================================
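The run-time figures at the end of this output come from the integer arithmetic in the last lines of myprog.sh. Wrapped as a function (the name format_runtime is our own), the same calculation reproduces the values above: 2248 seconds is 0 hours, 37 minutes and 28 seconds.

```bash
#!/bin/bash
# Same integer arithmetic as the end of myprog.sh, wrapped as a function.
format_runtime() {
    local seconds=$1 minutes hours
    minutes=$((seconds / 60))
    seconds=$((seconds - 60 * minutes))   # remainder after whole minutes
    hours=$((minutes / 60))
    minutes=$((minutes - 60 * hours))     # remainder after whole hours
    echo "$hours Hours $minutes Minutes $seconds Seconds"
}

format_runtime 2248   # prints: 0 Hours 37 Minutes 28 Seconds
```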

4) Running jobs with OpenMX

Please ignore this section for now, as OpenMX is still being tested.

Instead of TCP/IP, OpenMPI can use OpenMX on the Gb/s network, which should improve performance in some cases. Currently eth1 is the only network configured for OpenMX, so type:

> /opt/streamline/MCA-PARAMS/setup omx.eth1
> module load open-mx

Jobs can then be run as described above, with no further changes.

5) Running jobs with MPICH-2

Some users have reported problems with OpenMPI, so we have also installed MPICH-2 v1.2.1 from Argonne National Laboratory as an alternative. The procedure is similar to the one described above: load the appropriate modules, for instance compiler/pgi/8.0-3 and mpich/2-1.2.1/pgi, and submit the job using mpich2sub.

For security reasons, MPICH-2 requires a file named .mpd.conf in your home directory containing the line

secretword=<secretword>

where <secretword> is a string known only to yourself. It should not be your normal Unix password. Make this file readable and writable only by you:

> cd "$HOME"
> touch .mpd.conf        # create the file if it does not exist
> chmod 600 .mpd.conf    # readable and writable by you only

Then use an editor to put a line such as secretword=mr45-j9z into the file (but choose your own secret word rather than mr45-j9z). If you do not do this, you will get error messages relating to MPD, the multi-purpose daemon used to launch MPICH-2 jobs.
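If you prefer not to invent a secret word by hand, the file can also be created non-interactively. The function below is a sketch, not a Streamline or MPICH-2 tool; it assumes that a 12-character random hex string is an acceptable secret word and that /dev/urandom is available on the head node.

```bash
#!/bin/bash
# Sketch: create an MPD configuration file holding a random secret word.
# Assumption: a 12-character random hex string is an acceptable secret word.
create_mpd_conf() {
    local conf=${1:-$HOME/.mpd.conf} secret
    if [ -e "$conf" ]; then
        echo "$conf already exists; leaving it unchanged" >&2
        return 0
    fi
    # Create the file with owner-only permissions before writing the secret.
    ( umask 077 && : > "$conf" ) || return 1
    chmod 600 "$conf"
    # Six random bytes from /dev/urandom, printed as 12 hex digits.
    secret=$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n')
    printf 'secretword=%s\n' "$secret" > "$conf"
}
```

Running `create_mpd_conf` with no argument creates $HOME/.mpd.conf; an existing file is left untouched.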

To submit a job you will need a script similar to the following:

> cat submit
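#!/bin/bash
# Program to run and its arguments (left unquoted below so they split into words)
EXEC="myprog arguments"
# Remove any job script left from a previous run (-f: no error if it is absent)
rm -f myprog.sh
# Extra qsub options; -V exports your environment (e.g. loaded modules) to the job
export QSUB_OPTIONS="-V"
# Run on 2 nodes with 4 cores per node, launched through MPD
mpich2sub 2x4 $EXEC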

Note the use of mpich2sub rather than ompisub.
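Since the submit wrapper must match the MPI library you compiled with, one option is to pick it from the loaded modules. The function below is a hypothetical helper (not part of the Streamline tools) that inspects the LOADEDMODULES variable maintained by Environment Modules and refuses to guess if both MPI flavours are loaded.

```bash
#!/bin/bash
# Hypothetical helper: choose the Streamline submit wrapper that matches
# the MPI module currently loaded (as recorded in LOADEDMODULES).
pick_submitter() {
    local mods=":${LOADEDMODULES:-}:" have_ompi=0 have_mpich=0
    case $mods in *:openmpi/*) have_ompi=1 ;; esac
    case $mods in *:mpich/*) have_mpich=1 ;; esac
    if [ "$have_ompi" -eq 1 ] && [ "$have_mpich" -eq 1 ]; then
        echo "both openmpi and mpich modules are loaded; unload one" >&2
        return 1
    elif [ "$have_mpich" -eq 1 ]; then
        echo mpich2sub
    elif [ "$have_ompi" -eq 1 ]; then
        echo ompisub
    else
        echo "no openmpi or mpich module loaded" >&2
        return 1
    fi
}
```

In a submit script this could be used as `submitter=$(pick_submitter) && "$submitter" 2x4 $EXEC`.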

DL1Cluster (last edited 2010-04-20 09:24:32 by RobAllan)

This website maintained by Research Computing Services, University of Manchester