DL1 Sun Cluster at Daresbury
The Daresbury NW-GRID Sun Cluster dl1.nw-grid.ac.uk comprises three compute racks, each containing 32 Sun x4100 nodes. Each node contains two dual-core 2.4 GHz AMD Opteron processors and 8 GB of memory.
This Sun cluster was purchased from Streamline Computing in 2005, and its software was updated by them in 2009.
To use the Daresbury DL1 cluster, first log on to the head node dl1.nw-grid.ac.uk. Most users on the Daresbury Science and Innovation Campus network will be able to use ssh. Other users should use gsi-ssh and will need an e-Science Certificate; see http://www.grid-support.ac.uk . For STFC staff, see also http://www.nw-grid.ac.uk/SSO .
Brief NW-GRID Tutorial - Compiling and Running simple Jobs
This explains how to run simple jobs with OpenMPI or MPICH-2.
1) Compilation
In this first section we will assume you are using OpenMPI which is the preferred option. MPICH-2 is now also available on the system, see below. On NW-GRID we are using modules. You can see the modules provided on DL1 by typing:
> module avail

----------------------------------------- /usr/share/Modules/packages ------------------------------------------
globus    pelegant  sabre     visit

---------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------
compiler/gnu/4.2.1     dot                    mpich/2-1.2.1/gnu      openmpi/1.3.1-1/intel
compiler/gnu/4.4.1     module-cvs             mpich/2-1.2.1/intel    openmpi/1.3.1-1/pgi
compiler/intel/11.0    module-info            mpich/2-1.2.1/pgi      openmpi/1.3.3-1/gnu
compiler/pgi/8.0-3     modules                openmpi/1.3.1-1/gnu    use.own
In the following notes we will assume you want to use openmpi 1.3.1-1 with the Intel Fortran 90 compiler. To ensure the environment is set up for this, type:
> module list
No Modulefiles Currently Loaded.
> module load openmpi/1.3.1-1/intel
> module load compiler/intel/11.0
> module list
Currently Loaded Modulefiles:
  1) compiler/intel/11.0     2) openmpi/1.3.1-1/intel
If you had any other module loaded it is preferable to unload that first.
The chosen compiler in the parallel environment is invoked using the mpif90 wrapper. This wrapper is also used for linking the appropriate libraries which are included by default. A simple parallel Fortran 90 application can thus be compiled as follows:
> mpif90 -c myprog.f90
> mpif90 -o myprog myprog.o
2) Setting up the internal network configuration
We will first assume that we will use OpenMPI over TCP/IP on the Gb/s network. Two networks are available for MPI communication and are linked to ethernet adapters eth1 and eth2. Streamline have provided a setup script to do the configuration as follows:
For eth1:
/opt/streamline/MCA-PARAMS/setup eth1
For eth2:
/opt/streamline/MCA-PARAMS/setup eth2
3) Running the application using Sun Grid Engine
Streamline Computing have provided a useful command called ompisub which creates and submits the SGE job script. It is suggested that this is used from a bash script (e.g. here called submit) as follows:
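> cat submit
#! /bin/bash
# Executable name followed by its command-line arguments
EXEC="myprog arguments"
# Remove any job script left over from a previous run (-f: no error if absent)
rm -f myprog.sh
# Pass the current environment through to the SGE job
export QSUB_OPTIONS="-V"
# 2 nodes x 4 cores per node; $EXEC is left unquoted so it splits into words
ompisub 2x4 $EXEC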
This will attempt to run the myprog executable on two nodes with 4 processor cores per node (referred to as 4-way SMP), giving 8 MPI processes in total.
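As a rough illustration of how the NxM layout argument maps to a process count, the sketch below splits a string such as 2x4 into nodes and cores per node and multiplies them. The helper name total_procs is hypothetical and is not part of ompisub; how ompisub itself parses the argument is not shown here.

```shell
#!/bin/sh
# Hypothetical helper (not provided by ompisub): split an NxM layout
# string into nodes and cores per node, then print the total number
# of MPI processes. The body runs in a subshell so variables stay local.
total_procs() (
    layout=$1
    nodes=${layout%%x*}    # text before the first "x"
    cores=${layout##*x}    # text after the last "x"
    echo $((nodes * cores))
)

total_procs 2x4    # prints 8
```

For the 2x4 layout this gives the same value as the -np 8 argument passed to mpirun in the generated job script.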
The ompisub command is itself a shell script which can be found in /usr/bin/ompisub. It can be copied and edited to make changes. The myprog.sh script which it creates can also be kept and edited for subsequent runs, e.g. by adding different names for the stdout and stderr files. Please note that if you are using MPICH-2 then you should use mpich2sub instead of ompisub, see below.
If this is submitted from a directory called /panfs/dl/home/rja/mydir, it will produce:
> ./submit
Generating SGE job file for a 8 cpu mpich job with SMP=4 from
executable /panfs/dl/home/rja/mydir/myprog.
QSUB mpirun -np 8 /panfs/dl/home/rja/mydir/myprog arguments
Done.
Submitting SGE job as follows:
qsub -pe openmpi 2 /panfs/dl/home/rja/mydir/myprog.sh
Sending standard output to file: /panfs/dl/home/rja/mydir/myprog.sh.o748
Sending standard error to file: /panfs/dl/home/rja/mydir/myprog.e748
Use the qstat command to query the job queue, e.g.
qstat
job-ID  prior    name    user  state  submit/start at      queue  slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   748  0.00000  myprog  rja   qw     06/08/2009 15:43:03             2
Job submission complete.
The SGE script actually submitted to qsub is called myprog.sh and will be as follows:
> cat myprog.sh
#!/bin/sh
cd /panfs/dl/home/rja/mydir
echo =========================================================
echo "Cluster name : dl1.nw-grid.ac.uk "
echo "Arch : x86_64 "
echo "SGE job submitted : Mon Jun 8 14:39:23 BST 2009"
echo "MPIRUN_ARGS : "
date_start=`date +%s`
echo 8 cpus on 2 nodes \( SMP=4 \)
echo Executable file: /panfs/dl/home/rja/mydir/myprog
echo Executable args: arguments
echo MPI parallel job.
echo 2 hosts used:
echo -------------
cat /panfs/dl/home/rja/.mpich/mpich_hosts.$JOB_ID | cut -f 1 -d \. | sort | fmt -w 30
echo =========================================================
echo Job output begins
echo -----------------
echo
#$ -cwd -V
mpirun -np 8 /panfs/dl/home/rja/mydir/myprog arguments
echo
echo ---------------
echo Job output ends
date_end=`date +%s`
seconds=$((date_end-date_start))
tseconds=$seconds
minutes=$((seconds/60))
seconds=$((seconds-60*minutes))
hours=$((minutes/60))
minutes=$((minutes-60*hours))
echo =========================================================
echo SGE job: finished date = `date`
echo Total run time : $hours Hours $minutes Minutes $seconds Seconds
echo Time in seconds: $tseconds Seconds
echo =========================================================
Note the reference to .mpich. The file /panfs/dl/home/rja/.mpich/mpich_hosts.$JOB_ID (in this case $JOB_ID=748) will contain the list of nodes that SGE has selected to run the job. It has the form:
comp000.nw-grid.ac.uk
comp001.nw-grid.ac.uk
comp002.nw-grid.ac.uk
comp003.nw-grid.ac.uk
comp004.nw-grid.ac.uk
comp005.nw-grid.ac.uk
comp006.nw-grid.ac.uk
comp007.nw-grid.ac.uk
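The generated myprog.sh shortens these names with a cut | sort | fmt pipeline before printing them. The following sketch applies the same pipeline to a small sample file to show what each stage does. The temporary file and its two entries are illustrative assumptions, not a real SGE hosts file.

```shell
#!/bin/sh
# Illustrative sample only: a temporary file standing in for the
# mpich_hosts.$JOB_ID file that SGE writes.
hosts_file=$(mktemp)
printf '%s\n' comp001.nw-grid.ac.uk comp000.nw-grid.ac.uk > "$hosts_file"

# cut keeps the text before the first dot (the short host name),
# sort orders the names alphabetically.
short_hosts=$(cut -f 1 -d . "$hosts_file" | sort)

# fmt -w 30 packs the names onto lines of at most 30 characters,
# as in the job script.
echo "$short_hosts" | fmt -w 30

rm -f "$hosts_file"
```

The hosts file itself is only needed while the job runs, so the script prints the shortened names into the job output.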
This file is deleted when the job has run, but the list of nodes used will be kept in the job output file, which is called myprog.sh.o$JOB_ID:
> cat myprog.sh.o748
=========================================================
Cluster name : dl1.nw-grid.ac.uk
Arch : x86_64
SGE job submitted : Mon Jun 8 14:37:27 BST 2009
MPIRUN_ARGS :
8 cpus on 2 nodes ( SMP=4 )
Executable file: /panfs/dl/home/rja/mydir/myprog
Executable args: arguments
MPI parallel job.
2 hosts used:
-------------
comp000 comp002 comp003 comp004 comp005 comp006 comp007
=========================================================
Job output begins
-----------------

<Hopefully your expected output will be here>

---------------
Job output ends
=========================================================
SGE job: finished date = Mon Jun 8 15:15:04 BST 2009
Total run time : 0 Hours 37 Minutes 28 Seconds
Time in seconds: 2248 Seconds
=========================================================
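The run-time summary at the end of the output comes from the integer arithmetic in myprog.sh. The sketch below repeats that arithmetic in a small function so the conversion can be checked by hand. The function name to_hms is hypothetical and is not something the generated script defines.

```shell
#!/bin/sh
# Convert a duration in seconds into hours, minutes and seconds using
# the same integer arithmetic as the generated myprog.sh script.
# to_hms is a hypothetical helper name used only for this example.
to_hms() (
    seconds=$1
    minutes=$((seconds / 60))
    seconds=$((seconds - 60 * minutes))
    hours=$((minutes / 60))
    minutes=$((minutes - 60 * hours))
    echo "$hours Hours $minutes Minutes $seconds Seconds"
)

to_hms 2248    # prints: 0 Hours 37 Minutes 28 Seconds
```

With the 2248 seconds reported above, this reproduces the "Total run time" line of the job output.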
4) Running jobs with OpenMX
Please ignore this section currently as OpenMX is still being tested.
It is possible to use OpenMX instead of TCP/IP on the Gb/s network, which should give improved OpenMPI performance in some cases. Currently eth1 is the only network configured for OpenMX, so do:
> /opt/streamline/MCA-PARAMS/setup omx.eth1
> module load open-mx
Running the jobs as above should require no further changes.
5) Running jobs with MPICH-2
Some users have reported problems with OpenMPI. To enable them to run their applications we have therefore installed MPICH-2 v1.2.1 from Argonne National Laboratory as an alternative. Using MPICH-2 is a similar procedure to that described above. Use the appropriate modules, for instance compiler/pgi/8.0-3 and mpich/2-1.2.1/pgi, and run the job using mpich2sub.
For security reasons, MPICH-2 requires a file named .mpd.conf in your home directory containing the line
secretword=<secretword>
where <secretword> is a string known only to yourself. It should not be your normal Unix password. Make this file readable and writable only by you:
cd $HOME
touch .mpd.conf
chmod 600 .mpd.conf
Then use an editor to place a line like secretword=mr45-j9z into the file. (Of course, use a different secret word from mr45-j9z.) Note that if you do not do this, you will get error messages relating to MPD, the multi-purpose daemon used to launch MPICH jobs.
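As an alternative to editing the file by hand, the steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the CONF_DIR variable is hypothetical and defaults to a scratch directory so the example can be tried safely, and the secret word is generated from /dev/urandom rather than chosen by you.

```shell
#!/bin/sh
# Sketch: create an MPD configuration file readable and writable only
# by its owner. CONF_DIR is a hypothetical variable for this example;
# it defaults to a scratch directory. Set CONF_DIR="$HOME" to create
# the real file.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
conf="$CONF_DIR/.mpd.conf"

# Random secret word of 24 hex characters (assumes /dev/urandom exists).
secret=$(head -c 12 /dev/urandom | od -An -tx1 | tr -d ' \n')

# umask 077 in a subshell means a newly created file is never readable
# by other users, even briefly.
( umask 077; printf 'secretword=%s\n' "$secret" > "$conf" )

# chmod also tightens a file that already existed with looser permissions.
chmod 600 "$conf"

ls -l "$conf"
```

Note that this overwrites any existing .mpd.conf in the chosen directory, so check first if you already have one.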
To run a job you will need a script similar to the following.
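> cat submit
#! /bin/bash
# Executable name followed by its command-line arguments
EXEC="myprog arguments"
# Remove any job script left over from a previous run (-f: no error if absent)
rm -f myprog.sh
# Pass the current environment through to the SGE job
export QSUB_OPTIONS="-V"
# 2 nodes x 4 cores per node; $EXEC is left unquoted so it splits into words
mpich2sub 2x4 $EXEC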
Note the use of mpich2sub rather than ompisub.