Some basic information on using the Lancaster NW-GRID cluster

Description of the System

The Lancaster NW-Grid service consists of two clusters:

Both clusters support parallel jobs via MPI. Job submission is handled locally via the Sun Grid Engine (SGE), or through a Globus-accessible SGE jobmanager.

Connecting to the systems

The machine names are lancs1.nw-grid.ac.uk and lancs2.nw-grid.ac.uk. Access to the clusters is only possible using standard grid tools such as those provided by the Globus Toolkit.

Unix/Linux users

For Unix/Linux systems, the Globus middleware can be installed on a local machine via the Virtual Data Toolkit (only the VDT-Client package is required). Instructions on how to install and use VDT are available on the VDT website. Using the Globus Toolkit's gsissh, the lancs1 cluster frontend can be accessed with the following command:

File transfer can be accomplished via the command gsiscp. To transfer a file onto the lancs1 cluster head node:

To transfer a file from the same cluster head node to your local desktop:
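The login and file-transfer steps in this section can be sketched as follows. This is a minimal sketch rather than the site's official instructions: it assumes a valid grid proxy already exists (for example, one created with grid-proxy-init), that the GSI-enabled SSH service listens on its default port, and that gsiscp accepts scp-style arguments. The file names input.dat and results.dat are hypothetical, and the run helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: the host name comes from the text; the file names are hypothetical.
HEAD_NODE="lancs1.nw-grid.ac.uk"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Log in to the lancs1 head node.
run gsissh "${HEAD_NODE}"

# Copy a local file to your home directory on the head node.
run gsiscp input.dat "${HEAD_NODE}:"

# Copy a file from the head node back to the current local directory.
run gsiscp "${HEAD_NODE}:results.dat" .
```

Run the script once as it stands to check the commands it would issue, then again with DRY_RUN=0 to execute them.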

Windows users

Windows users can use the GSI-SSHterm application, available from the National Grid Service, both to access the cluster head node and to transfer files.

Modules

The head nodes and execution nodes all support access to various software packages using the module command. To see the available modules, type module avail. To see a brief description of a module, type module whatis modulename. The command module add modulename adds the relevant module to your current environment and allows access to the software.

Modules can also be added from within local batch jobs.
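As a rough sketch of what adding a module inside a batch job might look like, the snippet below writes a small job script to a file and submits it. Here modulename is a placeholder for a name listed by module avail, and the file name myjob.com and the program names are illustrative only. The script sources /etc/profile first so that the module command is defined inside the job.

```shell
#!/bin/bash
# Write a small job script; the quoted 'EOF' stops the shell expanding anything in it.
cat > myjob.com <<'EOF'
#!/bin/bash
#$ -S /bin/bash

# Source the system profile so that the module command is defined.
. /etc/profile

# "modulename" is a placeholder: substitute a name shown by "module avail".
module add modulename

cd my_program_directory
my_program < my_program.input
EOF

# Submit the script only where Sun Grid Engine's qsub is actually installed.
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.com
fi
```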

Compilers

The Lancaster nodes offer three compiler suites:

Local batch job submission from the head node

Alongside the standard Globus job submission mechanisms, batch jobs can be submitted from a head node using the standard Sun Grid Engine job submission mechanisms. Batch jobs are run on the cluster by creating a batch job control script (or command file) and "submitting" it to the system using the command qsub, e.g.:

Assuming that there is at least one job-slot free, the system will select an execution node on which to run your job. This ensures that the combined load of all users' jobs is spread evenly over the entire cluster. If no suitable slot is available at the time then the job will wait in a "pending" queue until one becomes free.

At present, the system uses a Fair Share scheduling strategy: users may submit any number of jobs, but priority will be given to those who are currently running fewer jobs. Please check the head nodes' message of the day for changes to scheduling.

Example of a batch job control script

#!/bin/bash

#$ -o $HOME/my_program_directory/my_program.stdout
#$ -e $HOME/my_program_directory/my_program.stderr
#$ -S /bin/bash

. /etc/profile

cd my_program_directory
time my_program < my_program.input

Explanation

Batch job scripts are simply standard shell scripts with extra lines (beginning with "#$") containing instructions for the scheduler.

The first line:

#!/bin/bash

specifies the shell which is to interpret this script. Leave this line exactly as shown unless you need a different shell to interpret your job.

The next two lines:

#$ -o $HOME/my_program_directory/my_program.stdout
#$ -e $HOME/my_program_directory/my_program.stderr

are SGE directives used to specify the destination for your job's standard output and standard error respectively. (You don't have to specify these files. If you don't, default standard output and standard error files will be created in your HPC home directory, with names based upon the job id and job name.)

This line:

#$ -S /bin/bash

instructs SGE to execute the commands in this script using the bash shell. This is the recommended shell for all batch jobs.

The next line:

. /etc/profile

sets up your environment to ensure that programs can find the relevant libraries, and that the modules system is available. This line should always be included in your job scripts.

The next line:

cd my_program_directory

specifies the current working directory for your job. Note that when your batch job starts, its current working directory will be your home directory and not the current directory of the interactive session from which you submit the batch job.

The last line:

time my_program < my_program.input

is the command to run your program. This is normally the same as the command you would type if you were running it interactively. In this example the command to run the program (my_program < my_program.input) is prefixed by the system command time. This causes a timing summary to be printed to the standard error file when the job finishes. The time command is not necessary for job scripts; it simply provides a useful summary of the length of time your program took to run.

Note that any standard input to the program (what you would type at the keyboard if you were running it interactively) must be put into a file, my_program.input in this case. The redirection operator, <, then makes the program read this file for its input.
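To see the redirection in action without access to the cluster, the following sketch uses a small shell function, my_program, as a stand-in for a real application. The function is hypothetical and simply reads two lines from standard input. The answers that would otherwise be typed at the keyboard are first written to my_program.input.

```shell
#!/bin/bash
# Stand-in for an interactive program: reads a name and a count from standard input.
my_program() {
    local name count
    read -r name
    read -r count
    echo "name=$name count=$count"
}

# Put what you would have typed at the keyboard into the input file, one answer per line.
printf '%s\n' "test_run" "42" > my_program.input

# The < operator makes the program read the file instead of the keyboard.
my_program < my_program.input
```

In a real job script only the last line is needed, with my_program replaced by your own executable.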

Submitting large memory jobs

As each node can run multiple jobs, there is a risk that jobs with large memory requirements may oversubscribe memory on a node, leading to poor performance for all jobs. To prevent this, batch jobs which require more than 500 MB of memory must be submitted with a memory resource request, by adding the following options to the qsub command:

qsub -l mem_free=xG -l mem_token=xG myjob.com

where x is a real number giving the amount of memory required, in gigabytes.
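For illustration, a small wrapper can fill in the same value for both resources. The function name mem_qsub_cmd is hypothetical and not part of SGE; it only prints the command line so that it can be checked before use. The value 2 is an arbitrary example.

```shell
#!/bin/bash
# Hypothetical helper: print a qsub command line requesting the given memory in gigabytes.
# Usage: mem_qsub_cmd <gigabytes> <job script>
mem_qsub_cmd() {
    local gb="$1" script="$2"
    # Accept only plain decimal numbers such as 2 or 1.5.
    case "$gb" in
        ''|*[!0-9.]*|*.*.*|.)
            echo "memory must be a number of gigabytes" >&2
            return 1
            ;;
    esac
    echo "qsub -l mem_free=${gb}G -l mem_token=${gb}G $script"
}

mem_qsub_cmd 2 myjob.com
```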

Local parallel job submission from the head node

Compiling for MPI

MPI codes are handled via different parallel environments on the two clusters: SCore on lancs1 and OpenMPI on lancs2. First, load the relevant module in order to access the MPI compilers and other tools:

or

You can now compile your parallel application using the relevant compiler wrapper: mpicc, mpiCC, mpif77 or mpif90. By default, these MPI compilers invoke the standard GNU compilers; compiler flags in this mode should therefore be those you would normally use for the GNU Compiler Collection.

For improved performance, the MPI compiler wrappers can be directed to use one of the other compiler suites. With SCore this can be achieved by adding the arguments -compiler pgi or -compiler intel to any of the compiler calls. For OpenMPI, separate modules are provided for the Intel and PGI compilers.

Please note: You must ensure that the MPI compilers have access to the relevant compiler suite by ensuring that its module is loaded, e.g.:

Submitting MPI jobs

To launch a parallel MPI job, DO NOT write your own job submission script. Instead, run the mpisub (lancs1) or ompisub (lancs2) command:

Where n is the number of execution nodes you want the job to run on, m is the number of CPUs on each node to use (this number should normally be four), and myexecutable is the name of the SCore-compiled application you wish to run.

E.g., to run the application myapp on 4 nodes, each using 4 CPUs (i.e., a total of 16 CPUs), enter the following:

mpisub will automatically generate a script for your executable and submit it to the queue. The output files will appear in your current working directory.

Please Note: The MPI environments work on a node-booking system; a parallel job cannot share a node with other serial or parallel jobs. So, if you run mpisub with an m value of less than the total number of processors on each node, the remaining CPUs on each node will be effectively blocked from use by others. Use smaller m values only when necessary!
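The cost of a small m value can be worked out with simple arithmetic. The sketch below assumes four CPUs per node, as suggested by the recommended value of m; the function name node_usage is hypothetical.

```shell
#!/bin/bash
# Assumption: each execution node has 4 CPUs (the text recommends m = 4).
CPUS_PER_NODE=4

# Hypothetical helper: report CPUs used and CPUs left idle for a job on n nodes with m CPUs each.
node_usage() {
    local n="$1" m="$2"
    local used=$(( n * m ))
    local idle=$(( n * (CPUS_PER_NODE - m) ))
    echo "used=$used idle=$idle"
}

node_usage 4 4   # whole nodes: nothing is blocked
node_usage 4 2   # half of every booked node is blocked from other users
```

Under this assumption, a job on 4 nodes with m = 2 runs on 8 CPUs but makes another 8 unavailable to everyone else.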

Please Note 2: On lancs1, the Inter Process Communication fabric for SCore jobs runs only between nodes on the same physical rack; the job scheduler will choose an appropriate rack for you.

File redirection with MPI jobs

If you need to redirect standard input or output for a job, what may seem to be the obvious approach will not work:

What this command actually does is to instruct mpisub to take its input from myinput. Instead, you need to protect the file redirection from the shell, so that it is passed as an argument to mpisub to be appended to its call to myprogram:
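The difference between the two forms is a general property of the shell and can be demonstrated without mpisub. In this sketch, show_args is a hypothetical stand-in that just reports the arguments it receives; it says nothing about how mpisub is implemented. Quoting is one way of protecting the redirection from the shell.

```shell
#!/bin/bash
# Hypothetical stand-in for a submission command: report how many arguments arrived.
show_args() {
    echo "received $# argument(s): $*"
}

echo "some input" > myinput

# Unquoted: the shell consumes "< myinput" itself and attaches the file to
# show_args' own standard input, so the redirection never reaches the command.
show_args myprogram < myinput

# Quoted: "< myinput" survives as an ordinary argument that the command can
# pass on to the program it eventually launches.
show_args myprogram '< myinput'
```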

LancsHelp (last edited 2010-04-23 08:59:47 by MikePacey)
