Running padb with SGE jobs
Ashley Pittman's padb is a command-line MPI debugger. It is useful even if you are flush enough to have one of the proprietary parallel debuggers, since they don't lend themselves to convenient non-interactive operation. However, it's not straightforward to use padb to debug a specified SGE job from the head node.
padb-sge is a wrapper which tries to do the right thing, given the number of a tightly integrated parallel SGE job to target with padb. It has been tested with our SGE 6.2 installation, OpenMPI 1.4, and the current padb.
Usage: padb-sge <jobnum>[.<tasknum>] <padb args>...

Run padb(1) with the given args against the parallel job with the given job/task number. E.g.

  padb-sge 123 --proc-summary --all
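To make the argument syntax concrete, here is a minimal Python sketch of how a wrapper could split its command line into a job number, an optional task number, and the arguments passed through to padb. It is not taken from padb-sge itself: the function names parse_job_spec and split_args are hypothetical, and it assumes that job and task numbers are plain decimal integers.

```python
import re
from typing import List, Optional, Tuple


def parse_job_spec(spec: str) -> Tuple[int, Optional[int]]:
    """Split '<jobnum>[.<tasknum>]' into (job, task); task is None if absent.

    Assumption: both numbers are plain decimal integers.
    """
    match = re.fullmatch(r"(\d+)(?:\.(\d+))?", spec)
    if match is None:
        raise ValueError(f"expected <jobnum>[.<tasknum>], got {spec!r}")
    job = int(match.group(1))
    task = int(match.group(2)) if match.group(2) is not None else None
    return job, task


def split_args(argv: List[str]) -> Tuple[int, Optional[int], List[str]]:
    """Separate the job spec (first argument) from the padb arguments."""
    if not argv:
        raise ValueError("usage: padb-sge <jobnum>[.<tasknum>] <padb args>...")
    job, task = parse_job_spec(argv[0])
    # Everything after the job spec is handed to padb unchanged.
    return job, task, list(argv[1:])


if __name__ == "__main__":
    print(split_args(["123", "--proc-summary", "--all"]))
    print(split_args(["123.4", "--proc-summary"]))
```

Under these assumptions the example command above yields job 123, no task number, and the padb arguments --proc-summary --all; a spec such as 123.4 selects task 4 of job 123.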
By the way, you need a version of padb dated later than 2010-12-05 if, like ours, your OpenMPI has the checkpoint/restart facility available, or anything else that might change the output of ompi-ps in a similar way.