[Beowulf] debugging

Mattijs Janssens m.janssens at opencfd.co.uk
Wed Apr 11 02:42:09 PDT 2007


I haven't used the debug facility in mpich, so I 
cannot comment on that one. But can you just start 
the job as usual and then, on the node that 
crashes, use 

gdb -pid XXX

?
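
A minimal sketch of that attach approach, assuming 
the process is still running when gdb attaches. gdb 
stops the process on attach, so the command file 
resumes it with 'continue' and prints a stack trace 
with 'where' once it stops again, for example at the 
crash. The file name attachCommands is hypothetical, 
and XXX is still a placeholder for the process id:

```shell
#!/bin/sh
# Write a gdb command file for attaching to a running process.
# 'attachCommands' is a hypothetical file name.
cat > attachCommands <<'EOF'
continue
where
EOF

# Print the command to run on the node in question.
# XXX is a placeholder; pass the real process id as the first argument.
pid=${1:-XXX}
echo "gdb -pid $pid -command attachCommands"
```

This only prints the gdb command line; run the 
printed command on the node where the process lives.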

I tend to use lam, and there you can use an 
'application scheme' file, which basically tells 
it where to start which executable. You can use it 
to start a gdb session in a separate window for 
every process. Not recommended for 32-processor 
jobs but excellent for development.

A sample application scheme to start 
'executableName' on two processors:
 
xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor0.log; read dummy"
xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor1.log; read dummy"

The gdbCommands file contains the gdb commands:

run arg1 arg2
where
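
For more than a couple of processes, the scheme 
lines could be generated rather than typed. A rough 
sketch, assuming one line per process as in the 
sample; 'appSchema' is a hypothetical output file 
name, and 'executableName' and 'gdbCommands' are the 
same placeholders as in the sample:

```shell
#!/bin/sh
# Sketch: write one application scheme line per process.
# 'appSchema' is a hypothetical file name.
nprocs=2

: > appSchema    # start with an empty file
i=0
while [ "$i" -lt "$nprocs" ]; do
    # Each line opens an xterm running gdb and logs to processor<i>.log.
    printf 'xterm -e /bin/sh -c "gdb -command gdbCommands executableName 2>&1 | tee processor%s.log; read dummy"\n' "$i" >> appSchema
    i=$((i + 1))
done
```

The resulting file would then be handed to lam's 
mpirun in place of the executable name (something 
like 'mpirun appSchema'; check the mpirun man page 
for your lam version).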

It should be possible to do something similar with 
mpich.

Mattijs

On Monday 09 April 2007 18:30, Matt Funk wrote:
> Hi,
>
> i hope this is the right mailing list to post
> to...
>
> Anyway, i was wondering if i could get some
> advice/direction on how to debug my mpich
> program. I am running on a scyld configuration.
> What i am trying right now is the following:
>
> mpirun -dbg=gdb -nolocal -np 32 exec
>
> which starts the debugger in which i go
> run args
>
> which then starts the program. However, it
> doesn't get very far before it just sits there.
> When i ps, all the processes are defunct.
>
> When i do the same thing except mpirun -dbg=gdb
> -nolocal -np 1 exec and run it in the debugger,
> the program starts running well.
>
> The reason i want to run on 32 processors
> though, is that it takes (on 32 procs) several
> hours till my program crashes. Also, i would
> like to be able to keep the conditions under
> which it crashes intact as much as possible
> (i.e. run on 32 procs rather than 1).
>
> Does anyone have any advice? I am open to try
> out other things as well if possible. I am just
> starting to learn debugger techniques for a
> parallel program.
>
> thanks
> mat
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 

Mattijs Janssens

OpenCFD Ltd.
9 Albert Road,
Caversham,
Reading RG4 7AN.
Tel: +44 (0)118 9471030
Email: M.Janssens at OpenCFD.co.uk
URL: http://www.OpenCFD.co.uk


