MPI dies

Victor Karyo vkaryo at
Fri Sep 15 16:17:07 PDT 2000

Is there a technique to handle node failure?  Shortly, I'll be working on an 
algorithm that is naturally parallel and divided into coarse-grain "blocks". 
I want to use a master/worker scheme.  The master will reissue a block if 
it doesn't return from the worker fast enough, on the assumption that the 
node has failed.  I know I can't rejoin a node after it fails, but if a 
node fails, will the whole app die?
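
To make the reissue idea concrete, here is a minimal sketch of the 
master-side bookkeeping only. It is plain Python and makes no MPI calls; 
the class name BlockScheduler, its methods, and the injectable clock are 
all hypothetical names chosen for illustration. It says nothing about 
whether the MPI job itself survives a node failure.

```python
import time


class BlockScheduler:
    """Master-side bookkeeping: hand out blocks, reissue ones that time out."""

    def __init__(self, block_ids, timeout, clock=time.monotonic):
        self.pending = list(block_ids)  # blocks waiting to be issued or reissued
        self.in_flight = {}             # block id -> (worker, deadline)
        self.done = {}                  # block id -> result
        self.timeout = timeout
        self.clock = clock              # injectable so the logic can be tested

    def next_block(self, worker):
        """Return a block id for `worker`, or None if nothing is waiting."""
        self._requeue_expired()
        if not self.pending:
            return None
        block = self.pending.pop(0)
        self.in_flight[block] = (worker, self.clock() + self.timeout)
        return block

    def complete(self, block, result):
        """Record a result; a late duplicate of a finished block is ignored."""
        if block in self.done:
            return False
        self.done[block] = result
        self.in_flight.pop(block, None)
        if block in self.pending:       # it had timed out but came back after all
            self.pending.remove(block)
        return True

    def _requeue_expired(self):
        """Move blocks whose deadline has passed back onto the pending list."""
        now = self.clock()
        for block, (_worker, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[block]
                self.pending.append(block)

    def finished(self):
        return not self.pending and not self.in_flight


# Usage with a fake clock: worker w1 takes block b0 and then goes silent.
now = [0.0]
sched = BlockScheduler(["b0", "b1"], timeout=10.0, clock=lambda: now[0])
first = sched.next_block("w1")
second = sched.next_block("w2")
sched.complete(second, "result-b1")
now[0] = 11.0                        # b0's deadline has passed
reissued = sched.next_block("w2")    # the master hands b0 out again
sched.complete(reissued, "result-b0")
```

The timeout is a guess about failure, not proof of it, so the sketch 
accepts whichever copy of a block finishes first and discards the other.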

Also, is there a way to detect the number of nodes other than at 
initialization, so I can tell if a node has died?

(I plan on using MPI-Pro on a RH6.2 8-way single-proc Intel cluster with 
100 Mbps switched Ethernet.)

Victor Karyo.

There are some efforts to build fault-tolerant MPIs, but standard
MPI-1.x is supposed to kill the parallel application if a node dies,
unless the underlying system transparently handles the fault.

Anthony Skjellum, PhD, President (tony at
MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 
+1-(662)320-4300 x15; FAX: +1-(662)320-4301;
"Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters."

On Thu, 14 Sep 2000, Horatio B. Bogbindero wrote:

 > what happens if a node in MPI dies? is the entire computation lost?
 > ---------------------
 > william.s.yu at
 > I bought some used paint. It was in the shape of a house.
 > 		-- Steven Wright
 > _______________________________________________
 > Beowulf mailing list
 > Beowulf at


