<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2722" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Dear all,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I am using MPICH2 on a Linux cluster, and I keep getting
errors like the following:</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>rank 14 in job 2 cn128_57798
caused collective abort of all ranks<BR> exit status of rank 14: killed by
signal 9<BR></FONT></DIV>
<DIV><FONT face=Arial size=2>or</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>mpdrun_cn145: cannot connect to local mpd
(/tmp/mpd2.console_lrz); possible causes:<BR> 1. no mpd is running on this
host<BR> 2. an mpd is running but was started without a "console" (-n
option)<BR></FONT></DIV>
<DIV><FONT face=Arial size=2>There are 160 nodes on the cluster. I used "mpdboot
-n -f" to start MPI. Because I always got errors when I tried to boot every
node, I listed only 64 nodes in the mpd.hosts file. The nodes named in the
errors above are not in my mpd.hosts file, nor in the mpiexec command I used to
run my application.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Does anybody have any experience with this? Thanks a
lot!</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Best regards,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>ruzhen</FONT></DIV></BODY></HTML>