[Beowulf] Explanation of error message in MPICH-1.2.7
Jeffrey B. Layton
laytonjb at charter.net
Fri Oct 6 12:50:49 PDT 2006
Afternoon cluster fans,
I'm working with a CFD code using the PGI 6.1 compilers and
MPICH-1.2.7. The code runs fine for a while, but then I get an error
message that I've never seen before:
[2] MPI Internal Aborting program Deep nest in Check_incoming
[2] Deep nest in Check_incoming
This error message is in the error file from PBS. The output from
the code gives the following:
p2_15458: p4_error: : 1
p5_21530: p4_error: net_recv read: probable EOF on socket: 1
p7_21548: p4_error: net_recv read: probable EOF on socket: 1
p6_21539: p4_error: net_recv read: probable EOF on socket: 1
rm_l_6_21544: (95.492188) net_send: could not write to fd=5, errno = 32
rm_l_2_15464: (95.835938) net_send: could not write to fd=5, errno = 32
rm_l_5_21535: (95.574219) net_send: could not write to fd=5, errno = 32
rm_l_7_21553: (95.410156) net_send: could not write to fd=5, errno = 32
The code runs fine with other MPI implementations (Scali,
MVAPICH, etc.). My googling efforts haven't yielded anything.
Does anyone have any input on this?
Thanks!
Jeff