[Beowulf] Explanation of error message in MPICH-1.2.7
Bill Rankin
wrankin at ee.duke.edu
Mon Oct 16 06:07:01 PDT 2006
I often see this error when an MPI_Barrier() call is not placed just
before MPI_Finalize(). Without it, one of the processes can exit early
while the others are still communicating, and MPICH doesn't like that
too much.
-b
On Oct 6, 2006, at 3:50 PM, Jeffrey B. Layton wrote:
> Afternoon cluster fans,
>
> I'm working with a CFD code using the PGI 6.1 compilers and
> MPICH-1.2.7. The code runs fine for a while but I get an error
> message that I've never seen before:
>
>
> [2] MPI Internal Aborting program Deep nest in Check_incoming
> [2] Deep nest in Check_incoming
>
> This error message is in the error file from PBS. The output from
> the code gives the following:
>
>
> p2_15458: p4_error: : 1
> p5_21530: p4_error: net_recv read: probable EOF on socket: 1
> p7_21548: p4_error: net_recv read: probable EOF on socket: 1
> p6_21539: p4_error: net_recv read: probable EOF on socket: 1
> rm_l_6_21544: (95.492188) net_send: could not write to fd=5, errno = 32
> rm_l_2_15464: (95.835938) net_send: could not write to fd=5, errno = 32
> rm_l_5_21535: (95.574219) net_send: could not write to fd=5, errno = 32
> rm_l_7_21553: (95.410156) net_send: could not write to fd=5, errno = 32
>
>
> The code runs fine with other MPI implementations (Scali,
> MVAPICH, etc.). My googling efforts haven't yielded anything.
> Does anyone have any input on this?
>
> Thanks!
>
> Jeff