Core files under mpich, p4 device

Jaco Schieke schieke at cae.wisc.edu
Mon Oct 22 03:03:10 PDT 2001


Sorry, I should have mentioned that I had already checked the limits.  If I
ssh to a compute node, I get:

@csrv ~>ssh node44 limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       8192 kbytes
coredumpsize    unlimited
memoryuse       unlimited
descriptors     1024
memorylocked    unlimited
maxproc         16382
openfiles       1024
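
The coredumpsize line above corresponds to the RLIMIT_CORE resource limit, so
the same check can also be made from inside the program. This may be worth
doing because processes started by the job launcher could, in principle, see
different limits than an interactive ssh shell (an assumption, not something
verified here). A minimal sketch using the POSIX getrlimit/setrlimit calls; the
function names core_dumps_allowed and raise_core_limit are made up for
illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Returns 1 if the soft core-size limit permits a core file,
 * 0 if the limit is zero, and -1 if the limit could not be read. */
int core_dumps_allowed(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/* Raises the soft core-size limit to the hard limit.
 * Returns 0 on success and -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may go this far */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling core_dumps_allowed() early in main() on each rank and printing the
result would show whether the limit seen by the running job matches the shell
output.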

I also found that I can produce a core dump if the offending statement comes
before MPI_Init(), but not if it comes after.
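
One possible explanation, given the "p4_error: interrupt SIGSEGV" messages, is
that a SIGSEGV handler gets installed during MPI initialization and exits the
process without the default core-dumping action. That is an inference from the
symptoms, not something confirmed in this thread. Under that assumption, a
sketch of resetting the signal to its default disposition looks like the
following. It is plain POSIX with no MPI calls; library_style_handler is a
hypothetical stand-in for whatever the library installs, and all function names
are made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a library handler: exits without dumping core. */
static void library_style_handler(int signum)
{
    (void)signum;
    _exit(1);
}

/* Installs the stand-in handler for SIGSEGV. Returns 0 on success. */
int install_library_style_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = library_style_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Resets signum to the default action, which for SIGSEGV is to
 * terminate and dump core. Returns 0 on success. */
int restore_default_disposition(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Returns 1 if signum currently has the default disposition,
 * 0 if a handler (or SIG_IGN) is set, and -1 on error. */
int has_default_disposition(int signum)
{
    struct sigaction cur;

    if (sigaction(signum, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

In an MPI program the idea would be to call
restore_default_disposition(SIGSEGV) right after initialization returns.
Whether that is enough to get core files with the p4 device is untested here.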

I presume that you are able to get core files?

Jaco Schieke
Dept. of Chemical Engineering
University of Wisconsin-Madison


----- Original Message -----
From: "Rayson Ho" <raysonlogin at yahoo.com>
To: "Jaco Schieke" <schieke at cae.wisc.edu>; <beowulf at beowulf.org>
Sent: Monday, October 22, 2001 3:10 PM
Subject: Re: Core files under mpich, p4 device


> What does "limit" show??
>
> If you have "coredumpsize = 0", no core files will be generated.
>
> cputime         unlimited
> filesize        unlimited
> datasize        unlimited
> stacksize       2044 kbytes
> coredumpsize    0 kbytes
> memoryuse       unlimited
> descriptors     1024
> memorylocked    unlimited
> maxproc         8192
> openfiles       1024
>
> Rayson
>
> --- Jaco Schieke <schieke at cae.wisc.edu> wrote:
> > All,
> >
> > Has anybody been able to produce core files under mpich using the p4
> > device?  I have been able to verify on 2 different clusters
> > that SIGSEGV errors under mpich do not produce a core file.  Below
> > are the error msgs, but no core files appeared.  I have not
> > tried kernel patches to produce named core files, but would first
> > like to know whether this will solve things.
> >
> > How does one produce these?
> >
> > Jaco Schieke
> > Research Assistant
> > Dept. of Chemical Engineering
> > Univ. of Wisconsin Madison
> >
> >
> > Host 1: Linux 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
> > p0_26109:  p4_error: interrupt SIGSEGV: 11
> > bm_list_26110:  p4_error: interrupt SIGINT: 2
> > p0_26109: (4.578240) net_send: could not write to fd=4, errno = 32
> >
> >
> > Host 2: Linux 2.4.6 #1 Sun Aug 19 12:44:48 CDT 2001 i686 unknown
> > p4_32319:  p4_error: interrupt SIGSEGV: 11
> > p2_3383:  p4_error: interrupt SIGSEGV: 11
> > Broken pipe
> > rm_l_5_7126:  p4_error: net_recv read:  probable EOF on socket: 1
> > rm_l_1_3581: (1417.766098) net_recv failed for fd = 6
> > rm_l_1_3581:  p4_error: net_recv read, errno = : 104
> > Cleaning up
> > rm_l_3_5475: (1417.574178) net_recv failed for fd = 6
> > rm_l_3_5475:  p4_error: net_recv read, errno = : 104
> >
> >
> >




