Core files under mpich, p4 device
Jaco Schieke
schieke at cae.wisc.edu
Mon Oct 22 03:03:10 PDT 2001
Sorry, I should have mentioned that I have already checked the limits. If I
ssh to a compute node I get:
@csrv ~>ssh node44 limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 8192 kbytes
coredumpsize unlimited
memoryuse unlimited
descriptors 1024
memorylocked unlimited
maxproc 16382
openfiles 1024
I also found that I can produce a core dump if the offending statement comes
before MPI_Init(), but not when it comes after.
I presume that you are able to get core files?
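
One possible explanation, not confirmed anywhere in this thread, is that the
p4 device installs its own signal handlers during MPI_Init(). The
"p4_error: interrupt SIGSEGV" messages would be consistent with that, as would
the fact that a fault before MPI_Init() still dumps core. If that is the cause,
a minimal sketch of a workaround is to restore the default SIGSEGV action right
after MPI_Init() returns. The helper name enable_core_dumps() is hypothetical
and not part of mpich; it also raises the soft core limit to the hard limit in
case the remotely started process inherited a lower one.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not an mpich function).
 * Assumption: the MPI library replaced the SIGSEGV handler during MPI_Init(),
 * so the kernel never gets to apply the default "dump core" action.
 * Returns 0 on success, -1 if any step fails.
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    /* Raise the soft core-size limit as far as the hard limit allows. */
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }

    /* Put SIGSEGV back to its default action (terminate and dump core). */
    if (signal(SIGSEGV, SIG_DFL) == SIG_ERR) {
        perror("signal");
        return -1;
    }
    return 0;
}
```

The intended use would be to call enable_core_dumps() immediately after
MPI_Init(&argc, &argv) in each process. With the handler removed, the library
can no longer print its own error message or clean up the other processes when
a segmentation fault occurs, so this is probably best kept to debugging runs.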
Jaco Schieke
Dept. of Chemical Engineering
University of Wisconsin-Madison
----- Original Message -----
From: "Rayson Ho" <raysonlogin at yahoo.com>
To: "Jaco Schieke" <schieke at cae.wisc.edu>; <beowulf at beowulf.org>
Sent: Monday, October 22, 2001 3:10 PM
Subject: Re: Core files under mpich, p4 device
> What does "limit" show??
>
> If you have "coredumpsize = 0", no core files will be generated.
>
> cputime unlimited
> filesize unlimited
> datasize unlimited
> stacksize 2044 kbytes
> coredumpsize 0 kbytes
> memoryuse unlimited
> descriptors 1024
> memorylocked unlimited
> maxproc 8192
> openfiles 1024
>
> Rayson
>
> --- Jaco Schieke <schieke at cae.wisc.edu> wrote:
> > All,
> >
> > Has anybody been able to produce core files under mpich using the p4
> > device? I have verified on 2 different clusters
> > that SIGSEGV errors under mpich do not produce a core file. Below
> > are the error msgs, but no core files appeared. I have not
> > tried kernel patches to produce named core files, but would first
> > like to know whether this will solve things.
> >
> > How does one produce these?
> >
> > Jaco Schieke
> > Research Assistant
> > Dept. of Chemical Engineering
> > Univ. of Wisconsin Madison
> >
> >
> > Host 1: Linux 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
> > p0_26109: p4_error: interrupt SIGSEGV: 11
> > bm_list_26110: p4_error: interrupt SIGINT: 2
> > p0_26109: (4.578240) net_send: could not write to fd=4, errno = 32
> >
> >
> > Host 2: Linux 2.4.6 #1 Sun Aug 19 12:44:48 CDT 2001 i686 unknown
> > p4_32319: p4_error: interrupt SIGSEGV: 11
> > p2_3383: p4_error: interrupt SIGSEGV: 11
> > Broken pipe
> > rm_l_5_7126: p4_error: net_recv read: probable EOF on socket: 1
> > rm_l_1_3581: (1417.766098) net_recv failed for fd = 6
> > rm_l_1_3581: p4_error: net_recv read, errno = : 104
> > Cleaning up
> > rm_l_3_5475: (1417.574178) net_recv failed for fd = 6
> > rm_l_3_5475: p4_error: net_recv read, errno = : 104
> >
> >
> >