[Beowulf] hang-up of HPC Challenge

Peter St. John peter.st.john at gmail.com
Tue Aug 19 15:12:36 PDT 2008


I don't know what the problem is, but can anyone tell me (or point me to
an explanation of) how an "unlimited" stack size works?
Peter
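
As a starting point, here is a minimal sketch of how the limit can be
inspected, assuming Linux and Python's standard "resource" module. A stack
size reported as "unlimited" corresponds to RLIMIT_STACK being set to
RLIM_INFINITY, i.e. no fixed cap is enforced through that limit; the stack
is still bounded in practice by available memory and address space. The
helper names describe_limit and stack_limits are hypothetical, chosen only
for illustration.

```python
import resource


def describe_limit(value):
    """Render an rlimit value the way the shell's ulimit does."""
    # RLIM_INFINITY is the sentinel the kernel uses for "no limit".
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def stack_limits():
    """Return (soft, hard) stack limits of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print(f"stack soft limit: {soft}")
    print(f"stack hard limit: {hard}")
```

Child processes inherit these limits, so running the script from the same
shell that issues mpirun should show what the hpcc processes start with,
unless the MPI launcher changes them.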

On 8/18/08, Mikhail Kuzminsky <kus at free.net> wrote:
>
> I ran the HPC Challenge benchmark suite on ONE dual-socket quad-core
> Opteron 2350 (Rev. B3) based server (8 logical CPUs).
> It has 16 GB of RAM. The tests were run under SuSE 10.3/x86-64, with
> LAM MPI 7.1.4 and MPICH 1.2.7 from the SuSE distribution, using Atlas 3.9.
> Unfortunately there is only one such cluster node, so I can't reproduce the
> run on another node :-(
>
> For N (matrix size) up to 10000 everything looks OK. But for larger N
> (15000/20000/...), running hpcc (mpirun -np 8 hpcc) hangs Linux.
>
> In the "top" output I see 8 hpcc processes, each using about 100% of a CPU,
> with reasonable amounts of virtual and resident (RSS) memory per process
> and no swap in use. Usually there are no PTRANS results in the hpccoutf.txt
> results file, but in a few cases (when I was actively watching the hpcc
> run with ps/top) I saw reasonable PTRANS results but no HPLinpack
> results. Once I obtained PTRANS, HPL and DGEMM results for N=20000, but
> the hang came later, during the STREAM tests. Maybe this is simply
> because, at the moment of the hang, the output buffer had not yet been
> flushed to the output file on disk.
>
> One possible cause of the hangs is a memory hardware problem, but what
> about possible software causes?
> The hpcc executable is 64-bit and dynamically linked.
> /etc/security/limits.conf is empty. The stack size limit (for the user
> issuing mpirun) is "unlimited", the main memory limit is about 14 GB, and
> the virtual memory limit is about 30 GB. Atlas was compiled with 32-bit
> integers, but that is sufficient for these N values. Even
> /proc/sys/kernel/shmmax is 2^63-1.
>
> What else could be causing the hang?
>
> Mikhail Kuzminskiy
> Computer Assistance to Chemical Research Center
> Zelinsky Institute of Organic Chemistry
> Moscow
>
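
For context on the problem sizes mentioned above, the following is a rough
sketch of the memory needed just for the HPL matrix, assuming
double-precision (8-byte) elements and an even split across the 8 MPI
processes. It ignores workspace, MPI buffers and the other HPCC tests, so it
is a lower bound rather than a measurement. The function name
hpl_matrix_bytes is hypothetical.

```python
def hpl_matrix_bytes(n, bytes_per_element=8):
    """Bytes needed to store a dense n x n matrix (default: 8-byte doubles)."""
    return n * n * bytes_per_element


if __name__ == "__main__":
    processes = 8  # matches "mpirun -np 8 hpcc" in the report
    for n in (10000, 15000, 20000):
        total = hpl_matrix_bytes(n)
        per_process = total / processes
        print(f"N={n}: {total / 1e9:.2f} GB total, "
              f"{per_process / 1e9:.3f} GB per process")
```

Even at N=20000 the matrix is 3.2 GB in total, well under the 16 GB of RAM
and the reported limits, which is consistent with the observation that no
swap was in use.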