I certainly don't know what the problem is, but can anyone tell me (or point me to...) how an "unlimited" stack size works?<br>
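A minimal sketch of what "unlimited" means, assuming Linux: `ulimit -s unlimited` sets the soft RLIMIT_STACK resource limit to RLIM_INFINITY. The kernel then imposes no fixed cap on the main thread's stack, which can grow until it runs into other mappings or memory runs out. Resource limits are inherited by child processes, so a process can check the value it actually received (describe_limit is a hypothetical helper name):<br>

```python
import resource

def describe_limit(value: int) -> str:
    """Render one rlimit value; RLIM_INFINITY is how 'unlimited' is represented."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"

# Soft (currently enforced) and hard (ceiling) stack-size limits that this
# process inherited from its parent, e.g. a shell after `ulimit -s unlimited`.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("soft stack limit:", describe_limit(soft))
print("hard stack limit:", describe_limit(hard))
```

Running it from the same environment that launches hpcc would show whether the benchmark processes really see "unlimited".<br>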
Peter<br><br><div><span class="gmail_quote">On 8/18/08, <b class="gmail_sendername">Mikhail Kuzminsky</b> <<a href="mailto:kus@free.net">kus@free.net</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I ran a set of HPC Challenge benchmarks on ONE dual-socket quad-core Opteron 2350 (Rev. B3) based server (8 logical CPUs).<br>
RAM size is 16 GB. The tests were performed under SuSE 10.3/x86-64
with LAM MPI 7.1.4 and MPICH 1.2.7 from the SuSE distribution, using Atlas
3.9. Unfortunately there is only one such cluster node, so I can't
reproduce the run on another node :-(<br>
<br>
For N (matrix size) up to 10000 everything looks OK. But for larger N
(15000/20000/...), hpcc execution (mpirun -np 8 hpcc) causes Linux to
hang.<br>
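A rough back-of-the-envelope check of the memory footprint, under the assumption that the dominant allocation is the HPL matrix [A|b] of N x (N+1) 8-byte doubles split evenly across the 8 MPI processes (workspace, MPI buffers and the other HPCC tests are ignored; hpl_matrix_bytes is a hypothetical helper):<br>

```python
BYTES_PER_DOUBLE = 8

def hpl_matrix_bytes(n: int) -> int:
    """Approximate bytes for the HPL coefficient matrix [A|b]: N x (N+1) doubles.

    Ignores workspace, MPI buffers and the other HPCC kernels, so the real
    footprint is somewhat larger.
    """
    return n * (n + 1) * BYTES_PER_DOUBLE

RAM_BYTES = 16 * 2**30  # 16 GB of RAM, as reported for this node
NPROCS = 8              # mpirun -np 8

for n in (10000, 15000, 20000):
    total = hpl_matrix_bytes(n)
    print(f"N={n}: {total / 2**30:.2f} GiB total, "
          f"{total / NPROCS / 2**20:.0f} MiB per process, "
          f"{100 * total / RAM_BYTES:.1f}% of RAM")
```

For N=20000 the matrix alone is 20000 x 20001 x 8 = 3,200,160,000 bytes, about 3.2 GB, so it should fit comfortably in 16 GB of RAM, which is consistent with the observation that no swap is used.<br>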
<br>
In the "top" output I see 8 hpcc instances, each using about 100% of a
CPU, with reasonable amounts of virtual and RSS memory per hpcc process
and no swap usage. Usually there are no PTRANS results in the
hpccoutf.txt results file, but in a few cases (when I was "actively
watching" hpcc execution by running ps/top) I see reasonable
PTRANS results but no HPLinpack results. Once I obtained
PTRANS, HPL and DGEMM results for N=20000, but the system hung later, during the STREAM
tests. Perhaps the results are missing simply because the output buffer was never
written to the output file on the HDD before the hang-up.<br>
<br>
One possible reason for the hang-ups is a memory hardware problem, but what about possible software reasons? <br>
The hpcc executable is a 64-bit, dynamically linked binary.
/etc/security/limits.conf is empty. The stack size limit (for the user issuing
mpirun) is "unlimited", the main memory limit is about 14 GB, and the virtual memory
limit is about 30 GB. Atlas was compiled for 32-bit integers, but that is
enough for such N values (for N=20000, N^2 = 4*10^8, well below 2^31-1). Even /proc/sys/kernel/shmmax is 2^63-1.<br>
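The claim that 32-bit integers are enough can be checked with a conservative sketch, assuming the largest index ever needed is an element offset into a single N x N column-major array (in a distributed run each process holds only part of the matrix, so its local indices are smaller still); fits_int32_index is a hypothetical name:<br>

```python
import math

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer

def fits_int32_index(n: int) -> bool:
    """True if n*n, the element count of an n x n matrix, fits in an int32,
    so every element offset (0 .. n*n - 1) does too."""
    return n * n <= INT32_MAX

# Largest square dimension that passes the check above.
max_n = math.isqrt(INT32_MAX)

for n in (10000, 15000, 20000):
    print(f"N={n}: N*N={n * n}, fits in int32: {fits_int32_index(n)}")
print("largest N:", max_n)
```

Under these assumptions the limit is N = 46340, well above the N=20000 used in the runs.<br>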
<br>
What else could be the reason for the hang-up?<br>
<br>
Mikhail Kuzminskiy<br>
Computer Assistance to Chemical Research Center<br>
Zelinsky Institute of Organic Chemistry<br>
Moscow<br>
<br>
<br>
_______________________________________________<br>
Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">Beowulf@beowulf.org</a><br>
To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>
</blockquote></div><br>