[Beowulf] bizarre scaling behavior on a Nehalem

Mikhail Kuzminsky kus at free.net
Fri Aug 14 11:47:01 PDT 2009


In message from Bill Broadley <bill at cse.ucdavis.edu> (Thu, 13 Aug 2009 
17:09:24 -0700):

Do I understand correctly that these results are for 4 cores and 4 OpenMP 
threads? 
And what is the DDR3 RAM speed: DDR3-1066?
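
The memory speed matters because it sets the ceiling that the STREAM rates 
should be compared against. A rough sketch of that arithmetic in C follows. 
The function names are hypothetical, and it assumes three populated DDR3 
channels of 64 bits each, which is my assumption about the board and not 
something stated in the quoted message.

```c
#include <assert.h>

/* Theoretical peak in MB/s (10^6 bytes/s, the unit STREAM reports):
 * mega-transfers per second * bytes per transfer * number of channels.
 * Assumption: each DDR3 channel is 64 bits (8 bytes) wide. */
double ddr3_peak_mb_s(double mega_transfers_per_s, int channels)
{
    const double bytes_per_transfer = 8.0;

    assert(mega_transfers_per_s > 0.0);
    assert(channels > 0);
    return mega_transfers_per_s * bytes_per_transfer * channels;
}

/* Measured STREAM rate as a fraction of that theoretical peak. */
double fraction_of_peak(double measured_mb_s,
                        double mega_transfers_per_s, int channels)
{
    return measured_mb_s / ddr3_peak_mb_s(mega_transfers_per_s, channels);
}
```

With the nominal figures, ddr3_peak_mb_s(1066, 3) gives 25584 MB/s and 
ddr3_peak_mb_s(1333, 3) gives 31992 MB/s, so the quoted open64 Copy rate of 
22061 MB/s would be roughly 86% of peak on DDR3-1066 but only about 69% on 
DDR3-1333.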

Mikhail



>I tried open64-4.2.2 with those flags and on a nehalem single socket:
>$ opencc -O4 -fopenmp stream.c -o stream-open64 -static
>$ opencc -O4 -fopenmp stream-malloc.c -o stream-open64-malloc -static
>$ ./stream-open64
>Total memory required = 457.8 MB.
>Function      Rate (MB/s)   Avg time     Min time     Max time
>Copy:       22061.4958       0.0145       0.0145       0.0146
>Scale:      22228.4705       0.0144       0.0144       0.0145
>Add:        20659.2638       0.0233       0.0232       0.0233
>Triad:      20511.0888       0.0235       0.0234       0.0235
>Dynamic:
>$ ./stream-open64-malloc
>
>Function      Rate (MB/s)   Avg time     Min time     Max time
>Copy:       14436.5155       0.0222       0.0222       0.0222
>Scale:      14667.4821       0.0218       0.0218       0.0219
>Add:        15739.7070       0.0305       0.0305       0.0305
>Triad:      15770.7775       0.0305       0.0304       0.0305
>
>> Intel C/C++ Compiler 10.1 on Harpertown CPUs:
>> Base OPT flags:	 -O2 -xT -ansi-alias -ip -i-static
>> Intel recently used
>> Intel C/C++ Compiler 11.0.081 on Nehalem CPUs:
>> 	 -O2 -xSSE4.2 -ansi-alias -ip
>> and got good STREAM results in their HPCC submission on their 
>> Endeavor cluster.
>
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream.c -o stream-icc
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream-malloc.c -o stream-icc-malloc
>
>$ ./stream-icc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy:       14767.0512       0.0022       0.0022       0.0022
>Scale:      14304.3513       0.0022       0.0022       0.0023
>Add:        15503.3568       0.0031       0.0031       0.0031
>Triad:      15613.9749       0.0031       0.0031       0.0031
>$ ./stream-icc-malloc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy:       14604.7582       0.0022       0.0022       0.0022
>Scale:      14480.2814       0.0022       0.0022       0.0022
>Add:        15414.3321       0.0031       0.0031       0.0031
>Triad:      15738.4765       0.0031       0.0030       0.0031
>
>So ICC does manage zero penalty for the malloc'ed arrays, but alas it is 
>no faster than open64 with the penalty.
>
>I'll attempt to track down the HPCC stream source code to see if their 
>dynamic arrays are any friendlier than mine (I just use malloc).
>
>In any case many thanks for the pointer.
>
>Oh, my dynamic tweak:
>$ diff stream.c stream-malloc.c
>43a44
>> # include <stdlib.h>
>97c98
>< static double	a[N+OFFSET],
>---
>> /* static double	a[N+OFFSET],
>99c100,102
>< 		c[N+OFFSET];
>---
>> 		c[N+OFFSET]; */
>>
>> double *a, *b, *c;
>134a138,142
>>
>>     a=(double *)malloc(sizeof(double)*(N+OFFSET));
>>     b=(double *)malloc(sizeof(double)*(N+OFFSET));
>>     c=(double *)malloc(sizeof(double)*(N+OFFSET));
>>
>283c291,293
><
>---
>>     free(a);
>>     free(b);
>>     free(c);
>



