Top 500 trends

Mark Hahn hahn at physics.mcmaster.ca
Wed Nov 27 11:26:24 PST 2002


> STREAM bandwidth is a performance characteristic: it's the bandwidth that a 
> single processor achieves with the STREAM benchmark. It's not an application.

stream is a piece of source code.  how the compiler/runtime actually 
implements daxpy is completely free, and certainly does not require a single
address space.  therefore, it's quite reasonable to talk about the Stream
score for a loosely coupled cluster.  stream is almost the worst possible 
kind of code to run on a cluster, though, simply because it has such a low 
work:bandwidth ratio.
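
To put a rough number on that work:bandwidth ratio, here is a minimal Python sketch of the triad kernel (the daxpy-like loop in stream) and its flops per byte. The function names are mine, not from the stream source, and the byte count assumes 8-byte doubles with no extra write-allocate traffic, so treat the result as approximate.

```python
# Sketch only: names are illustrative, not part of the STREAM source.

def triad(b, c, scalar):
    """Return a where a[i] = b[i] + scalar * c[i] (the daxpy-like kernel)."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_flops_per_byte(word_bytes=8):
    """Flops per byte of memory traffic for one triad element.

    Assumes one multiply and one add per element, two loads and one store,
    and no extra write-allocate traffic (an assumption, not a measurement).
    """
    flops = 2
    bytes_moved = 3 * word_bytes
    return flops / bytes_moved


a = triad([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 2.0)
ratio = triad_flops_per_byte()
print(a, round(ratio, 3))
```

Under those assumptions the kernel does about one flop per 12 bytes moved, so any link slower than local memory (a cluster interconnect, say) ends up dominating the run time.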

IMO, a benchmark appropriate for SMP would necessarily measure inter-CPU 
latency, somehow, and stream does not.  I always ignore multiprocessor stream
results, or else look strictly at the scaling of their per-cpu scores as the
machine gets bigger. 
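
As a sketch of what "looking at the scaling of per-cpu scores" could mean in practice, the snippet below divides aggregate stream results by CPU count and compares each to the smallest configuration. The function name and the sample numbers are made up for illustration; they are not measurements from any machine.

```python
def per_cpu_scaling(results):
    """Map {ncpus: aggregate GB/s} to {ncpus: (per_cpu_gb_s, efficiency)}.

    Efficiency is the per-CPU bandwidth relative to the smallest CPU count
    present in the input.
    """
    base_n = min(results)
    base = results[base_n] / base_n
    return {n: (bw / n, (bw / n) / base) for n, bw in sorted(results.items())}


# Hypothetical aggregate triad numbers, invented for illustration only.
example = {1: 2.0, 2: 3.0, 4: 4.0}
scaling = per_cpu_scaling(example)
for n, (per_cpu, eff) in scaling.items():
    print(f"{n} CPUs: {per_cpu:.2f} GB/s per CPU, {eff:.0%} of the 1-CPU score")
```

A per-CPU figure that falls off quickly as CPUs are added suggests the memory system is shared and saturating, which the aggregate number alone tends to hide.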

> To illustrate: on an SX-6, this is in the range of 25 GB/s/CPU on an 8-CPU 
> node. A Pentium-4/Xeon Dual-SMP node gets about 0.5 GB/s/CPU (E7500 chipset 
> - which has dual channel RAM, IIRC). This alone gives a performance advantage 
> of about a factor 20-40 if not inside the caches, which shows in the MFLOPS 
> efficiency (achieved vs. peak) of many codes (the ones which can be 
> vectorized). 

a "cutting edge chicken" would be a uniprocessor P4/fsb533/dual-PC2700,
delivering (as a guess) a little under 3 GBps/CPU.
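
For context on that guess, a back-of-envelope sketch of the theoretical peaks involved. It assumes 64-bit (8-byte) buses and the nominal transfer rates of 533 MT/s for the front-side bus and 333 MT/s per PC2700 channel; the helper name is mine. Sustained stream numbers always land below such peaks.

```python
def peak_gb_per_s(mega_transfers_per_s, bus_bytes=8, channels=1):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Assumed nominal rates; real parts run at slightly different exact clocks.
fsb533 = peak_gb_per_s(533)
dual_pc2700 = peak_gb_per_s(333, channels=2)

# The slower of the two links bounds what the CPU can see.
ceiling = min(fsb533, dual_pc2700)
print(f"FSB: {fsb533:.2f} GB/s, memory: {dual_pc2700:.2f} GB/s, "
      f"ceiling: {ceiling:.2f} GB/s")
```

With these assumptions the front-side bus is the tighter limit at a bit over 4 GB/s, so a sustained figure a little under 3 GBps is at least below the ceiling.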

> The SX-5 had even higher memory bandwidth, but in turn, the SX-6 has become 
> more cost- and energy-efficient.

the 3 Gflop chicken would dissipate around 200W; I am guessing the SX-6
dissipates more than 25/3*200=1.7 kW, no?
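
The arithmetic behind that break-even figure, as a small sketch. The inputs are the numbers from this thread (25 GB/s/CPU quoted for the SX-6, roughly 3 GB/s guessed for the P4 box, about 200 W estimated); the function name is mine, and the model simply scales power linearly with bandwidth, which is a simplification.

```python
def power_at_equal_bandwidth(target_gb_s, node_gb_s, node_watts):
    """Watts drawn if commodity nodes were replicated to match target_gb_s.

    Assumes bandwidth and power both scale linearly with node count.
    """
    return target_gb_s / node_gb_s * node_watts


watts = power_at_equal_bandwidth(25.0, 3.0, 200.0)
label = f"{watts / 1000:.1f} kW"
print(label)
```

If an SX-6 CPU draws more than this, the commodity boxes come out ahead on bandwidth per watt; if it draws less, the vector machine does.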



