Top 500 trends
Mark Hahn
hahn at physics.mcmaster.ca
Wed Nov 27 11:26:24 PST 2002
> STREAM bandwidth is a performance characteristic: it's the bandwidth that a
> single processor achieves with the STREAM benchmark. It's not an application.
STREAM is a piece of source code. The compiler/runtime is completely free in
how it actually implements daxpy, and certainly does not need a single
address space. So it's quite reasonable to talk about the STREAM score of a
loosely coupled cluster. STREAM is almost the worst possible kind of code to
run on a cluster, though, simply because it has such a low work:bandwidth
ratio.
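To put a rough number on that work:bandwidth ratio, here is a minimal sketch for a daxpy-style loop, a[i] = b[i] + q*c[i], on 8-byte doubles. It assumes exactly three words of memory traffic per element and ignores any extra write-allocate traffic; the function names are made up for illustration.

```python
BYTES_PER_DOUBLE = 8


def triad_intensity(bytes_per_word=BYTES_PER_DOUBLE):
    """Flops per byte of memory traffic for a[i] = b[i] + q * c[i]."""
    flops = 2        # one multiply and one add per element
    words_moved = 3  # read b[i], read c[i], write a[i] (assumed, no write-allocate)
    return flops / (words_moved * bytes_per_word)


def bandwidth_bound_gflops(bandwidth_gb_per_s):
    """Upper bound on Gflop/s for this loop when memory bandwidth is the limit."""
    return triad_intensity() * bandwidth_gb_per_s


if __name__ == "__main__":
    print("flops per byte:", triad_intensity())
    print("Gflop/s bound at 3 GB/s:", bandwidth_bound_gflops(3.0))
```

That is 2 flops for every 24 bytes moved, so under these assumptions a CPU fed at 3 GB/s cannot get past about 0.25 Gflop/s on this loop, whatever its peak.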
IMO, a benchmark appropriate for SMP would necessarily measure inter-CPU
latency somehow, and STREAM does not. I always ignore multiprocessor STREAM
results, or else look strictly at how their per-CPU scores scale as the
machine gets bigger.
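A sketch of what looking at per-CPU scaling could mean in practice: divide each aggregate score by the CPU count and compare it with the 1-CPU score. The sample numbers below are invented purely for illustration and are not measurements of any machine; the function name is hypothetical too.

```python
def per_cpu_scaling(aggregate_gb_per_s):
    """Map CPU count -> (per-CPU bandwidth, fraction of the 1-CPU score).

    aggregate_gb_per_s: dict of CPU count -> aggregate bandwidth in GB/s;
    it must contain an entry for 1 CPU to serve as the baseline.
    """
    base = aggregate_gb_per_s[1]
    result = {}
    for ncpu, total in sorted(aggregate_gb_per_s.items()):
        per_cpu = total / ncpu
        result[ncpu] = (per_cpu, per_cpu / base)
    return result


if __name__ == "__main__":
    # Hypothetical aggregate scores, NOT real data.
    sample = {1: 2.0, 2: 3.0, 4: 4.0}
    for ncpu, (per_cpu, frac) in per_cpu_scaling(sample).items():
        print(ncpu, "CPUs:", per_cpu, "GB/s/CPU,", frac, "of the 1-CPU score")
```

A machine whose per-CPU fraction stays near 1.0 as CPUs are added is the interesting case; one where it drops quickly is just sharing a single memory pipe.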
> To illustrate: on an SX-6, this is in the range of 25 GB/s/CPU on an 8-CPU
> node. A Pentium-4/Xeon dual-SMP node gets about 0.5 GB/s/CPU (E7500 chipset
> - which has dual channel RAM, IIRC). This alone gives a performance advantage
> of about a factor of 20-40 if not inside the caches, which shows in the MFLOPS
> efficiency (achieved vs. peak) of many codes (the ones which can be
> vectorized).
A "cutting edge chicken" would be a uniprocessor P4/FSB533/dual-PC2700,
delivering (as a guess) a little under 3 GB/s/CPU.
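For context on that guess, a back-of-the-envelope sketch of the theoretical peaks, assuming the nominal transfer rates (533 MT/s for the front-side bus, 333 MT/s per PC2700 channel) and a 64-bit data path on each. These are peak figures only; sustained STREAM numbers always come in lower.

```python
BUS_WIDTH_BYTES = 8  # assumed 64-bit data path for the FSB and for each DDR channel


def peak_gb_per_s(mega_transfers_per_s, channels=1):
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


FSB_PEAK = peak_gb_per_s(533)                  # quad-pumped 133 MHz front-side bus
MEMORY_PEAK = peak_gb_per_s(333, channels=2)   # two channels of PC2700 (DDR333)
CEILING = min(FSB_PEAK, MEMORY_PEAK)           # the narrower of the two limits the CPU

if __name__ == "__main__":
    print("FSB peak GB/s:", FSB_PEAK)
    print("memory peak GB/s:", MEMORY_PEAK)
    print("3 GB/s as a fraction of the ceiling:", 3.0 / CEILING)
```

With these nominal rates the bus, at roughly 4.3 GB/s, is the ceiling rather than the memory, so a little under 3 GB/s would be around 70% of peak.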
> The SX-5 had even higher memory bandwidth, but in turn, the SX-6 has become
> more cost- and energy-efficient.
The 3 Gflop chicken would dissipate around 200 W; I am guessing the SX-6
dissipates more than 25/3*200 = 1.7 kW, no?
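The arithmetic behind that guess, as a small sketch: scale the chicken's power by the ratio of per-CPU memory bandwidths. The 200 W and 3 GB/s figures are the guesses from this post, 25 GB/s is the quoted SX-6 number, and the linear power-per-bandwidth scaling is only an assumption for comparison, not a measurement.

```python
def scaled_power_watts(target_bw_gb_s, ref_bw_gb_s, ref_power_watts):
    """Power the reference design would draw if scaled linearly to the target bandwidth."""
    return target_bw_gb_s / ref_bw_gb_s * ref_power_watts


SX6_BW = 25.0          # GB/s per CPU, quoted figure
CHICKEN_BW = 3.0       # GB/s per CPU, guessed in this post
CHICKEN_POWER = 200.0  # watts, guessed in this post

if __name__ == "__main__":
    watts = scaled_power_watts(SX6_BW, CHICKEN_BW, CHICKEN_POWER)
    print("bandwidth ratio:", SX6_BW / CHICKEN_BW)
    print("equivalent power in kW:", watts / 1000.0)
```

So the bandwidth ratio is about 8.3, and 8.3 chickens' worth of power comes to roughly 1.7 kW.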