> basically, 13 GB/s for a 2x2 opteron/2.8 system (peak flops would be 
> 2*2*2*2.8=22.4), so you need 1.7 flops per byte to be happy.
    Mmmm ... to my eye the Triad needs 3 x 8 bytes = 24 bytes per 2 FLOPs, or
    12 bytes per FLOP ... FLOPs per byte seems upside down, as bandwidth
    (GBytes/sec) is typically thought of as the scarce resource.

    The way I would do the calculation from your referenced data:

    Triad for 4 Opteron cores at 2.8 GHz  === 12889 MBytes/sec

    In 64-bit words moved that's === 12889 / 8 bytes/word, or 1611 MWords/sec

    The Triad does 2 FLOPs per 3 words moved (assumes the 1 scalar value is
    not counted as memory traffic) ... so you can convert this to MFLOPS as:

    1611 MWords/sec * .6667 = 1074 MFLOPS from the 4 cores, or 268.5 MFLOPS
    per Opteron core.
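
    The same conversion as a small Python sketch, taking the Triad figure
    as 12889 MB/s (the "13 GB/s" of the quoted message). The function and
    variable names are mine, not anything from STREAM itself, and it
    assumes 8-byte words and 2 FLOPs per 3 words moved:

```python
def triad_mflops(bandwidth_mb_s, bytes_per_word=8,
                 flops_per_iter=2, words_per_iter=3):
    """Convert a STREAM Triad bandwidth (MB/s) into an equivalent MFLOPS rate.

    Assumes each Triad iteration moves 3 words (2 loads, 1 store) and
    performs 2 FLOPs (1 multiply, 1 add); the scalar is not counted.
    """
    mwords_per_s = bandwidth_mb_s / bytes_per_word
    return mwords_per_s * flops_per_iter / words_per_iter


total = triad_mflops(12889.0)   # all 4 cores together
per_core = total / 4            # assumes the 4 cores share the rate evenly
print(f"{total:.0f} MFLOPS total, {per_core:.1f} MFLOPS per core")
```

    This should print roughly 1074 MFLOPS total and 268.5 MFLOPS per core.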

    On a per core basis that is about 4.8 % (ugh!) of the 5.6 GFLOPS peak
    on the Stream Triad.  So for codes with kernels that have a low FLOP/MOP
    ratio (smallish, with limited potential for cache re-use): watch out.
    Xeon/Woodcrest would do worse on the Triad, better on HPL,
    clock-for-clock.  This is, as Mark says, due to its "wider" CPU and
    larger cache (and a few other tricks that Woodcrest does with hoisting
    loads and prefetch engines).
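
    To put a number on the low FLOP/MOP warning, here is a rough sketch
    of the bound implied by the figures in this thread: the sustainable
    rate is the smaller of peak and bandwidth times FLOPs per byte. The
    22.4 GFLOPS peak comes from the quoted message; the 12.889 GB/s is the
    Triad figure read as 12889 MB/s; all names are hypothetical.

```python
def bandwidth_bound_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Upper bound on GFLOPS for a kernel limited by compute or by memory."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


PEAK = 22.4          # GFLOPS for the 4 cores, from the quoted message
BANDWIDTH = 12.889   # GB/s, Triad figure for the 4 cores (assumed units)
TRIAD = 2 / 24       # FLOPs per byte: 2 FLOPs per 3 x 8 bytes moved

bound = bandwidth_bound_gflops(PEAK, BANDWIDTH, TRIAD)
fraction = bound / PEAK
break_even = PEAK / BANDWIDTH   # FLOPs per byte needed to reach peak

print(f"Triad bound: {bound:.3f} GFLOPS ({100 * fraction:.1f}% of peak)")
print(f"FLOPs per byte needed to reach peak: {break_even:.2f}")
```

    The break-even value is the "1.7 flops per byte" from the quoted
    message; the Triad, at 1/12 FLOP per byte, sits far below it.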

    Hope this is useful ...





