[Beowulf] Nehalem and Shanghai code performance for our rzf example
richard.walsh at comcast.net
Tue Jan 20 15:24:42 PST 2009
----- Original Message -----
From: "Bill Broadley" bill at cse.ucdavis.edu
>If gallium arsenide or some other material gave us 10x the clock rate per
>watt, but 1/2 the transistors would it really matter? Seemed like even intel
>is begrudgingly admitting it's the memory bus, and finally the nehalem is
>blessed with dramatically more bandwidth.
>
>Seems like increasingly cores are turning latency limited workloads (for the
>parallel jobs of course) into bandwidth limited ones. Without a memory bus
>that allows for 10x the bandwidth it doesn't really seem like 10x the clock
>rate would be of particular use.
Right. Except for the potential to speed up serial codes, or the serial
portions of parallel codes (and perhaps badly written code), delivering 10x
by clock or by core count would not change the bandwidth problem that both
create. Manycore promises even greater multiples. For bandwidth-limited,
data-parallel codes, you might as well stay on the path of lowest economic resistance.
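To put rough numbers on this point, here is a minimal roofline-style sketch. The roofline bound (attainable rate = min(peak compute, memory bandwidth x arithmetic intensity)) is a standard model substituted in for illustration. It was not measured on Nehalem or Shanghai, and the peak, bandwidth, and triad-like intensity figures are hypothetical. For a low-intensity kernel, 10x the compute (whether from clock or from cores) leaves the bound unchanged, while 10x the bandwidth raises it 10x.

```python
# Minimal roofline-style sketch. All numbers are hypothetical, not measurements:
# attainable throughput is capped by peak compute or by
# memory bandwidth * arithmetic intensity, whichever is lower.


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Return the roofline bound in GFLOP/s."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def main() -> None:
    # Hypothetical baseline node: 40 GFLOP/s peak, 10 GB/s memory bandwidth.
    peak, bw = 40.0, 10.0
    # Assumed STREAM-triad-like kernel a[i] = b[i] + s*c[i]:
    # 2 flops per 3 doubles (24 bytes) moved, ignoring write-allocate traffic.
    intensity = 2.0 / 24.0

    base = attainable_gflops(peak, bw, intensity)
    compute_only = attainable_gflops(10 * peak, bw, intensity)
    compute_and_bw = attainable_gflops(10 * peak, 10 * bw, intensity)

    print(f"baseline:            {base:.3f} GFLOP/s")
    print(f"10x compute only:    {compute_only:.3f} GFLOP/s")
    print(f"10x compute and BW:  {compute_and_bw:.3f} GFLOP/s")


if __name__ == "__main__":
    main()
```

With these assumed figures, the baseline and the "10x compute only" cases both land at about 0.833 GFLOP/s, because the kernel is bandwidth-bound. Only scaling the bandwidth moves the result, to about 8.333 GFLOP/s.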
rbw
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf