Hyperthreading in P4 Xeon (question)

Wed Apr 3 11:16:28 PST 2002

> I can amplify that point.  A commercial CFD application ran significantly 
> slower using 4 threads vs 2 on a dual Prestonia system.  Anything memory 
> limited will probably behave the same way.

well, it's an interesting issue.  afaict, the benefit of HT depends
on how much of the machine your app leaves idle.  for instance,
if everything you run is already saturating your dram bandwidth (big
arrays, perhaps), then forget HT - it doesn't add extra dimms!
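similarly, if the CPU has just one fsqrt unit, and that's your
bottleneck, HT doesn't add more units.  there are other resource
nonlinearities, like cache hit rate - the same effect that gives rise
to superlinear SMP speedup (each added CPU brings its own cache, so
the working set per cache shrinks) runs in reverse on HT, where two
threads share one cache, and it will slaughter some apps...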

but if there's other work to be done while one thread is grinding
through sqrts, i.e., there are idle resources, then a second thread
that uses them will show HT profit...
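
to put rough numbers on that argument, here's a toy bottleneck model
in python.  it's only a sketch: the function name, the resource names
and the utilization figures are all made up for illustration, not
measured on any real Xeon.  it assumes each thread can be described by
the fraction of each resource it uses when running alone, and that two
threads sharing a core slow down in proportion once their combined
demand on any one resource exceeds its capacity.  it deliberately
ignores the cache effects, so it can show "no gain" but never the
outright slowdown reported above.

```python
def ht_throughput(thread_a, thread_b):
    """Toy model: combined throughput of two threads sharing one core,
    relative to a single thread running alone (1.0 = no gain from HT,
    2.0 = perfect doubling).

    Each argument maps a (hypothetical) resource name to the fraction
    of that resource's capacity the thread uses when it runs alone.
    """
    resources = set(thread_a) | set(thread_b)
    # Demand on the most contended resource if both ran at full speed.
    worst = max(
        (thread_a.get(r, 0.0) + thread_b.get(r, 0.0) for r in resources),
        default=0.0,
    )
    # If demand exceeds capacity, both threads are throttled equally.
    scale = min(1.0, 1.0 / worst) if worst > 0 else 1.0
    return 2.0 * scale


# Illustrative, made-up utilization profiles.
dram_bound = {"dram": 1.0, "fsqrt": 0.1}     # big arrays, saturates memory
sqrt_bound = {"fsqrt": 1.0, "int_alu": 0.1}  # stuck on the one fsqrt unit
int_bound = {"int_alu": 0.8, "dram": 0.1}    # uses what sqrt_bound leaves idle

print(ht_throughput(dram_bound, dram_bound))  # 1.0: HT adds no dimms
print(ht_throughput(sqrt_bound, sqrt_bound))  # 1.0: HT adds no fsqrt units
print(ht_throughput(sqrt_bound, int_bound))   # 2.0: idle units get used
```

the only point is that the gain is set by whichever shared resource
saturates first: two copies of the same bottlenecked code get nothing,
while threads with complementary demands can (in this idealized model)
both run at full speed.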

in some sense, HT works precisely when the system's resources
*don't* match the optimal set your app wants.  I wonder if/when
Intel will start pouring in hordes of extra functional units,
since another 50M transistors will only improve the cache hit rate
a little bit...  of course, it's also true that HT makes bigger
TLBs and more-associative caches attractive, since two threads now
share them...