Using hyperthreading on 2 Proc Xeon cluster nodes

Patrick Geoffray patrick at
Sat Jun 8 17:11:24 PDT 2002

Hi Bill,

Bill Broadley wrote:
>>BTW, the 2 virtual processors share the same FPU, so not interesting for 

> In the case of the P4 I'd agree, in general even with a shared FPU
> I could see hyperthreading being very useful.  Keep in mind even a single
> flop per cycle is often a big improvement over real world performance.
> If thread A blocks, and thread B can get some work done without having
> the expense of a context switch, getting work done during a cache miss 
> without the expense of a context switch can be a big win.

I am not sure that the 2 threads can swap the usage of the FPU that fast 
(saving the FP pointer, for example), but I didn't look carefully at the 

In a general context, I agree. However, for many HPC applications, the 
computation core is highly optimized, like the BLAS used by HPL for 
example. ATLAS is tuned at the cycle level and it is quite close to the 

Hyperthreading reminds me of co-scheduling: it makes sense only when 
several codes are running at the same time. The total execution time of 
all the codes is reduced, but not the execution time of each code 
individually.
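
To make that trade-off concrete, here is a toy model, not a measurement. 
It assumes each code needs t_alone seconds with the core to itself, and 
that two codes sharing the core each run slower by a made-up factor 
"slowdown" between 1 and 2. The function names and the numbers are 
hypothetical, chosen only for illustration.

```python
def back_to_back(t_alone, n_jobs=2):
    """Finish time of each job when the jobs run one after another."""
    return [t_alone * (i + 1) for i in range(n_jobs)]


def co_scheduled(t_alone, slowdown, n_jobs=2):
    """Finish time of each job when all jobs share the core at once.

    Every job is stretched by `slowdown` (an assumed factor, 1 <= slowdown <= n_jobs).
    """
    return [t_alone * slowdown] * n_jobs


t_alone = 100.0   # seconds per job on a dedicated core (hypothetical)
slowdown = 1.5    # assumed per-job penalty when sharing the core (hypothetical)

seq = back_to_back(t_alone)
smt = co_scheduled(t_alone, slowdown)

# Time until all jobs are done: lower when co-scheduled.
print("total time, back to back:", max(seq))
print("total time, co-scheduled:", max(smt))

# Run time of one job: higher when co-scheduled than when run alone.
print("one job alone:           ", t_alone)
print("one job while sharing:   ", smt[0])
```

Under these assumptions the pair finishes sooner when co-scheduled, but 
each individual job takes longer than it would alone, which is the 
single-application number most HPC metrics report.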
It was a great idea, but it never really made it into production, 
mainly because HPC metrics focus on single-application performance 
and because users are selfish :-))


|   Patrick Geoffray, Ph.D.      patrick at
|   Myricom, Inc.      
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
