[Beowulf] Opinions of Hyper-threading?

Vincent Diepeveen diep at xs4all.nl
Wed Feb 27 11:30:29 PST 2008

On Feb 25, 2008, at 10:03 PM, Mark Hahn wrote:

>> I believe it's actually simultaneous, instructions from 2  
>> different processes can run in the same cycle against 2 different  
>> register files.
> for some definition of 'simultaneous'.  I suspect that netburst-HT simply
> runs with a thread until it stalls, then switches.  I don't think Intel
> ever detailed which stalling events do this.  in some of the initial
> papers on netburst-HT, it was implied that the implementation was almost
> a side-effect of how the chip tracks in-flight operations.  since no
> modern chip really has a unitary pipeline, HT might well tolerate one
> thread chugging through a microcoded transcendental at the same time as
> another, say, follows a pointer.
>>> had assumed so but I appear to be confused about it.  Hyperthreading
>>> keeps a thread ready to take advantage of stalls in a preceding thread,
>>> but doesn't ever actually perform a second instruction in one clock
>>> tick, correct? One
> I believe HT switching does happen cycle-by-cycle, and would guess that
> in-flight ops from multiple threads can coexist (not executing on the
> same unit in the same cycle, though.)
> to me, this makes a lot more sense than manycore chips, actually.
> manycore basically assumes that tracking inflight ops is the main
> scaling problem with modern chips.  that may be the case, but I've never
> really heard it described as such.
> imagine if, instead of 8 cores onchip, you just had 8 "thread sequence"
> units that contained fetch/decode, architected registers and retirement.
> and a single big pool of scoreboarded functional units, of course.  the
> advantage being that one thread could use many units.  as opposed to a
> static 8-core where each thread gets only the unit(s) in its core...

Hi Mark,

Let's run the numbers for your imaginary chip, where you drop the
multicore idea and also have to drop out-of-order execution to get
your thread-sequence idea to work:

If you've got 8 threads that each execute 1 instruction a cycle
(counting each instruction as 1 double precision flop):

8 * 1 * 3 GHz = 24 Gflop double precision

Now let's compare with one of today's quad-cores, in a system we can
build for just 600 euro, like the nodes I'm planning to build now:

198 euro for the Intel chip @ 2.4 GHz and 134 euro for the AMD one @ 2.2 GHz.

Here is the calculation against your 24 Gflop:

4 cores * 3 instructions a cycle * 2 DP in each SSE2/SSE3 vector *
2.4 GHz = 24 * 2.4 = 57.6 Gflop

It is very hard to beat today's quad-cores with the imaginary CPU of
the future.
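
The peak numbers above can be reproduced with a few lines of Python.
This is only a back-of-envelope sketch: the function name peak_gflops
and its parameters are made up for illustration, it assumes every issue
slot does useful double precision work in every cycle, and for the
imaginary chip it assumes 1 DP flop per instruction at 3 GHz, as in the
calculation above.

```python
def peak_gflops(cores, instr_per_cycle, dp_per_instr, ghz):
    """Theoretical peak in Gflop (double precision), not a measured figure."""
    return cores * instr_per_cycle * dp_per_instr * ghz

# Imaginary chip: 8 in-order threads, 1 instruction a cycle, 1 DP each, 3 GHz.
imaginary = peak_gflops(8, 1, 1, 3.0)

# Today's quad-core: 4 cores, 3 instructions a cycle,
# 2 DP in each SSE2/SSE3 vector, 2.4 GHz.
quadcore = peak_gflops(4, 3, 2, 2.4)

print(f"imaginary chip: {imaginary:.1f} Gflop")
print(f"quad-core:      {quadcore:.1f} Gflop")
print(f"ratio:          {quadcore / imaginary:.1f}x")
```

These are paper peaks only; real code rarely sustains 3 SSE
instructions a cycle, so the ratio is an upper bound on both sides.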

Multicore and out-of-order execution are big winners that completely
butcher RISC and the old Alpha engineers' SMT idea,
with the exception of power usage.

Multicore right now means BOOM, you are nearly a factor 4.0 faster (3.8
in the case of my chess proggie), and out of order means you have
a potential of 3 to 4 instructions a cycle, which is a big winner too.

Replacing that with some other technique such as SMT means that
technique needs to find a factor 12 in speed somewhere
(4 cores * 3 instructions a cycle).
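
To see where the factor 12 comes from, here is a small sketch in
Python. The helper name required_speedup is hypothetical; it simply
multiplies the multicore scaling by the instructions a cycle that out of
order can reach, relative to a single in-order thread issuing 1
instruction a cycle.

```python
def required_speedup(core_scaling, instr_per_cycle):
    """Factor a replacement technique must find to match multicore + out of order."""
    return core_scaling * instr_per_cycle

ideal = required_speedup(4.0, 3)      # perfect scaling over 4 cores
measured = required_speedup(3.8, 3)   # the 3.8 scaling quoted for the chess program
upper = required_speedup(4.0, 4)      # if out of order reaches 4 instructions a cycle

print(f"ideal: {ideal:.1f}, with 3.8 scaling: {measured:.1f}, upper: {upper:.1f}")
```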


> I think the main takehome from netburst-HT is that SMT needs to  
> provide more units, not just provide a new way for two threads to  
> interfere.
> regards, mark hahn.
