[Beowulf] Re: Opteron 275 performance

Mon Aug 1 09:23:55 PDT 2005

Mark Hahn wrote:

>>Be very careful.  Hyper TRANSPORT is what AMD (and the HT consortium)
>>push as a replacement for the traditional bus -- it amounts to putting the
>>CPUs, memory, and all peripherals on a very high bandwidth low latency
>>network, IIRC.  It enables a sort of SMP design very similar to a
>>"cluster" inside a single box, whether the processors are single or
>>dual core, and is the way AMD is going quite heavily in their future
>>designs.  It is very useful and relevant to SMP, multicore, and single
>>core designs and important to HPC.
>
>I find this slightly confusing.  HTrans is indeed a fast interconnect,
>but the salient features are that it's purely point-to-point and that 
>AMD's cache-coherency is implemented using it.
>
     It might be worth adding that the standard allows both the speed and
     the width of the channel to be varied, giving maximum flexibility in
     how bandwidth is delivered and how much of it. This in turn allows HT
     to be used in different contexts ...
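A back-of-the-envelope way to see what varying speed and width buys: peak one-direction bandwidth is the link width in bytes times the transfer rate, and HT moves data on both clock edges. This is a minimal sketch assuming that double-data-rate signalling and the spec's 2/4/8/16/32-bit link widths; the function name is made up for illustration, and the clock values in the usage loop are illustrative settings, not a claim about any particular Opteron board.

```python
# Link widths (bits per direction) permitted by the HyperTransport spec.
HT_WIDTHS_BITS = (2, 4, 8, 16, 32)


def ht_link_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of one direction of an HT link, in MB/s.

    Assumes double-data-rate signalling: two transfers per clock cycle.
    """
    if width_bits not in HT_WIDTHS_BITS:
        raise ValueError(f"unsupported link width: {width_bits} bits")
    bytes_per_transfer = width_bits / 8
    mega_transfers_per_s = 2 * clock_mhz   # two clock edges per cycle
    return bytes_per_transfer * mega_transfers_per_s


# Illustrative settings only: a narrow, slow link vs. wider, faster ones.
for width, clock in ((8, 200), (16, 800), (16, 1000)):
    one_way = ht_link_bandwidth_mb_s(width, clock)
    print(f"{width:2d}-bit @ {clock:4d} MHz: {one_way:6.0f} MB/s each way, "
          f"{2 * one_way:6.0f} MB/s both ways")
```

Since bandwidth is just width times clock, halving the width and doubling the clock leaves peak bandwidth unchanged, so a designer can trade pins for clock rate.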

>>Hyper THREADING is Intel's solution to what amounts to an overlong
>>instruction pipeline on the CPU itself.  In single-threaded code a long
>
>hmm, not really.  Intel's pipeline length is indeed very long,
>but hyperthreading is all about *switching*, not pipelining.
>
>the real problem with hyperthreading is that it works best with bad code:
>code that normally spends most of its time stalled (cache/TLB misses, etc.).
>on code that makes effective use of the hardware, it's only a slowdown.
>  
>
    Or rather, it works best when the threads competing for the CPU's time
    are not identical twins. That is more often the case across processes,
    where throughput can be improved. HPC threads tend in the direction
    of twins.
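The "works best with stalled code" point can be made concrete with a deliberately crude model. Assume (an assumption for illustration, not Intel's documented scheduling) that each thread stalls independently with probability s on any cycle and that the core issues from at most one ready thread per cycle. One thread then keeps the core busy a fraction 1 - s of the time, two threads 1 - s^2, a ratio of 1 + s. The `penalty` argument is a hypothetical stand-in for the cost of two threads sharing caches and TLBs.

```python
def smt_speedup(stall: float, penalty: float = 0.0) -> float:
    """Throughput of two SMT threads relative to one thread on the same core.

    Toy model (assumption): each thread is stalled independently with
    probability `stall` on any cycle, the core issues from at most one
    ready thread per cycle, and `penalty` is a hypothetical fractional
    slowdown from sharing caches and TLBs.
    """
    if not 0.0 <= stall < 1.0:
        raise ValueError("stall must be in [0, 1)")
    one_thread = 1.0 - stall          # cycles with useful work, one thread
    two_threads = 1.0 - stall ** 2    # cycles where at least one thread is ready
    return two_threads / one_thread * (1.0 - penalty)


for s in (0.0, 0.3, 0.6):
    print(f"stall={s:.1f}  speedup={smt_speedup(s):.2f}  "
          f"with 10% penalty={smt_speedup(s, 0.10):.2f}")
```

With s = 0 (code that keeps the hardware busy) the model gives no gain at all, so any sharing penalty turns hyperthreading into a pure slowdown; the more a thread stalls, the more a second thread can fill in.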

>>It really isn't at all clear that HThreading needs to live (as my kids
>>would put it:-).  AMD just uses shorter instruction pipelines (less to
>>flush and fill) and seems to outperform Hthreaded Intel at constant
>>clock, at any rate.  At MOST it yields a 30% or so speedup that is
>>relevant to somebody doing work with lots of independent things going on
>>on a single CPU -- typically (unsurprisingly) a desktop user watching a
>>movie or the like -- decoding video and audio at the same time, or
>>servers handling multiple service threads.  In others, it yields a
>>DECREASE of say 10% due to aforementioned cache-thrashing.
>></editorial comment>
>
>HT is a very limited implementation of a general class of multi-threaded
>processors.  IBM and others have done better SMT.  and the appeal is clear:
>most programs do not manage to keep all a CPU's functional units busy,
>so why not share the pool of FUs among multiple threads?  I'm expecting 
>someone to eventually replicate the proc-specific parts (registers and 
>L1 cache), but share all the FUs in a package.  sort of merged multi-core.
>on the other hand, the chip area devoted to actual compute elements (ALU/FPU)
>is dwarfed by caches...
>  
>
     Hyperthreading capitalizes on the fact that a CPU (more or less) does
     not care which thread a given instruction comes from. Interleaving
     instructions from two threads pushes (in some contexts) the actually
     observed ILP [instruction-level parallelism] toward the processor's
     theoretical maximum ILP.
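A toy issue model shows the ILP argument. Each thread is a hypothetical per-cycle count of independent instructions it has ready, and a hypothetical 3-wide core issues whatever is ready up to its width, so the theoretical maximum ILP is 3. The sketch deliberately ignores caches, the pipeline and carry-over: ready instructions beyond the width in a given cycle are simply not counted.

```python
def observed_ilp(issue_width: int, *threads: list[int]) -> float:
    """Mean instructions issued per cycle when `threads` share one core.

    Each thread is a list giving, per cycle, how many independent
    instructions it has ready. Toy model: every cycle issues all ready
    instructions up to `issue_width`; nothing carries over between cycles.
    """
    cycles = len(threads[0])
    if any(len(t) != cycles for t in threads):
        raise ValueError("all traces must cover the same number of cycles")
    issued = sum(
        min(issue_width, sum(t[c] for t in threads)) for c in range(cycles)
    )
    return issued / cycles


ISSUE_WIDTH = 3                          # hypothetical 3-wide core
thread_a = [1, 0, 2, 1, 0, 2, 1, 0]      # hypothetical ready-instruction traces
thread_b = [0, 2, 1, 0, 2, 1, 0, 2]

print("A alone:        ", observed_ilp(ISSUE_WIDTH, thread_a))
print("B alone:        ", observed_ilp(ISSUE_WIDTH, thread_b))
print("A and B (SMT):  ", observed_ilp(ISSUE_WIDTH, thread_a, thread_b))
print("A and A (twins):", observed_ilp(ISSUE_WIDTH, thread_a, thread_a))
print("theoretical max:", ISSUE_WIDTH)
```

Here A and B are busy on complementary cycles, so together they reach 1.875 instructions per cycle against 0.875 and 1.0 alone. Two copies of A peak on the same cycles and run into the width cap (1.5 rather than 1.75), which is the "twins" effect in miniature.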

     regards,

     rbw

>regards, mark hahn.
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf