[Beowulf] Re: Opteron 275 performance
Richard Walsh
rbw at ahpcrc.org
Mon Aug 1 09:23:55 PDT 2005
Mark Hahn wrote:
>>Be very careful. Hyper TRANSPORT is what AMD (and the HT consortium)
>>push as a replacement for the traditional bus -- it amounts to putting the
>>CPUs, memory, and all peripherals on a very high bandwidth low latency
>>network, IIRC. It enables a sort of SMP design very similar to a
>>"cluster" inside a single box, whether the processors are single or
>>dual core, and is the way AMD is going quite heavily in their future
>>designs. It is very useful and relevant to SMP, multicore, and single
>>core designs and important to HPC.
>>
>>
>
>I find this slightly confusing. HTrans is indeed a fast interconnect,
>but the salient features are that it's purely point-to-point and that
>AMD's cache-coherency is implemented using it.
>
It might be worth adding that the standard allows both the speed and the width
of the channel to be varied, giving maximum flexibility in how, and how much,
bandwidth is delivered. This in turn allows HT to be used in different contexts
...
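To make the width/speed trade-off concrete, here is a small sketch of per-direction link bandwidth. It assumes double-data-rate signalling (two transfers per clock) and the 2- to 32-bit link widths the HyperTransport spec allows; the function name and the clock values used below are illustrative assumptions, not figures from this thread.

```python
def ht_link_bandwidth_gbps(width_bits: int, clock_mhz: float) -> float:
    """Per-direction bandwidth of one HyperTransport link, in GB/s (10^9 bytes/s).

    Assumption: double-data-rate signalling, i.e. two transfers per clock cycle.
    """
    allowed_widths = {2, 4, 8, 16, 32}  # link widths assumed per the HT spec
    if width_bits not in allowed_widths:
        raise ValueError(f"link width must be one of {sorted(allowed_widths)}")
    transfers_per_sec = clock_mhz * 1e6 * 2  # DDR: data on both clock edges
    bytes_per_transfer = width_bits / 8
    return transfers_per_sec * bytes_per_transfer / 1e9
```

Under these assumptions a 16-bit link at 1000 MHz gives 4.0 GB/s in each direction, while an 8-bit link at the same clock gives 2.0 GB/s, so a board designer can trade pins for clock rate (or the reverse) to hit the bandwidth a given context needs.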
>>Hyper THREADING is Intel's solution to what amounts to an overlong
>>instruction pipeline on the CPU itself. In single-threaded code a long
>>
>>
>
>hmm, not really. Intel's pipeline length is indeed very long,
>but hyperthreading is all about *switching*, not pipelining.
>
>the real problem with hyperthreading is that it works best with bad code:
>code that normally spends most of its time stalled (cache/tlb misses, etc).
>on code that makes effective use of the hardware, it's only a slowdown.
>
>
Or, it works best when the threads competing for the CPU's time are not
identical twins ... more often the case across processes ... which is where
throughput can be improved. HPC threads tend in the direction of twins.
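A toy cycle model can illustrate both the "bad code" point and the "twins" point. It is a sketch under loud assumptions: a single issue slot shared round-robin by all threads, in-order threads that block for a fixed miss_latency after a cache miss, and no shared-cache contention (so it can show the absence of a gain, but not the slowdown from cache thrashing). The op codes 'C' and 'M' and the function name are hypothetical, and this is switch-on-stall multithreading, not a model of the Pentium 4's actual pipeline.

```python
from collections import deque


def run_switch_on_stall(threads, miss_latency):
    """Toy model: one issue slot per cycle, shared round-robin by all threads.

    Each thread is a string of hypothetical ops:
      'C' -- one-cycle compute op
      'M' -- load that misses cache; its thread is blocked for miss_latency cycles
    Threads are in-order: a thread cannot issue again until its last op completes.
    Returns the cycle at which the last op of the last thread completes.
    """
    queues = [deque(t) for t in threads]
    ready_at = [0] * len(threads)  # cycle at which each thread may issue next
    cycle = 0
    rr = 0  # round-robin pointer so no thread starves
    while any(queues):
        n = len(queues)
        for k in range(n):
            i = (rr + k) % n
            if queues[i] and ready_at[i] <= cycle:
                op = queues[i].popleft()
                ready_at[i] = cycle + (miss_latency if op == "M" else 1)
                rr = (i + 1) % n
                break  # only one op issues per cycle
        cycle += 1
    return max(ready_at)
```

With miss_latency=10, two compute-bound twins ("C" * 8 each) take 16 cycles together, exactly as long as one 16-op thread run alone, so switching buys nothing. Two stall-heavy threads ("CM" * 4 each) finish together in 49 cycles, versus 88 for the same work as a single "CM" * 8 thread.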
>>It really isn't at all clear that HThreading needs to live (as my kids
>>would put it:-). AMD just uses shorter instruction pipelines (less to
>>flush and fill) and seems to outperform Hthreaded Intel at constant
>>clock, at any rate. At MOST it yields a 30% or so speedup that is
>>relevant to somebody doing work with lots of independent things going on
>>on a single CPU -- typically (unsurprisingly) a desktop user watching a
>>movie or the like -- decoding video and audio at the same time, or
>>servers handling multiple service threads. In others, it yields a
>>DECREASE of say 10% due to the aforementioned cache-thrashing.</editorial comment>
>>
>>
>
>HT is a very limited implementation of a general class of multi-threaded
>processors. IBM and others have done better SMT, and the appeal is clear:
>most programs do not manage to keep all a CPU's functional units busy,
>so why not share the pool of FUs among multiple threads? I'm expecting
>someone to eventually replicate the proc-specific parts (registers and
>L1 cache), but share all the FUs in a package. sort of merged multi-core.
>on the other hand, the chip area devoted to actual compute elements (ALU/FPU)
>is dwarfed by caches...
>
>
Hyperthreading capitalizes on the fact that a CPU (more or less) does not care
which thread a given instruction comes from, using this to push (in some
contexts) the actually observed ILP (instruction-level parallelism) toward the
processor's theoretical maximum ILP.
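The ILP point can be sketched the same way, this time with simultaneous issue: each cycle offers `width` issue slots, and ready instructions from any thread may fill them. Each hypothetical program is a list of group sizes; instructions within a group are independent, and a group may start issuing only in the cycle after the previous group has fully issued. This is an idealized model (no cache, branch, or functional-unit-type effects), assumed purely for illustration.

```python
def run_smt(programs, width):
    """Idealized simultaneous-issue model.

    programs: list of hypothetical programs, each a list of positive group sizes.
      Instructions inside a group are independent of one another; a group may
      begin issuing only in the cycle after the previous group has fully issued.
    width: issue slots per cycle (the processor's theoretical maximum ILP).
    Returns (cycles, instructions_issued).
    """
    remaining = [list(p) for p in programs]  # private copies we can consume
    cycles = 0
    issued = 0
    start = 0  # rotate which thread gets first pick of the slots each cycle
    while any(remaining):
        slots = width
        n = len(remaining)
        for k in range(n):
            prog = remaining[(start + k) % n]
            if prog and slots > 0:
                take = min(prog[0], slots)
                prog[0] -= take
                slots -= take
                issued += take
        # a finished group unlocks the next group, but only from the next cycle on
        for prog in remaining:
            if prog and prog[0] == 0:
                prog.pop(0)
        start = (start + 1) % n
        cycles += 1
    return cycles, issued
```

With width=4, one thread with groups [1, 2, 1, 2] issues 6 instructions in 4 cycles (IPC 1.5), while two such threads together issue 12 in the same 4 cycles (IPC 3.0), closer to the 4-wide maximum. Two twins that each fill all four slots ([4, 4, 4, 4]) take 8 cycles together, no better than running them back to back.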
regards,
rbw
>regards, mark hahn.
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>
>