[Beowulf] dual-core benefits?

Richard Walsh rbw at ahpcrc.org
Thu Sep 22 10:50:03 PDT 2005

Joe Landman wrote:

> Hi Tahir:
> Tahir Malas wrote:
>> Since the memory cost of our
>> system will dominate other costs, we can afford to move to dual-core
>> technology. However, the following questions arise.
>> 1. Will it be worth it? And can we gain any advantage over single-core,
>> given the not-so-good scalability of our parallel programs?
> It depends upon the code.  If your code requires very low latency, the
> benefit of dual core nodes is that you have 4 interconnected cores
> (think of them as individual processors) connected over a very
> high-speed, low-latency interface.  If this is well coupled to the rest
> of the system through an external low-latency interface (Infinipath,
> IB, Myrinet, etc), and your code is latency sensitive, then dual core
> could be a substantial win for you.  If your code simply hammers on
> memory bandwidth, then it is possible in some cases for it to be a
> liability relative to single core.  Some cases (weather codes)
> demonstrated something like this here in the recent past.

     It is probably worth pointing out here that the latency being
     referred to is network latency.  Latency to memory is >>worse<<
     when cache coherency is turned on in single or dual core SMP
     configurations (something like ~100 ns vs ~55 ns).  I am assuming
     that a single-chip dual-core will have to have a cache-coherent
     memory reference protocol and be slower.  While it is true that
     scalability limited by network latency may improve, a
     bandwidth-intensive application may suffer (amenability to
     prefetching affects how much) in a cache-coherent context because
     of the overhead added by larger memory reference

>> 2. Another question: does dual-core technology bring any advantages
>> for the efficient use of the large amount of memory that we will
>> utilize?
> Not really an advantage or a disadvantage.  With single core, your
> aggregate memory bandwidth is N(cores) * Bandwidth of one of the
> memory buses. With dual core, it is (N(cores)/2) * Bandwidth of one
> of the memory buses.  This may or may not be an issue for your code.
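
The arithmetic in the quoted paragraph can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: one
memory controller per socket, and a made-up figure of 6.4 GB/s per
controller. The function names and the 6.4 value are hypothetical and
do not come from any measurement in this thread.

```python
def aggregate_bandwidth(n_cores, cores_per_socket, bw_per_controller):
    """Aggregate memory bandwidth for a fixed total core count.

    Assumes exactly one memory controller per socket (an assumption
    of this sketch, not a statement about any particular board).
    """
    if n_cores % cores_per_socket != 0:
        raise ValueError("n_cores must be a multiple of cores_per_socket")
    n_sockets = n_cores // cores_per_socket
    return n_sockets * bw_per_controller


def per_core_bandwidth(n_cores, cores_per_socket, bw_per_controller):
    """Share of the aggregate bandwidth available to each core."""
    total = aggregate_bandwidth(n_cores, cores_per_socket, bw_per_controller)
    return total / n_cores


if __name__ == "__main__":
    BW = 6.4  # GB/s per controller: hypothetical value for illustration
    N_CORES = 4
    for cores_per_socket in (1, 2):
        total = aggregate_bandwidth(N_CORES, cores_per_socket, BW)
        share = per_core_bandwidth(N_CORES, cores_per_socket, BW)
        print(f"{cores_per_socket} core(s)/socket: "
              f"aggregate {total:.1f} GB/s, per core {share:.1f} GB/s")
```

For the same number of cores, going from one to two cores per socket
halves both the aggregate and the per-core figure in this model, which
is the N(cores) versus N(cores)/2 relation stated above.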

     Or to perhaps put it more simply, it is limited by the number of
     on-chip memory controllers on the board (not to mention their
     clock and the speed/type of your

