[Beowulf] dual-core benefits?

Thu Sep 22 10:53:29 PDT 2005

Tahir wrote that they need 4G per node now and 16G later.

With 16G per node, a dual-socket single-core node gives you 8G per core,
but a dual-socket dual-core node only gives you 4G per core.
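
A minimal Python sketch of this per-core arithmetic. It assumes two
sockets per node and node memory split evenly across all cores; the
helper name memory_per_core is just for illustration.

```python
def memory_per_core(node_memory_gb, sockets, cores_per_socket):
    """Memory per core, assuming node memory is split evenly across all cores."""
    return node_memory_gb / (sockets * cores_per_socket)

# Two sockets per node (assumed), at the 4G and 16G per-node sizes from this thread.
for node_gb in (4, 16):
    single = memory_per_core(node_gb, sockets=2, cores_per_socket=1)
    dual = memory_per_core(node_gb, sockets=2, cores_per_socket=2)
    print(f"{node_gb}G/node: single core {single:g}G/core, dual core {dual:g}G/core")
```

At 4G per node the same split gives 2G per core (single core) versus 1G
per core (dual core).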

On the other hand, the scalability issues Tahir mentioned below suggest
that interprocess communication is the bottleneck, in which case you want
as many cores as you can get in each system.

You could then architect your solution around quad-socket dual-core
Opteron systems, which give you 8 cores and 32G of RAM (4G per core)
in one node.
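
A rough Python sketch comparing the three node layouts discussed here.
It assumes each socket has its own memory bus shared by that socket's
cores (the rule Joe gives below), reports bandwidth in units of one
memory bus rather than GB/s, and uses 16G for the dual-socket nodes and
32G for the quad-socket node. NodeConfig is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class NodeConfig:
    """Hypothetical description of one node layout."""
    name: str
    sockets: int
    cores_per_socket: int
    memory_gb: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def memory_per_core_gb(self) -> float:
        return self.memory_gb / self.cores

    @property
    def buses_per_core(self) -> float:
        # Assumption: one memory bus per socket, shared by that socket's cores.
        return self.sockets / self.cores

configs = [
    NodeConfig("2 x single core", 2, 1, 16),
    NodeConfig("2 x dual core", 2, 2, 16),
    NodeConfig("4 x dual core", 4, 2, 32),
]

for c in configs:
    print(f"{c.name}: {c.cores} cores, {c.memory_per_core_gb:g}G/core, "
          f"{c.buses_per_core:g} memory bus(es)/core")
```

Under these assumptions the quad-socket node keeps the same 4G and half
a memory bus per core as the dual-socket dual-core node, but puts all 8
cores in one shared-memory system, so more of the communication stays
inside the node.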

Michael

Joe Landman wrote:

> Hi Tahir:
>
> Tahir Malas wrote:
>
>> Hi everyone,
>>
>> I would like some advice on processor selection for the cluster that
>> we will configure soon. After comparing the sequential performance of
>> our programs on an Opteron 246 and a much more expensive machine with
>> an Itanium processor, we have decided to use Opteron processors with
>> Tyan motherboards. However, we are unsure which processor to choose.
>> Before posing my questions, I'd better give some info about our
>> application requirements:
>>
>> 1. The scalability of our program is not so good: less than 20 for 32
>> nodes (measured on a single node system). So we don't plan to go
>> beyond 16 nodes (which makes 32 processors, as we use dual-processor
>> nodes).
>
>
> Do you mean a single cpu per node, or single core per CPU, or a large 
> SMP?  You might also wish to look at the iWill motherboards.
>
>> 2. Our memory requirement is huge; we will use 4GB of memory per node
>> for the time being and increase this to 16GB later. So we need fast
>> CPUs and efficient usage of memory.
>
>
> OK, you are going to want the later-model motherboards that properly
> support DDR400 with single-rank DIMMs. 2GB DIMMs are still not cheap
> (at least the good ones).
>
>> 3. Due to budget limitations, we will first configure an 8-node system
>> with 4GB of RAM per node and extend this to a 16-node system with 16GB
>> of RAM per node in 6 months.
>>
>> We were thinking of AMD 250 processors, but now the benchmarks of
>> dual-core CPUs (on AMD's web site) seem encouraging, and the cost of
>> a dual-core AMD 275 seems to be less than twice that of an AMD 250.
>
>
> http://enterprise2.amd.com/downloadables/Dual_Core_Performance.pdf and
> others, I presume. :)
>
>> Since the memory cost of our system will dominate other costs, we can
>> afford to move to dual-core technology. However, the following
>> questions arise:
>>
>> 1. Will it be worth it? Can we gain any advantage over single-core,
>> given the not-so-good scalability of our parallel programs?
>
>
> It depends upon the code. If your code requires very low latency, the
> benefit of dual core nodes is that you have 4 cores (think of them as
> individual processors) connected over a very high speed, low latency
> interface. If this is well coupled to the rest of the system through an
> external low latency interconnect (InfiniPath, IB, Myrinet, etc.), and
> your code is latency sensitive, then dual core could be a substantial
> win for you. If your code simply hammers on memory bandwidth, then in
> some cases dual core can be a liability relative to single core. Some
> codes (weather codes) demonstrated something like this here in the
> recent past.
>
>> 2. Another question: does dual-core technology bring any advantages
>> for the efficient usage of the large amount of memory that we will
>> utilize?
>
>
> Neither a real advantage nor a disadvantage. With single core, your
> aggregate memory bandwidth is N(cores) * (bandwidth of one memory bus),
> since each CPU has its own memory bus. With dual core, it is
> (N(cores)/2) * (bandwidth of one memory bus), since two cores share each
> bus. This may or may not be an issue for your code.
>
>> 3. Finally, there is something basic that I'm not sure about: when we
>> assign a job to a dual-core CPU, can it divide the job between the
>> cores automatically, or should we think of a dual-core CPU the same way
>> as a dual-processor node? If the latter is the case, what is the
>> advantage of this technology over dual-processor nodes?
>
>
> Think of a dual-core chip as 2 physically independent CPUs (that just
> happen to share the same socket on the motherboard). That means your
> dual CPU nodes become 4-way. In terms of assigning a job to a CPU (or
> core), you still need a threading library or an MPI library, and
> appropriate changes to the source code, to make it scale. But the
> advantage for you would be lower overall latency between CPUs for
> messaging using MPI, and larger SMP nodes for OpenMP. The potential
> disadvantage is loss of effective memory bandwidth. If you look at the
> paper at the above URL, you will see that the bandwidth issue wasn't a
> factor for the tests we ran. It could be for your code, and that's the
> important part. You need to test to be sure.
>
>>
>> If anyone has info and/or experience with these issues, I would be
>> very glad to hear it.
>> Thanks in advance,
>> Tahir Malas
>> Bilkent University Electrical and Electronics Engineering Department
>> Phone: +90 312 290 1385
>>
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>
>

-- 
Michael Will
Penguin Computing Corp.
Sales Engineer
415-954-2822
415-954-2899 fx
mwill at penguincomputing.com