[Beowulf] dual-core benefits?
Joel Jaeggli
joelja at darkwing.uoregon.edu
Thu Sep 22 11:25:35 PDT 2005
On Thu, 22 Sep 2005, Michael Will wrote:
> Tahir wrote that they need 4G per node now and 16G later.
>
> If you have dual single core, you have 8G per core, but if
> you have dual dual core, you only have 4G per core.
>
> On the other hand, the scalability issues Tahir mentioned below sound
> like interprocess communication is the bottleneck, in which case you
> want as many cores as you can get in the system.
>
> You could then architect your solution with quad dual-core Opteron
> systems, which gives you 8 cores and 32G of RAM in one node.
I suspect that once it's cost effective for you to make the jump to 8xx
CPUs and a non-commodity 4-way mainboard, going all the way to an 8-way,
16-core box is also cost effective.
The iWill H8501 chassis is around $10,500 and the Opteron 865s are about
$1,300 ea., which is a premium of about $500 over the 265, so probably
$30k for a 16-core box that fits in 5U.
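As a quick sanity check on the memory-per-core arithmetic above, a minimal
sketch in plain C (the node layouts and RAM figures are the ones quoted in
this thread; nothing else is assumed):

/* mem_per_core.c -- memory per core for the node layouts discussed above.
 * The 16G and 32G figures and the layouts come from this thread.
 * Compile: cc -o mem_per_core mem_per_core.c */
#include <stdio.h>

int main(void)
{
    struct layout { const char *name; int cores; double mem_gb; };
    struct layout layouts[] = {
        { "dual single-core, 16G/node", 2, 16.0 },
        { "dual dual-core,   16G/node", 4, 16.0 },
        { "quad dual-core,   32G/node", 8, 32.0 },
    };
    unsigned i;
    for (i = 0; i < sizeof layouts / sizeof layouts[0]; i++)
        printf("%-30s %d cores, %4.1f GB/core\n",
               layouts[i].name, layouts[i].cores,
               layouts[i].mem_gb / layouts[i].cores);
    return 0;
}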
> Michael
>
>
>
>
> Joe Landman wrote:
>
>> Hi Tahir:
>>
>> Tahir Malas wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to ask for advice on processor selection for the cluster
>>> that we will configure soon. Comparing the sequential performance of
>>> our programs on an Opteron 246 and a much more expensive machine with
>>> an Itanium processor, we have decided to use Opteron processors with
>>> Tyan motherboards. However, we are unsure which processors to choose.
>>> Before posing my questions, I'd better give some info about our
>>> application requirements:
>>>
>>> 1. The scalability of our program is not so good, a speedup of less
>>> than 20 on 32 processors (measured on a single-node system), so we
>>> don't plan to go beyond 16 nodes (which makes 32 processors with
>>> dual-CPU nodes).
>>
>>
>> Do you mean a single CPU per node, a single core per CPU, or a large
>> SMP? You might also wish to look at the iWill motherboards.
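As a rough aside on that scalability number: a speedup of 20 on 32
processors is about 62% parallel efficiency, and a quick Amdahl's-law fit
(a sketch only, treating the loss as a serial fraction and ignoring
communication cost, which may well be the real culprit here) suggests where
further scaling would flatten out:

/* amdahl.c -- rough Amdahl's-law fit to "speedup < 20 on 32 processors".
 * Purely illustrative: it treats the loss as a serial fraction and
 * ignores communication, which this thread suspects is the real issue. */
#include <stdio.h>

int main(void)
{
    double S = 20.0, N = 32.0;
    /* Solve S = 1 / (f + (1 - f)/N) for the serial fraction f. */
    double f = (N / S - 1.0) / (N - 1.0);
    int n;
    printf("implied serial fraction: %.3f\n", f);
    for (n = 8; n <= 64; n *= 2)
        printf("predicted speedup on %2d processors: %.1f\n",
               n, 1.0 / (f + (1.0 - f) / n));
    return 0;
}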
>>
>>> 2. The memory requirement is huge; we will use 4 GB of memory per node
>>> for the time being and increase this to 16 GB later. So we need fast
>>> CPUs and efficient usage of memory.
>>
>>
>> OK, you are going to want the later-model motherboards that properly
>> support DDR/400 with single-rank DIMMs. 2 GB DIMMs are still not cheap
>> (at least the good ones).
>>
>>> 3. Due to budget limitations we will first configure an 8-node system
>>> with 4 GB of RAM per node and extend this to a 16-node system with
>>> 16 GB of RAM per node in 6 months.
>>>
>>> We were thinking of AMD 250 processors, but now the benchmarks of
>>> dual-core CPUs (on AMD's web site) seem encouraging, and the cost of
>>> the dual-core AMD 275 seems to be less than twice that of the AMD 250.
>>
>>
>> http://enterprise2.amd.com/downloadables/Dual_Core_Performance.pdf and
>> others, I presume. :)
>>
>>> Since the memory cost of our system will dominate the other costs, we
>>> can afford to move to dual-core technology. However, the questions that
>>> arise are as follows.
>>>
>>> 1. Will it be worth it? And can we gain any advantage over single-core
>>> given the not-so-good scalability of our parallel programs?
>>
>>
>> It depends upon the code. If your code requires very low latency, the
>> benefit of dual-core nodes is that you have 4 interconnected cores
>> (think of them as individual processors) connected over a very
>> high-speed, low-latency interface. If this is well coupled to the rest
>> of the system through an external low-latency interface (InfiniPath,
>> IB, Myrinet, etc.), and your code is latency sensitive, then dual core
>> could be a substantial win for you. If your code simply hammers on
>> memory bandwidth, then it is possible in some cases for it to be a
>> liability relative to single core. Some codes (weather codes)
>> demonstrated something like this here in the recent past.
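If you want to check which camp a code falls into before buying, one crude
test is a stripped-down STREAM-style triad (a sketch only, not the official
STREAM benchmark): run one copy, then one copy per core of a socket, and
see whether the per-copy number drops.

/* triad.c -- stripped-down STREAM-style triad for eyeballing memory
 * bandwidth.  Run a single copy, then one copy per core (two shells, or
 * your queue system), and compare the per-copy numbers.
 * Compile: cc -O2 -o triad triad.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N    (8 * 1024 * 1024)   /* 3 arrays x 64 MB: well past cache */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    size_t i;
    int r;
    struct timeval t0, t1;

    if (!a || !b || !c) { perror("malloc"); return 1; }
    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (r = 0; r < REPS; r++)
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];      /* 3 doubles moved per element */
    gettimeofday(&t1, NULL);

    {
        double secs   = (t1.tv_sec - t0.tv_sec)
                      + (t1.tv_usec - t0.tv_usec) / 1e6;
        double gbytes = (double)REPS * N * 3 * sizeof(double) / 1e9;
        printf("approx. bandwidth: %.2f GB/s  (check value %.1f)\n",
               gbytes / secs, a[0]);       /* print a[0] to keep stores live */
    }
    return 0;
}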
>>
>>> 2. Another question: does dual-core technology bring any advantage for
>>> the efficient usage of the large amount of memory that we will utilize?
>>
>>
>> Not really an advantage or a disadvantage. With single core, your
>> aggregate memory bandwidth is N(cores) * the bandwidth of one memory
>> bus. With dual core, it is (N(cores)/2) * the bandwidth of one memory
>> bus, since the two cores on a socket share that socket's memory bus.
>> This may or may not be an issue for your code.
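Plugging an example figure into that formula (a sketch only; the ~6.4 GB/s
is just dual-channel DDR-400 per socket, and the formula, not the number,
is the point):

/* per_core_bw.c -- the aggregate vs. per-core bandwidth arithmetic above,
 * with dual-channel DDR-400 (~6.4 GB/s per socket) used as an example
 * figure for a dual-Opteron board. */
#include <stdio.h>

int main(void)
{
    double bw_per_socket = 6.4;     /* GB/s, dual-channel DDR-400 */
    int sockets = 2;                /* dual-Opteron board */
    int cores_per_socket;

    for (cores_per_socket = 1; cores_per_socket <= 2; cores_per_socket++) {
        int cores = sockets * cores_per_socket;
        printf("%d core(s)/socket: aggregate %.1f GB/s, %.1f GB/s per core\n",
               cores_per_socket, sockets * bw_per_socket,
               sockets * bw_per_socket / cores);
    }
    return 0;
}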
>>
>>> 3. Finally, there is something basic that I'm not sure about: when we
>>> assign a job to a dual-core CPU, can it divide the job between the
>>> cores automatically, or should we think of a dual-core CPU the same
>>> way as a dual-CPU node? If the latter is the case, what is the
>>> advantage of this technology over dual-CPU nodes?
>>
>>
>> Think of this as 2 physically independent CPUs (that just happen to
>> share the same space on the motherboard). That means your dual-CPU
>> nodes become 4-ways. In terms of assigning a job to a CPU (or core),
>> you still need a threading library or an MPI library and appropriate
>> changes to the source code to make it scale. But the advantage for you
>> would be lower overall latency between CPUs for messaging using MPI,
>> and larger SMP nodes for OpenMP. The potential disadvantage is loss of
>> effective memory bandwidth. If you look at the above URL for the paper,
>> you will see that the bandwidth issue wasn't a factor for the tests we
>> ran. It could be for your code, and that's the important part. You need
>> to test to be sure.
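To make that concrete: nothing divides a job across cores by itself. You
launch one MPI rank (or one thread) per core and split the work explicitly,
as in this minimal sketch (standard MPI calls only; nothing here is
specific to the hardware discussed above):

/* hello_ranks.c -- one MPI rank per core, e.g. on a dual-socket
 * dual-core node:  mpirun -np 4 ./hello_ranks
 * The work split is explicit: each rank derives its slice from its
 * rank number; the cores do not share it out automatically. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, n = 1000, lo, hi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Example decomposition: rank i handles rows [i*n/size, (i+1)*n/size). */
    lo = rank * n / size;
    hi = (rank + 1) * n / size;
    printf("rank %d of %d handles rows %d..%d\n", rank, size, lo, hi - 1);

    MPI_Finalize();
    return 0;
}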
>>
>>>
>>> If anyone has info and/or experience with these, I will be very glad
>>> to hear it.
>>> Thanks in advance,
>>> Tahir Malas
>>> Bilkent University Electrical and Electronics Engineering Department
>>> Phone: +90 312 290 1385
>>>
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>>
>
>
>
--
--------------------------------------------------------------------------
Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2