[Beowulf] dual-core benefits?
Joel Jaeggli
joelja at darkwing.uoregon.edu
Thu Sep 22 11:25:35 PDT 2005
On Thu, 22 Sep 2005, Michael Will wrote:
> Tahir wrote that they need 4G per node now and 16G later.
>
> If you have dual single core, you have 8G per core, but if
> you have dual dual core, you only have 4G per core.
>
> On the other hand, the scalability issues Tahir mentioned below sound
> like interprocess communication is the bottleneck, in which case you
> want as many cores as you can get in the system.
>
> You could then architect your solution with quad dual-core Opteron
> systems, which gives you 8 cores and 32G of RAM in one node.
I suspect that once it's cost effective for you to make the jump to 8xx
CPUs and a non-commodity 4-way mainboard, going all the way to an 8-way,
16-core box is also cost effective.
The iWill H8501 chassis is around $10,500 and the Opteron 865s are about
$1,300 ea., which is a premium of about $500 over the 265, so probably
$30k for a 16-core box that fits in 5U.
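As a quick sanity check on the memory-per-core arithmetic above, a minimal
sketch in plain C (the node layouts and RAM figures are the ones quoted in
this thread; nothing else is assumed):

/* mem_per_core.c -- memory per core for the node layouts discussed above.
 * The 16G and 32G figures and the layouts come from this thread.
 * Compile: cc -o mem_per_core mem_per_core.c */
#include <stdio.h>

int main(void)
{
    struct layout { const char *name; int cores; double mem_gb; };
    struct layout layouts[] = {
        { "dual single-core, 16G/node", 2, 16.0 },
        { "dual dual-core,   16G/node", 4, 16.0 },
        { "quad dual-core,   32G/node", 8, 32.0 },
    };
    unsigned i;
    for (i = 0; i < sizeof layouts / sizeof layouts[0]; i++)
        printf("%-30s %d cores, %4.1f GB/core\n",
               layouts[i].name, layouts[i].cores,
               layouts[i].mem_gb / layouts[i].cores);
    return 0;
}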
> Michael
>
>
>
>
> Joe Landman wrote:
>
>> Hi Tahir:
>>
>> Tahir Malas wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to ask for advice on processor selection for the cluster
>>> that we will configure soon. Comparing the sequential performance of
>>> our programs on an Opteron 246 and a much more expensive machine with
>>> an Itanium processor, we have decided to use Opteron processors with
>>> Tyan motherboards. However, we are unsure which processors to choose.
>>> Before posing my questions, I'd better give some info about our
>>> application requirements:
>>>
>>> 1. The scalability of our program is not so good, a speedup of less
>>> than 20 on 32 processors (measured on a single-node system), so we
>>> don't plan to go beyond 16 nodes (which makes 32 processors with
>>> dual-CPU nodes).
>>
>>
>> Do you mean a single CPU per node, a single core per CPU, or a large
>> SMP? You might also wish to look at the iWill motherboards.
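As a rough aside on that scalability number: a speedup of 20 on 32
processors is about 62% parallel efficiency, and a quick Amdahl's-law fit
(a sketch only, treating the loss as a serial fraction and ignoring
communication cost, which may well be the real culprit here) suggests where
further scaling would flatten out:

/* amdahl.c -- rough Amdahl's-law fit to "speedup < 20 on 32 processors".
 * Purely illustrative: it treats the loss as a serial fraction and
 * ignores communication, which this thread suspects is the real issue. */
#include <stdio.h>

int main(void)
{
    double S = 20.0, N = 32.0;
    /* Solve S = 1 / (f + (1 - f)/N) for the serial fraction f. */
    double f = (N / S - 1.0) / (N - 1.0);
    int n;
    printf("implied serial fraction: %.3f\n", f);
    for (n = 8; n <= 64; n *= 2)
        printf("predicted speedup on %2d processors: %.1f\n",
               n, 1.0 / (f + (1.0 - f) / n));
    return 0;
}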
>>
>>> 2. The memory requirement is huge; we will use 4 GB of memory per node
>>> for the time being and increase this to 16 GB later. So we need fast
>>> CPUs and efficient usage of memory.
>>
>>
>> OK, you are going to want the later-model motherboards that properly
>> support DDR/400 with single-rank DIMMs. 2 GB DIMMs are still not cheap
>> (at least the good ones).
>>
>>> 3. Due to budget limitations we will first configure an 8-node system
>>> with 4 GB of RAM per node and extend this to a 16-node system with
>>> 16 GB of RAM per node in 6 months.
>>>
>>> We were thinking of AMD 250 processors, but now the benchmarks of
>>> dual-core CPUs (on AMD's web site) seem encouraging, and the cost of
>>> the dual-core AMD 275 seems to be less than twice that of the AMD 250.
>>
>>
>> http://enterprise2.amd.com/downloadables/Dual_Core_Performance.pdf and
>> others, I presume. :)
>>
>>> Since the memory cost of our system will dominate the other costs, we
>>> can afford to move to dual-core technology. However, the questions that
>>> arise are as follows.
>>>
>>> 1. Will it be worth it? And can we gain any advantage over single-core
>>> given the not-so-good scalability of our parallel programs?
>>
>>
>> It depends upon the code. If your code requires very low latency, the
>> benefit of dual-core nodes is that you have 4 interconnected cores
>> (think of them as individual processors) connected over a very
>> high-speed, low-latency interface. If this is well coupled to the rest
>> of the system through an external low-latency interface (InfiniPath,
>> IB, Myrinet, etc.), and your code is latency sensitive, then dual core
>> could be a substantial win for you. If your code simply hammers on
>> memory bandwidth, then it is possible in some cases for it to be a
>> liability relative to single core. Some codes (weather codes)
>> demonstrated something like this here in the recent past.
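If you want to check which camp a code falls into before buying, one crude
test is a stripped-down STREAM-style triad (a sketch only, not the official
STREAM benchmark): run one copy, then one copy per core of a socket, and
see whether the per-copy number drops.

/* triad.c -- stripped-down STREAM-style triad for eyeballing memory
 * bandwidth.  Run a single copy, then one copy per core (two shells, or
 * your queue system), and compare the per-copy numbers.
 * Compile: cc -O2 -o triad triad.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N    (8 * 1024 * 1024)   /* 3 arrays x 64 MB: well past cache */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    size_t i;
    int r;
    struct timeval t0, t1;

    if (!a || !b || !c) { perror("malloc"); return 1; }
    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    gettimeofday(&t0, NULL);
    for (r = 0; r < REPS; r++)
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];      /* 3 doubles moved per element */
    gettimeofday(&t1, NULL);

    {
        double secs   = (t1.tv_sec - t0.tv_sec)
                      + (t1.tv_usec - t0.tv_usec) / 1e6;
        double gbytes = (double)REPS * N * 3 * sizeof(double) / 1e9;
        printf("approx. bandwidth: %.2f GB/s  (check value %.1f)\n",
               gbytes / secs, a[0]);       /* print a[0] to keep stores live */
    }
    return 0;
}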
>>
>>> 2. Another question: does dual-core technology bring any advantage for
>>> the efficient usage of the large amount of memory that we will utilize?
>>
>>
>> Not really an advantage or a disadvantage. With single core, your
>> aggregate memory bandwidth is N(cores) * the bandwidth of one memory
>> bus. With dual core, it is (N(cores)/2) * the bandwidth of one memory
>> bus, since the two cores on a socket share that socket's memory bus.
>> This may or may not be an issue for your code.
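Plugging an example figure into that formula (a sketch only; the ~6.4 GB/s
is just dual-channel DDR-400 per socket, and the formula, not the number,
is the point):

/* per_core_bw.c -- the aggregate vs. per-core bandwidth arithmetic above,
 * with dual-channel DDR-400 (~6.4 GB/s per socket) used as an example
 * figure for a dual-Opteron board. */
#include <stdio.h>

int main(void)
{
    double bw_per_socket = 6.4;     /* GB/s, dual-channel DDR-400 */
    int sockets = 2;                /* dual-Opteron board */
    int cores_per_socket;

    for (cores_per_socket = 1; cores_per_socket <= 2; cores_per_socket++) {
        int cores = sockets * cores_per_socket;
        printf("%d core(s)/socket: aggregate %.1f GB/s, %.1f GB/s per core\n",
               cores_per_socket, sockets * bw_per_socket,
               sockets * bw_per_socket / cores);
    }
    return 0;
}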
>>
>>> 3. Finally, there is something basic that I'm not sure about: when we
>>> assign a job to a dual-core CPU, can it divide the job between the
>>> cores automatically, or should we think of a dual-core CPU the same
>>> way as a dual-CPU node? If the latter is the case, what is the
>>> advantage of this technology over dual-CPU nodes?
>>
>>
>> Think of this as 2 physically independent CPUs (that just happen to
>> share the same space on the motherboard). That means your dual-CPU
>> nodes become 4-ways. In terms of assigning a job to a CPU (or core),
>> you still need a threading library or an MPI library and appropriate
>> changes to the source code to make it scale. But the advantage for you
>> would be lower overall latency between CPUs for messaging using MPI,
>> and larger SMP nodes for OpenMP. The potential disadvantage is loss of
>> effective memory bandwidth. If you look at the above URL for the paper,
>> you will see that the bandwidth issue wasn't a factor for the tests we
>> ran. It could be for your code, and that's the important part. You need
>> to test to be sure.
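To make that concrete: nothing divides a job across cores by itself. You
launch one MPI rank (or one thread) per core and split the work explicitly,
as in this minimal sketch (standard MPI calls only; nothing here is
specific to the hardware discussed above):

/* hello_ranks.c -- one MPI rank per core, e.g. on a dual-socket
 * dual-core node:  mpirun -np 4 ./hello_ranks
 * The work split is explicit: each rank derives its slice from its
 * rank number; the cores do not share it out automatically. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, n = 1000, lo, hi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Example decomposition: rank i handles rows [i*n/size, (i+1)*n/size). */
    lo = rank * n / size;
    hi = (rank + 1) * n / size;
    printf("rank %d of %d handles rows %d..%d\n", rank, size, lo, hi - 1);

    MPI_Finalize();
    return 0;
}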
>>
>>>
>>> If anyone has info and/or experience with these, I will be very glad
>>> to hear it.
>>> Thanks in advance,
>>> Tahir Malas
>>> Bilkent University Electrical and Electronics Engineering Department
>>> Phone: +90 312 290 1385
>>>
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org
>>> To change your subscription (digest mode or unsubscribe) visit
>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>>
>
>
>
--
--------------------------------------------------------------------------
Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2