[Beowulf] Intel Quad-Core or AMD Opteron
Miguel Dias Costa
mcosta at fc.up.pt
Thu Aug 23 10:09:09 PDT 2007
Hello all.
I understand that, when comparing Quad-Core Xeons with Opterons,
people focus on the scalability issues of the different multi-core
architectures, but we've run some benchmarks on both, and the thing
that surprised me the most at the time was that if your application
makes heavy use of the functions provided by the Intel Math Kernel
Library, a single Xeon core (e.g. Clovertown) can be up to twice as
fast as a single Opteron core.
I suppose things might change with "Barcelona" cores, but right now
this might be relevant for the choice between the two, depending on
your application.
Miguel
Robert G. Brown wrote:
> On Thu, 23 Aug 2007, Li, Bo wrote:
>
>> Doug has made many valuable points, I think.
>> Current applications using MPI or OpenMP on a multi-core machine
>> run in a simple SMP way, which means almost nothing is done for
>> multi-core optimization. IMHO, multi-core processors at least have
>> a different internal interconnect from a general SMP
>> system. We can keep most data-exchange operations on one
>> socket so that the external bandwidth can be used by other threads.
>> For SMP or multi-core systems, main memory bandwidth can be
>> the critical bottleneck. In some of my experiments, 8 cores can
>> consume all of it, and in extreme cases 4 cores do, so you can
>> hardly find any improvement going from 4 cores to 8. Under these
>> conditions, data transfers should be planned well and done
>> efficiently.
>> By the way, I prefer to use OpenMP on an SMP system and MPI between
>> boxes in a cluster. Multi-core processors saved me a lot of money for
>> the same peak performance, but tuning or optimization can help us
>> do better.
>> Regards,
>> Li, Bo
>
> More to the point (and perhaps more useful for cluster/systems
> engineers) -- as one adds cores to a CPU without increasing the
> bandwidth of the memory it shares or otherwise multiplexing the
> pathways to that memory, it is almost inevitable that a memory
> bottleneck will appear. It seems reasonable to probe that memory
> bottleneck as directly as possible with e.g. multiple copies of
> stream or stream-like benchmarks that also permit
> shuffled/nonstreaming/random access to memory blocks for varying
> sizes and strides of read and written data to get an idea of the
> BASELINE rates and nonlinearities as one runs over the cache
> boundaries and so on.
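
A minimal sketch of the kind of stream-like probe described above, assuming
plain Python and its standard library. It is not the STREAM benchmark; the
names copy_rate and aggregate_rate, the buffer sizes, and the strides are all
illustrative assumptions. It times a C-level slice copy of a buffer for
different sizes and strides, and can run several copies at once to see how the
per-copy rate changes under load. Interpreter and allocation overhead are
included in the timings, so only the relative trends are meaningful.

```python
import time
from multiprocessing import Pool


def copy_rate(size_bytes, stride=1, repeats=5):
    """Return the best observed rate, in gathered bytes per second.

    stride=1 is a contiguous copy; stride>1 reads every stride-th byte,
    so fewer bytes are gathered per cache line that is touched.
    """
    src = bytearray(size_bytes)
    best = float("inf")
    chunk = b""
    for _ in range(repeats):
        start = time.perf_counter()
        chunk = src[::stride]  # the copy loop runs in C, not in Python
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return len(chunk) / best if best > 0 else float("inf")


def _worker(args):
    # Module-level helper so that Pool can pickle it.
    return copy_rate(*args)


def aggregate_rate(n_procs, size_bytes, stride=1):
    """Run n_procs copies at once and return the sum of their rates.

    The sum only approximates aggregate bandwidth, because the copies
    are not guaranteed to overlap perfectly in time.
    """
    with Pool(n_procs) as pool:
        rates = pool.map(_worker, [(size_bytes, stride)] * n_procs)
    return sum(rates)


if __name__ == "__main__":
    # Sweep across sizes that straddle typical cache boundaries.
    for size in (1 << 14, 1 << 18, 1 << 22, 1 << 26):
        for stride in (1, 16, 64):
            rate = copy_rate(size, stride)
            print(f"{size:>10d} B  stride {stride:>3d}  {rate / 1e6:10.1f} MB/s")
```

Comparing aggregate_rate(1, size) with aggregate_rate(4, size) and
aggregate_rate(8, size) for a buffer much larger than the cache gives a rough
idea of where adding copies stops adding bandwidth.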
>
> This gives people without access to a quad core at least the
> opportunity to meditate on whether or not there is any hope of it
> being useful to them, compared to (say) a dual dual core or two
> single processor dual cores with a network connection. A quad core
> CPU "is" a little mini-beowulf in some sense and faces very similar
> issues when one contemplates its task scaling, and just as
> network-based IPCs are the crux of the issue for COTS cluster design
> it seems that core-to-core and/or core-to-memory latencies and
> bandwidths in different load configurations are at the crux for
> multicores, where putting LOTS of cores on a single die can just
> make things MUCH worse. (Remembering that the task scaling curve can
> easily turn DOWN if you put too many CPUs on a task with inadequate
> IPC capacity.)
>
> Has anyone published a simple set of these numbers? One would
> expect to be able to understand most other benchmarks in terms of
> them...
>
> rgb
>
>> ----- Original Message -----
>> From: "Douglas Eadline" <deadline at eadline.org>
>> To: "Ruhollah Moussavi Baygi" <ruhollah.mb at gmail.com>
>> Cc: <beowulf at beowulf.org>
>> Sent: Thursday, August 23, 2007 9:09 PM
>> Subject: Re: [Beowulf] Intel Quad-Core or AMD Opteron
>>
>>
>>> Multi-core, I lie awake at night thinking about this stuff.
>>> There seem to be no quick answers.
>>>
>>> The thing that amazes me about multi-core is how many people
>>> consider the performance of a single process to be a good measure
>>> of total processor performance. If you are going to buy a quad-core
>>> CPU to run one process at a time, then this is a good test;
>>> otherwise it is like predicting the performance of your code
>>> on a cluster by running it on the head node as a single
>>> serial job.
>>>
>>> Over the past 8-10 months I have had the chance to test
>>> Intel quad-core and AMD dual-core systems (soon I'll have some
>>> Barcelonas), and here are my conclusions. The details of what I
>>> found are in my columns in Linux Magazine, which are slowly making
>>> their way to the LM web site (and eventually ClusterMonkey):
>>>
>>> - how well multiple processes run (use memory) on quad-core
>>> is very application specific. I have a simple test script
>>> that calculates what I call "effective cores". I have seen
>>> these results range from about 2-7 on a dual socket quad-core
>>> Intel system (8 cores total) and a quad socket dual core AMD
>>> system (8 cores total).
>>>
>>> - running a single copy of the NAS FT benchmark on a Clovertown
>>> was much faster than a comparable Opteron. But, running a parallel
>>> MPI version of FT on 8 cores showed the AMD system to be faster.
>>>
>>> - on Intel quad-cores, where the process is placed can have
>>> a large effect on performance. This is largely due to the
>>> fact that you have four dual-core Woodcrests, each with its
>>> own cache. Naturally, if you have four processes running
>>> it is best if each one gets its own Woodcrest. To the OS
>>> they all look the same. Other than Intel MPI, I don't
>>> know of any other MPI that attempts to optimize this.
>>> Open MPI has some processor affinity support, but it is
>>> not all that sophisticated (yet).
>>>
>>> - again depending on the application, GigE may not
>>> be sufficient to support the amount of traffic that
>>> multi-core can generate. So if your code ran
>>> well on GigE, it may not on a multi-core cluster.
>>> Things like IB or Myrinet 10GigE may be needed.
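
The "effective cores" script mentioned in the first item above is not shown in
this thread. A minimal sketch of one plausible way to compute such a figure,
assuming the definition N * T1 / TN (T1 is the wall time of one copy, TN the
wall time for N copies started together); the function names and the toy
workload are hypothetical, and the real script may differ:

```python
import os
import subprocess
import sys
import time


def run_copies(cmd, n):
    """Start n copies of cmd at once; return wall time until all finish."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def effective_cores(t_single, t_multi, n):
    """Assumed definition: perfect scaling gives n, full serialization 1."""
    return n * t_single / t_multi


if __name__ == "__main__":
    # Toy CPU-bound workload; substitute the real application here,
    # since the result is very application specific.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    n = os.cpu_count() or 1
    t1 = run_copies(cmd, 1)
    tn = run_copies(cmd, n)
    print(f"{n} copies: effective cores = {effective_cores(t1, tn, n):.2f}")
```

With this definition, 8 copies that each take four times as long as a single
copy would give 2 effective cores.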
>>>
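
The placement point above (one process per cache-sharing core pair before
doubling up) can be sketched as follows. The core numbering in the example is
a hypothetical layout; on Linux the real grouping can usually be read from
/sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_list. The helper names
are illustrative, and os.sched_setaffinity is only available on Linux.

```python
import os


def spread_over_caches(n_procs, cache_groups):
    """Map process ranks to core ids, filling one core per cache group
    before placing a second process in any group.

    cache_groups is a list of lists of core ids that share a cache.
    Returns a list where entry i is the core id for process i.
    """
    slots = []
    depth = max(len(group) for group in cache_groups)
    for level in range(depth):
        for group in cache_groups:
            if level < len(group):
                slots.append(group[level])
    if n_procs > len(slots):
        raise ValueError("more processes than cores")
    return slots[:n_procs]


def pin_to_core(core_id):
    """Pin the calling process to a single core (Linux only)."""
    os.sched_setaffinity(0, {core_id})


if __name__ == "__main__":
    # Hypothetical layout: four pairs of cores, each pair sharing a cache.
    groups = [[0, 1], [2, 3], [4, 5], [6, 7]]
    for rank, core in enumerate(spread_over_caches(4, groups)):
        print(f"process {rank} -> core {core}")
```

Each worker process would call pin_to_core with its assigned core id at
startup; an MPI library or a launcher such as taskset can do the same job.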
>>> Please note, I am not trying to pick a winner, were that
>>> even possible. I want to state that more than ever testing
>>> your code(s) in parallel on these systems is critical if
>>> you want to get optimal performance.
>>>
>>> One other thing I found as well. I recently ran the NAS
>>> parallel benchmarks on a dual socket quad core Intel system
>>> (8 cores total) using both the OpenMP (GNU 4.2) and MPI (LAM)
>>> libraries. Anyone want to guess what produced the best results?
>>>
>>> --
>>> Doug
>>>
>>>
>>>> Hi everybody,
>>>>
>>>> As you may be aware, Intel has dramatically reduced the price of
>>>> its Quad CPUs.
>>>>
>>>> Does anyone have any experience using Intel Quad-Core CPUs in a
>>>> Beowulf cluster?
>>>>
>>>> Do you prefer these ones over AMD Opteron?
>>>>
>>>> Essentially, do Intel Quad CPUs really have FOUR cores? Are
>>>> they really 64-bit processors, as Opterons are?
>>>>
>>>> Thanks for any comment on each of my questions.
>>>>
>>>> Wishes,
>>>> rmb
>>>>
>>>>
>>>> --
>>>> Best,
>>>> Ruhollah Moussavi Baygi
>>>>
>>>>
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Doug
>>
>>
>