[Beowulf] Benchmark between Dell Poweredge 1950 And 1435

Thu Mar 8 09:14:43 PST 2007

Joshua,
Great, thanks. That was clear, and the takeaway is that I should pay attention
to the number of memory channels per core (which may be less than 1.0), in
addition to the number of cores and the RAM per core.

What is the "ncpu" column in Table 1 (for example)? Does the 4 refer to 4
cores, with the 1 and 2 cases not using all the cores on the motherboard? Or
is "ncpu" an application parameter? I read it as "number of CPUs". I noticed
that the heart simulation didn't have an ncpu column, which is why I
thought you had multiple nodes going.
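
To make the question concrete, here is a rough sketch of how I would read
scaling off an "ncpu" column, assuming it means the number of cores used for
a run. The function name and the timings are made up for illustration; they
are not numbers from your benchmark file.

```python
def scaling_table(runtimes):
    """Return (ncpu, speedup, efficiency) rows from {ncpu: seconds}.

    Assumes runtimes contains an entry for ncpu == 1, which is used as
    the baseline. Speedup is T(1) / T(n); efficiency is speedup / n.
    """
    base = runtimes[1]
    rows = []
    for ncpu in sorted(runtimes):
        speedup = base / runtimes[ncpu]
        efficiency = speedup / ncpu
        rows.append((ncpu, speedup, efficiency))
    return rows


# Hypothetical wall-clock times in seconds, NOT measured data.
example = {1: 100.0, 2: 55.0, 4: 40.0}

for ncpu, speedup, efficiency in scaling_table(example):
    print(f"ncpu={ncpu}  speedup={speedup:.2f}  efficiency={efficiency:.2f}")
```

If efficiency drops off quickly as ncpu goes up on a single box, I would take
that as a hint that the cores are contending for something shared, such as the
path to memory.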

Thanks very much,
Peter

P.S. and then where does the billiard cue go?

On 3/8/07, Joshua Baker-LePain <jlb17 at duke.edu> wrote:
>
> On Thu, 8 Mar 2007 at 11:33am, Peter St. John wrote
>
> > Those benchmarks are quite interesting and I wonder if I interpret them at
> > all correctly.
> > It would seem that the Intel outperforms its advantage in clockspeed
> > (1/6th faster, but ballpark 1/3 better performance?) so the question would
> > be performance gain per dollar cost (which is fine); however, for that
> > heart simulation towards the end, it looks like the AMD scales up with
> > increasing nodecount enormously better, and with several nodes actually
> > outperforms the faster Intel.
> > Should I guess at relatively poor performance of the networking on the
> > motherboard used with the Intel chip, or does that have anything to do with
> > the CPU itself?
>
> Each benchmark was run on a single system with 4 CPUs (or, rather, 4 cores
> in 2 sockets) -- there was no network involved.  The difference (IMO) lies
> in the memory subsystems of the 2 architectures.
>
> Opterons have 1 memory controller per socket (on the CPU, shared by the 2
> cores) attached to a dedicated bank of memory via a Hypertransport link
> (referred to from here on as HT).  That socket is connected to the other
> CPU socket (and its HT connected memory bank) by HT.
>
> Xeons (still) have a single memory controller hub to which the CPUs
> communicate via the front side bus (FSB).  That single hub has 2 channels
> to memory.
>
> So, yes, clock-for-clock (and for my usage) Xeon 51xxs are faster than
> Opterons.  But, if your code hits memory *really hard* (which that heart
> model does), then the multiple paths to memory available to the Opterons
> allow them to scale better.
>
> This situation has existed for a long time on the Intel side.  For P4
> based Xeons it was crippling.  The new Core based Xeons, however, don't
> suffer nearly as badly (due to their big cache, maybe?).  E.g. the thermal
> simulations in that same file are pretty memory intensive themselves, and
> P4 based Xeons scaled *horribly* on them.  But the 51xx Xeons still scale
> very well on them (which surprised me).
>
> --
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University
>